Linear Regression
March 29, 2025
Another tip from Haki.
Another common tool for analyzing data is linear regression. For example, here is how to perform a linear regression using pandas and SciPy:
>>> import pandas as pd
>>> import scipy.stats
>>> df = pd.DataFrame([[1.2, 1], [2, 1.8], [3.1, 2.9]])
>>> slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(df[0], df[1])
>>> print(slope, intercept, r_value, p_value, std_err)
1.0 -0.2000000000000004 1.0 9.003163161571059e-11 0.0
Most developers probably don’t expect the database to have statistical functions, but PostgreSQL does:
WITH t AS (SELECT * FROM (VALUES
(1.2, 1.0),
(2.0, 1.8),
(3.1, 2.9)
) AS t(x, y))
SELECT
regr_slope(y, x) AS slope,
regr_intercept(y, x) AS intercept,
sqrt(regr_r2(y, x)) AS r
FROM
t;
       slope        │      intercept       │ r
────────────────────┼──────────────────────┼───
 1.0000000000000002 │ -0.20000000000000048 │ 1
Using PostgreSQL's statistical aggregate functions, we get the same results as SciPy, up to floating-point rounding.
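To see what these aggregates compute, here is a minimal sketch of an ordinary least-squares fit in plain Python. The function name `linear_regression` is hypothetical and used only for illustration. This is not how PostgreSQL or SciPy implement it internally, only the textbook formulas applied to the same three points:

```python
import math


def linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration; returns (slope, intercept, r).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations and of cross products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Simplification: treat constant x or constant y as an error
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")

    slope = sxy / sxx                       # counterpart of regr_slope(y, x)
    intercept = mean_y - slope * mean_x     # counterpart of regr_intercept(y, x)
    r = sxy / math.sqrt(sxx * syy)          # correlation coefficient, sign preserved
    return slope, intercept, r


# Same data as the examples above
slope, intercept, r = linear_regression([1.2, 2.0, 3.1], [1.0, 1.8, 2.9])
print(slope, intercept, r)
```

One difference worth noting: `sqrt(regr_r2(y, x))` is never negative, so it only equals the correlation coefficient when the slope is positive, as it is here. For data that may trend downward, PostgreSQL's `corr(y, x)` aggregate keeps the sign.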