Interpolation
Interpolation
March 29, 2025
Yet another from [Haki](From https://hakibenita.com/sql-for-data-analysis).
Back and Forward Fill
Filling values with constants is easy, but not always possible. Another common interpolation technique is filling empty values with previous or following non-missing values.
Pandas offers several variations on back and forward filling, for example:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(['A', 'B', np.NaN, 'D', np.NaN, np.NaN, 'G'])
>>> df.fillna(method='ffill') # or df.ffill()
>>> 0
0 A
1 B
2 B
3 D
4 D
5 D
6 G
>>> df.fillna(method='bfill') # or df.bfill() or df.backfill()
0
0 A
1 B
2 D
3 D
4 G
5 G
6 G
To achieve the same using SQL, you can use a subquery:
WITH tb AS (
SELECT * FROM (VALUES
(1, 'A' ),
(2, 'B' ),
(3, null),
(4, 'D' ),
(5, null),
(6, null),
(7, 'G' )
) AS t(n, v)
)
SELECT
*,
-- Find the next not null value
coalesce(v, (
SELECT v
FROM tb AS tb_
WHERE tb_.n < tb.n AND v IS NOT NULL
ORDER BY n DESC
LIMIT 1
)) AS ffill_v,
-- Find the previous not null value
coalesce(v, (
SELECT v
FROM tb as tb_
WHERE tb_.n > tb.n AND v IS NOT NULL
ORDER BY n ASC
LIMIT 1
)) as bfill_v
FROM
tb;
n │ v │ ffill_v │ bfill_v
───┼───┼─────────┼─────────
1 │ A │ A │ A
2 │ B │ B │ B
3 │ ¤ │ B │ D
4 │ D │ D │ D
5 │ ¤ │ D │ G
6 │ ¤ │ D │ G
7 │ G │ G │ G
The SQL version is a bit longer, but it is fairly expressive, and it gives great flexibility.
NOTE: It’s tempting to use the window function LEAD and LAG here, but these function can only be used when filling single row gaps. Once you have more than one consecutive missing row, LEAD and LAG may leave you with missing values.