Data Alignment to Save Space
I saw Sehrope’s talk at PGconf New York 2021 recently. His slides are located here.
Very good talk that I plan on watching again in a year or so in time since I
believe there are some items that will click with me in the future when I
learn more about PostgreSQL.
The main thing that click with me today was arranging your columns by data
type.
I feel like I should have known this but if the data is stored in 8 byte
increments, PostgreSQL will give for example bigint takes up 8 bytes while
boolean takes only 2 bytes.
Sehrope describes that if you have bigint column, then boolean then
bigint then boolean, the total space would be 32 bytes.
Whereas if the two bigint columns are together and then boolean then the
total space would be only be 24 bytes.
If you add more boolean columns, it will still be 24 bytes until you fill the
8 bytes.
His example below:
CREATE TABLE bad_alignment (
a boolean,
b bigint,
c boolean,
d bigint,
e boolean,
f bigint,
g boolean,
h bigint
);
CREATE TABLE good_alignment (
b bigint,
d bigint,
f bigint,
h bigint,
a boolean,
c boolean,
e boolean,
g boolean
);
INSERT INTO bad_alignment
SELECT true, 1, true, 2, true, 3, true, 4
FROM generate_series(1, 1000000) x;
INSERT INTO good_alignment
SELECT 1, 2, 3, 4, true, true, true, true
FROM generate_series(1, 1000000) x;
table_name | table_size
----------------+------------
bad_alignment | 89 MB
good_alignment | 65 MBHis suggestion is to go from the largest columns to the smallest.
[12345678 12345678 12345678 12345678 12345678 12345678]
[B DDDDDDDD B DDDDDDDD B DDDDDDDD]
[DDDDDDDD DDDDDDDD DDDDDDDD BBB-----]
"B" = Boolean "DDDDDDDD" = Double "-" = Unused