Creating Histograms with Postgres

Creating Histograms with Postgres

April 4, 2025

Elizabeth’s article does a good job showing how to do histograms.

SELECT value, width_bucket(value, 0, 100, 10) AS bucket 
  FROM generate_series(0, 100) AS value;

Auto-formatting histogram with SQL

Let’s build out a larger query that creates ranges, range labels, and formats the histogram. We will start by using a synthetic table within a CTE called formatted_data. We are doing it this way so that we can replace that query with new data in the future.

WITH formatted_data AS (
  SELECT * FROM (VALUES (13), (42), (18), (62), (93), (47), (51), (41), (1)) AS t (value)
), bucket_settings AS (
  SELECT
        5 AS bucket_count,
        0::integer AS min_value, -- can be null or an integer
        100::integer AS max_value -- can be null or an integer
), calculated_bucket_settings AS (
	SELECT
	  (SELECT bucket_count FROM bucket_settings) AS bucket_count,
	  COALESCE(
	          (SELECT min_value FROM bucket_settings),
	          (SELECT min(value) FROM formatted_data)
	  ) AS min_value,
	  COALESCE(
	          (SELECT max_value FROM bucket_settings),
	          (SELECT max(value) + 1 FROM formatted_data)
	  ) AS max_value
), histogram AS (
  SELECT
    WIDTH_BUCKET(value, calculated_bucket_settings.min_value, calculated_bucket_settings.max_value + 1, (SELECT bucket_count FROM bucket_settings)) AS bucket,
    COUNT(value) AS frequency
  FROM formatted_data, calculated_bucket_settings
  GROUP BY 1
  ORDER BY 1
 ), all_buckets AS (
  SELECT
    fill_buckets.bucket AS bucket,
    FLOOR(calculated_bucket_settings.min_value + (fill_buckets.bucket - 1) * (calculated_bucket_settings.max_value - calculated_bucket_settings.min_value) / (SELECT bucket_count FROM bucket_settings)) AS min_value,
    FLOOR(calculated_bucket_settings.min_value + fill_buckets.bucket * (calculated_bucket_settings.max_value - calculated_bucket_settings.min_value) / (SELECT bucket_count FROM bucket_settings)) AS max_value
  FROM calculated_bucket_settings,
	  generate_series(1, calculated_bucket_settings.bucket_count) AS fill_buckets (bucket))

 SELECT
   all_buckets.bucket AS bucket,
   CASE
   WHEN all_buckets IS NULL THEN
	   'out of bounds'
	 ELSE
     CONCAT(all_buckets.min_value, ' - ', all_buckets.max_value - 1)
   END AS range,
   SUM(COALESCE(histogram.frequency, 0)) AS frequency
 FROM all_buckets
 FULL OUTER JOIN histogram ON all_buckets.bucket = histogram.bucket
 GROUP BY 1, 2
 ORDER BY bucket;

Try modifying the values in the bucket_settings CTE to see how the histogram responds. By increasing the bucket_count, min_value, or max_value, you’ll see the histogram respond appropriately. If you modify the range to exclude values, using the FULL OUTER JOIN, you’ll see that all non-classified items are bucketed as “out of bounds”.