MERGE Command with Examples
Not sure where I found this. With PostgreSQL 15 comes a new SQL Command called MERGE. MERGE has been in the SQL standard for quite a while, but it just made its way into the PostgreSQL codebase. Prior to MERGE, if you wanted to refresh a target table from a source table, prior to Postgres 15, you could use the “upsert” method with the ON CONFLICT clause.
Now, MERGE can be used instead! Some situations where MERGE makes a lot of sense are:
- data loading from external sources, thru foreign data wrappers
- staged and batched process jobs
About MERGE
We can read that the source of the data_source can be either a table or some data brought to the command thanks to the prior WITH with_query, or inside the MERGE after USING source_query. When the data matches, i.e. there’s a matching record in the target_table_name, we can specify what to do:
do an UPDATE as defined in the merge_update (see UPDATE SET…) part or do a DELETE as defined in the merge_delete or do nothing When the data doesn’t matches, i.e. there’s no matching record in the target_table_name, then we do an INSERT (see merge_update).
MERGE Example - Remote Sensors & Batches
In working with clients, I have seen a need for data loading with remote sensors or stations. Anything “sending regularly data” and loading that data into a dataset is a common need. I’m particularly excited about using MERGE to solve these issues for Postgres users.
In order to explain this data loading use case of MERGE, I’m going to set up an example. We’ll have a database of stations that’s a remote measurement tool with data coming in intermittently.
- Streaming data, data arriving continuously
- Stations emit periodic and intermittent data measures
- There’s a batch collecting this data
- station_data_new is a temp table for common storage
- Data is stored long term station_data_actual with the last possible values
- We want to keep track of when the station has been created
Let’s create some tables for testing purpose. Beware that the temporary table station_data_new will only exist in the context of a given session.
create temporary table station_data_new (
station_id integer
, a integer
, b integer
);
create table station_data_actual (
station_id integer primary key
, a integer
, b integer
, created timestamp default current_timestamp
, updated timestamp default current_timestamp
);
Let’s create some sample data into station_data_new: our 1st 5 stations are up and sent the 1st batch of data:
with measures as (
select *
from generate_series(1,5)
)
insert into station_data_new (
station_id
, a
, b
)
select
generate_series
, round(random()*100+1)
, round(random()*100+1)
from
measures;
Basic MERGE
At this point, we could just do a plain INSERT with SELECT from station_data_new to station_data_actual. Instead we’ll use MERGE because we’re planning for cases when the data already exists, an UPDATE will be issued and not an INSERT.
merge into station_data_actual sda
using station_data_new sdn
on sda.station_id = sdn.station_id
when matched then
update set a = sdn.a, b = sdn.b, updated = default
when not matched then
insert (station_id, a, b)
values (sdn.station_id, sdn.a, sdn.b);
If you execute it once, you’ll have in return MERGE 5, and the data is inserted into station_data_actual, where the timestamps in created and updated have the same value.
To do even more testing you could:
- truncate table station_data_new;
- Re-create data in it, by changing generate_series(1,5) with generate_series(1,10)
- Issue again the very same MERGE as we ran before
You’ll see the data in station_data_actual updated. Stations 1 to 5 will have updated status and stations 6 to 10 will be created and updated.
Conclusion
MERGE was a long awaited feature for PostgreSQL fans! Now we have it. I encourage you to review your processes when it’s about merging existing data in your database. In this example, new data entering with existing data can be a good place to use this.
MERGE opens new usages of aggregating and/or merging data from many databases. In distributed models, where all data exists across different locations, using foreign data wrappers with MERGE could be a really elegant solution. MERGE may simplify your processing of data all along the database’s life. With strong and fast SQL statements as opposed to functions or other complex operations.