Postgres Merge Joins

This is the last part of a three part series examining the Postgres join algorithms.

While a hash join is usually the fastest join algorithm, it is only so when it can be performed in memory. When Postgres thinks the hash table needed for a hash join will exceed memory, Postgres will typically use a merge join instead. A merge join winds up using disk much more effectively than a hash join can.

Similar to a hash join, a merge join only works on joins where the join condition is an equality filter. Let’s use the example we’ve been using with a people table (with fields id, name, and age) and a pets table (with fields id and owner_id), and the query:

SELECT *
FROM people, pets
WHERE people.id = pets.owner_id
AND people.age > 30;

If Postgres decides to use a merge join, execution could proceed as follows. First find all rows for people over 30 and then sort those rows by id. Then fetch all rows from the pets table and sort them by owner_id. With the two input relations sorted, Postgres then performs a “merge”, similar to the merge step in merge sort.

The merge step works by comparing the first rows from each input (the person with the smallest id, and the pet with the smallest owner_id). If the id of the owner is equal to the owner_id of the pet, Postgres will add a row to the result of the join for that pair of rows and remove those two rows from the inputs¹. If the fields are not equal and the person has an id smaller than the owner_id of the pet, Postgres will remove the person with the smallest id and continue the join with the person that has the next smallest id. The pet is removed instead if the owner_id of the pet is smaller than the id of the person. This process continues until one of the inputs runs out, at which point Postgres returns all of the rows it generated as the result of the join.
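The merge step described above can be sketched in a few lines of Python. This is a simplified in-memory illustration, not Postgres's actual implementation: the function below assumes both inputs are already sorted (people by id, pets by owner_id), and the inner loop handles the case from the footnote where several pets share one owner_id.

```python
def merge_join(people, pets):
    """Merge two sorted inputs on people.id == pets.owner_id.

    people: list of (id, name) tuples, sorted by id.
    pets:   list of (id, owner_id) tuples, sorted by owner_id.
    Returns joined rows as (person_id, name, pet_id, owner_id).
    """
    results = []
    i = j = 0
    while i < len(people) and j < len(pets):
        person_id = people[i][0]
        owner_id = pets[j][1]
        if person_id == owner_id:
            # Emit one output row for every matching pair: pair this
            # person with every pet in the run of equal owner_ids.
            k = j
            while k < len(pets) and pets[k][1] == person_id:
                results.append(people[i] + pets[k])
                k += 1
            i += 1  # advance to the next person
        elif person_id < owner_id:
            i += 1  # person's id is too small: advance the people input
        else:
            j += 1  # pet's owner_id is too small: advance the pets input
    return results
```

For example, joining the sorted inputs `[(1, "Ann"), (3, "Cam")]` and `[(11, 1), (12, 1), (10, 3)]` pairs Ann with both of her pets and Cam with his one pet, producing three output rows.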

Although a merge join is O(N*log(N) + M*log(M)) due to the need to sort the inputs, which makes it asymptotically worse than a hash join's O(N + M), it can still be faster in practice. If the sorting algorithm used is merge sort (it typically is), almost the entirety of the merge join is based on merging two inputs together. Postgres can perform a merge on disk very efficiently by keeping a small subset of the data in memory, performing a merge on that data, writing the result of that merge out to disk, and then reading more of the inputs into memory from disk. This reads and writes large amounts of data to/from disk at a time, which is relatively efficient when compared to an on-disk hash table, which would be performing random I/O all over the place.
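The key property that makes this disk-friendly is that merging sorted runs only ever needs to look at the head of each run, so the runs can be streamed sequentially rather than held in memory. Python's standard-library `heapq.merge` demonstrates the same idea lazily; in a real external merge the runs would be sorted chunks living in temp files rather than in-memory lists.

```python
import heapq

# Three sorted "runs" standing in for sorted chunks on disk.
run_a = [1, 4, 7]
run_b = [2, 5, 8]
run_c = [3, 6, 9]

# heapq.merge consumes each run front-to-back, yielding a single
# sorted stream without materializing all inputs at once.
merged = list(heapq.merge(run_a, run_b, run_c))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because each run is read strictly in order, an on-disk version of this touches the disk sequentially, which is exactly the access pattern the paragraph above contrasts with a hash table's random I/O.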

In addition to a merge join being faster when performed on disk, there is a case where a merge join is faster than a hash join, even when the hash join is performed in memory. If one or both of the inputs to the merge join are already sorted by the field being joined on, Postgres can skip the sort step and go straight to the merge step which, under some circumstances, makes the merge join faster than the hash join.

By the way, if you are working on scaling Postgres, I'm currently working on Perfalytics. Perfalytics is a service designed to help teams scale out Postgres by giving them insight into why their queries are slow and how they can go about making their queries faster. If you're interested in learning more about Perfalytics shoot me an email at


  1. Postgres actually has to generate one output row for every pair of rows which have equal fields and remove them all from the inputs.
