Postgres Lateral Joins

Personally, lateral joins are one of my favorite Postgres features. They are simple, while at the same time they let you write queries that would be nearly impossible to write otherwise. I find it surprising lateral joins were only introduced into Postgres four years ago given how useful they are. I also find it surprising there are no good explanations of lateral joins online given how simple they are.

A lateral join is simply a foreach loop in SQL. There are actually two different ways of writing a lateral join. The simplest form, which is the one you probably want looks like the following:

SELECT <columns>
FROM <table reference>,
     LATERAL <inner subquery>;

(For context, a table reference is either a table or a subquery.)

The above query will iterate through each row in the table reference and evaluate the inner subquery for each row, exactly like a foreach loop would. The rows returned by the inner subquery are then added to the result of the lateral join. The primary reason for using a lateral join is that the inner subquery can refer to the fields of the row from the table reference and use that to determine what rows to return. To help illustrate what this means, here is Python code that imitates a lateral join:

result = []
for row1 in table_reference():
    for row2 in inner_subquery(row1):
        result += (row1, row2)

As you should be able to see, the rows iterated over in the inner for loop are a function of the current row being iterated over in the outer for loop.

As an example of where a lateral join is useful, let’s say we have a table people with an id column and an age column, and a table pets with columns id, owner_id and age. For the query, let’s say for each person over 30, we want to obtain the id of the oldest pet owned by that person. Anytime you hear the words “for each”, you are probably looking for a lateral join. This specific example can be done with the following query:

SELECT people_sub.id, pets_sub.id
FROM (SELECT id FROM people WHERE age > 30) people_sub,
     LATERAL (SELECT pets.id
              FROM pets
              WHERE pets.owner_id = people_sub.id
              ORDER BY pets.age DESC
              LIMIT 1) pets_sub;

For each person over 30, this query will evaluate the inner subquery and add the results of the subquery to the results of the full query. The inner subquery, given a row for a person, will find the id of the oldest pet owned by that person, if they own a pet. That’s all there is to the lateral join. Believe it or not, most of the other ways of writing the above query are much trickier and usually are not as efficient1.

The other syntax for writing a lateral join is the following:

SELECT <columns>
FROM <table reference>
     JOIN LATERAL <outer subquery>
ON TRUE;

The advantage of this form is you can use any join qualifier (e.g. LEFT JOIN LATERAL). If in the example above, we also wanted to obtain people who didn’t have a pet at all, we would have needed to use a lateral left join to get those people. Most of the time though, the first lateral join form is good enough.

To recap, a lateral join is basically a foreach loop in SQL. Anytime you want to perform some calculation for each row in a subquery, you should try to see if a lateral join suits your use case. Once you understand a lateral join, you’ll realize it makes all sorts of queries much easier to write.

  1. The only real other option I’m aware of is a correlated subquery. The advantage of a lateral join is that it can return any number of rows whereas a correlated subquery has to return exactly one.

5 thoughts on “Postgres Lateral Joins

  1. Couldn’t your pet example easily be done with a “select distinct on”?

    SELECT DISTINCT ON (people.id) people.id, pets.id FROM people JOIN pets ON people.id = pets.owner_id WHERE people.age > 30 ORDER BY people.id, pets.age DESC

    Use LEFT OUTER JOIN if you want to include people without pets.

    And actually testing your query, it doesn’t retrieve the id of the oldest pet of a person (as stated in the text) but retrieves all pet ids ordered by their age in descending order. If you wanted that it’s just a simple JOIN + ORDER:

    SELECT people.id, pets.id FROM people JOIN pets ON people.id = pets.owner_id ORDER BY people.id, pets.age DESC

    1. The query was missing a LIMIT 1. I’ve fixed the query now. As for the query you gave, the efficiency of the query is very different from the lateral join. If your query executes with a nested loop join, it will first fetch all of the pets for each person over 30, then distinct on user. The lateral join on the other hand only retrieves the oldest pet for each person and does not read the rows for all of the other pets. Depending on the exact distribution of data (specifically how many pets each person has) one of these will be faster than the other.

      1. Ok, I can see that, with my query, I potentially get multiple entries for a ‘people’ entry (in case they have more than one pet) which are then narrowed down to one by the ‘distinct on.’

        But is that really so much different from looping over all people, fetching all their pets, ordering these by descending age, and then picking the top one? It’s not like a single ‘people’ row is fetched more than once in my version, is it?

        For lack of a large set of data to test performance with I didn’t ;-)

Leave a Reply

Your email address will not be published. Required fields are marked *