Using Explain in Postgres

If you are ever doing any sort of query optimization in Postgres, knowledge of EXPLAIN is an absolute must. EXPLAIN let’s you see what algorithms Postgres is using under the hood in order to execute a query. Let’s look at an example usage of EXPLAIN:

Let’s say we have two tables. The first table, people, has columns id and age and the second table is a table called pets, which has fields id and owner_id, where the owner_id is the id of the owner of the pet. Additionally, let’s assume every field has an index on it. As for the query we want to perform, let’s say we want to find the name of the pet with the oldest owner. This can be done with the following SQL query:

SELECT pets.name
FROM people, pets
WHERE people.id = pets.owner_id
ORDER BY people.age
DESC LIMIT 1;

Now let’s see what the query plan (how Postgres plans to execute the query) is, which we can obtain by running EXPLAIN on the query:

> EXPLAIN SELECT pets.name
FROM people, pets
WHERE people.id = pets.owner_id
ORDER BY people.age
DESC LIMIT 1;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.86..6.20 rows=1 width=36)
   ->  Nested Loop  (cost=0.86..5343066.52 rows=1000000 width=36)
         ->  Index Scan Backward using people_age_idx on people  (cost=0.43..514360.60 rows=10000054 width=8)
         ->  Index Scan using pets_owner_id_idx on pets  (cost=0.42..0.46 rows=2 width=36)
               Index Cond: (owner_id = people.id)

The output of EXPLAIN contains both the query plan and some estimates Postgres makes about the execution of each part of the query. These estimates include an estimate of how many rows Postgres thinks each part of the query will generate and an estimate of how expensive each part of the plan will be1. Personally, I have rarely found information about the estimates to be useful and you are probably better off ignoring them2. If we remove the estimates from the output we get just the query plan3:

                           QUERY PLAN
----------------------------------------------------------------
 Limit
   ->  Nested Loop
         ->  Index Scan Backward using people_age_idx on people
         ->  Index Scan using pets_owner_id_idx on pets
               Index Cond: (owner_id = people.id)

You can think of the query plan as a tree. Each node returns rows to its parent. For the query plan above, there are two index scan nodes which both return rows to a nested loop node which itself returns rows to a limit node. The above query plan can be interpreted as follows:

First iterate over the people_age_idx (the index on the age field of the people table) backwards (in other words, starting from the oldest people going to the youngest). For each person, use the pets_owner_id_idx to look up the pets for that person (the rows of the pets table where pets.owner_id equals the people.id field of the person) and for each one, add a row to the result of the nested loop join. Keep iterating through the people until enough rows have been generated by the nested loop. In this case, since the query contains LIMIT 1, we only need a single row. It’s not hard to see that this will give us information about the pet with the oldest owner.

All query plans follow a similar format. It can take a while to learn what all of the different nodes do, but you can always lookup the node type in Google. Ultimately, being able to read query plans is an essential skill for optimizing Postgres queries. It’s much easier to optimize a query when you can understand how Postgres is running it.


By the way, if you are working on scaling Postgres, I'm currently working on Perfalytics. Perfalytics is a service designed to help teams scale out Postgres by giving them insight into why their queries are slow and how they can go about making their queries faster. If you're interested in learning more about Perfalytics shoot me an email at michaelmalis2@gmail.com.

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

  1. One unit of cost is supposed to be equivalent to one sequential page read.
  2. Most of the time you can get exactly how many rows are generated by each part of the plan and exactly how long each part takes by using EXPLAIN ANALYZE, which we will look at next.
  3. This output can be obtained directly from Postgres by running EXPLAIN (COSTS OFF) instead of just EXPLAIN.

Leave a Reply

Your email address will not be published. Required fields are marked *