Using Explain Analyze in Postgres

In the last post, we discussed EXPLAIN and how it can be used to obtain the query plan for a query. EXPLAIN ANALYZE is a variation of EXPLAIN that provides additional information about the query. In addition to displaying all of the output EXPLAIN does, EXPLAIN ANALYZE will also run the query, track some metrics about the execution of each part of the query, and display that information. Let’s look at what happens when we run the query from the last post to get the pet with the oldest owner with EXPLAIN ANALYZE:

> EXPLAIN ANALYZE
SELECT pets.name
FROM people, pets
WHERE people.id = pets.owner_id
ORDER BY people.age DESC
LIMIT 1;
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.86..6.20 rows=1 width=36) (actual time=261.981..261.981 rows=1 loops=1)
   ->  Nested Loop  (cost=0.86..5343066.52 rows=1000000 width=36) (actual time=261.980..261.980 rows=1 loops=1)
         ->  Index Scan Backward using people_age_idx on people  (cost=0.43..514360.60 rows=10000054 width=8) (actual time=0.021..150.869 rows=100238 loops=1)
         ->  Index Scan using pets_owner_id_idx on pets  (cost=0.42..0.46 rows=2 width=36) (actual time=0.001..0.001 rows=0 loops=100238)
               Index Cond: (owner_id = people.id)
 Planning time: 0.296 ms
 Execution time: 262.009 ms
(7 rows)

(It may help to re-read the second to last paragraph of the last post to remind yourself how the query is executing.)

In the output, for each stage of the plan, there are two sets of numbers displayed, each surrounded by parentheses. The first set of numbers (the ones starting with ‘cost=’) contains some estimates made by Postgres. As mentioned in the last post, these estimates are rarely useful and you are better off ignoring them. The second set of numbers (the ones starting with ‘actual time=’) contains the actual run time statistics gathered by Postgres during query execution. Those are the numbers you will want to pay attention to. The format of those numbers is:

(actual time=STARTUP TIME..TOTAL TIME rows=ROW COUNT loops=LOOP COUNT)

Let’s start with an explanation of the loop count since the other numbers depend on it. The loop count is, as the name suggests, the number of times that node (by node I mean a part of the plan) was executed. Most nodes will only be executed once, but there are cases where a node can be executed more than once. That mainly happens when a node is the child of a nested loop node. In the plan above, all of the nodes were executed only once except for the index scan on pets_owner_id_idx, which was executed 100,238 different times. This is because Postgres uses that index scan to look up the pets for each person as it iterates through the people from oldest to youngest.

The row count is fairly straightforward. It’s the average number of rows returned by the node each time it is executed. In other words, it’s the total number of rows returned by the node divided by the number of times the node was executed.

As for the total time, the name is somewhat misleading. The total time is the average amount of time spent executing the node on each execution of the node. For example, the index scan on pets_owner_id_idx has a total time of 0.001ms. This means each execution of that node took 0.001ms on average. Since the node was executed 100,238 times, a total of ~100ms was spent executing it.

Last is the startup time. The startup time is the amount of time it takes for the node to return its first row. Like the other numbers, this is averaged across every execution of the node. Personally, I have never found the startup time useful and you’ll probably want to ignore it.

Together these numbers make it possible to determine where all of the time is spent during query execution. Looking at the plan above, most of the time is spent in two parts of the query: the two index scans. A total of ~150ms is spent in the first index scan (150ms total time * 1 loop) and a total of ~100ms (0.001ms total time * 100,238 loops) is spent in the second. This makes up ~250ms of the 262ms total execution time.

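To make the accounting concrete, here is the same back-of-the-envelope arithmetic written out in Python (the per-loop times are the rounded values reported above, so the total is only approximate):

# per-loop time (ms) multiplied by the number of loops for each index scan
people_scan_ms = 150.869 * 1        # index scan on people_age_idx
pets_scan_ms = 0.001 * 100238       # index scan on pets_owner_id_idx
print(people_scan_ms + pets_scan_ms)  # ~251ms of the ~262ms total
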
If you recall, the query executes by iterating over people in order from oldest to youngest and checking whether each person has a pet. Once it finds a person with a pet, the name of that pet is returned as the result of the query. Based on the output of EXPLAIN ANALYZE, it appears Postgres has to iterate through 100,238 people before it finds one person that has a pet. Since most of the time is spent fetching the rows for those people from the people table and then looking up the pets for those people in the pets table, we may be able to dramatically speed up the query if we can avoid going through so many people.

One way to do this would be to include the age of the owner of each pet in the pets table. If we then indexed that field, we could quickly find the pet with the oldest owner and avoid going through all of the 100,237 people that don’t have pets. In theory this would allow Postgres to execute the query much more quickly than it currently can. Let’s see what happens when we add an owner_age field with the age of the owner, create an index on it, and rewrite the query to take advantage of it:

> EXPLAIN ANALYZE
SELECT pets.name
FROM pets
ORDER BY owner_age DESC
LIMIT 1;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.49 rows=1 width=36) (actual time=0.021..0.021 rows=1 loops=1)
   ->  Index Scan Backward using pets_owner_age_idx on pets  (cost=0.42..61468.43 rows=1000000 width=36) (actual time=0.020..0.020 rows=1 loops=1)
 Planning time: 0.142 ms
 Execution time: 0.054 ms
(4 rows)

We can see the query now executes nearly 5,000x faster. All Postgres needs to do now is read the last entry in pets_owner_age_idx. Thanks to EXPLAIN ANALYZE, we were able to both determine where all of the time was being spent for the query and figure out how to optimize it.

Using Explain in Postgres

If you are ever doing any sort of query optimization in Postgres, knowledge of EXPLAIN is an absolute must. EXPLAIN lets you see what algorithms Postgres is using under the hood in order to execute a query. Let’s look at an example usage of EXPLAIN:

Let’s say we have two tables. The first table, people, has columns id and age, and the second is a table called pets, which has fields id, name, and owner_id, where owner_id is the id of the owner of the pet. Let’s say the pets table has a total of 1,000,000 rows. Additionally, let’s assume every field has an index on it. As for the query we want to perform, let’s say we want to find the name of the pet with the oldest owner. This can be done with the following SQL query:

SELECT pets.name
FROM people, pets
WHERE people.id = pets.owner_id
ORDER BY people.age DESC
LIMIT 1;

Now let’s see what the query plan (how Postgres plans to execute the query) is, which we can obtain by running EXPLAIN on the query:

> EXPLAIN SELECT pets.name
FROM people, pets
WHERE people.id = pets.owner_id
ORDER BY people.age DESC
LIMIT 1;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.86..6.20 rows=1 width=36)
   ->  Nested Loop  (cost=0.86..5343066.52 rows=1000000 width=36)
         ->  Index Scan Backward using people_age_idx on people  (cost=0.43..514360.60 rows=10000054 width=8)
         ->  Index Scan using pets_owner_id_idx on pets  (cost=0.42..0.46 rows=2 width=36)
               Index Cond: (owner_id = people.id)

The output of EXPLAIN contains both the query plan and some estimates Postgres makes about the execution of each part of the query. These estimates include an estimate of how many rows Postgres thinks each part of the query will generate and an estimate of how expensive each part of the plan will be. Personally, I have rarely found the estimates to be useful and you are probably better off ignoring them. If we remove the estimates from the output, we get just the query plan:

                           QUERY PLAN
----------------------------------------------------------------
 Limit
   ->  Nested Loop
         ->  Index Scan Backward using people_age_idx on people
         ->  Index Scan using pets_owner_id_idx on pets
               Index Cond: (owner_id = people.id)

You can think of the query plan as a tree, where each node returns rows to its parent. In the query plan above, there are two index scan nodes, both of which return rows to a nested loop node, which itself returns rows to a limit node. The above query plan can be interpreted as follows:

First iterate over the people_age_idx (the index on the age field of the people table) backwards (in other words, starting from the oldest people and going to the youngest). For each person, use the pets_owner_id_idx to look up the pets for that person (the rows of the pets table where pets.owner_id equals the people.id field of the person) and, for each one, add a row to the result of the nested loop join. Keep iterating through the people until enough rows have been generated by the nested loop. In this case, since the query contains LIMIT 1, we only need a single row. It’s not hard to see that this will give us information about the pet with the oldest owner.

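To make the execution order concrete, here is a minimal Python sketch of how a plan shaped like this could execute. The data structures and names are made up for illustration: a list of (age, person_id) pairs sorted by age stands in for people_age_idx, and a dict mapping owner_id to pet names stands in for pets_owner_id_idx (Postgres itself, of course, works against on-disk b-trees):

def pet_with_oldest_owner(people_by_age, pets_by_owner, limit=1):
    results = []
    # Index Scan Backward: walk the age index from oldest to youngest.
    for age, person_id in reversed(people_by_age):
        # Index Scan using the owner_id index: look up only this person's pets.
        for pet_name in pets_by_owner.get(person_id, []):
            results.append(pet_name)      # the Nested Loop emits a joined row
            if len(results) >= limit:     # Limit: stop once enough rows exist
                return results
    return results
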
All query plans follow a similar format. It can take a while to learn what all of the different nodes do, but you can always look up the node type in Google. Ultimately, being able to read query plans is an essential skill for optimizing Postgres queries. It’s much easier to optimize a query when you can understand how Postgres is running it.

Postgres Merge Joins

This is the last part of a three part series examining the Postgres join algorithms.

While a hash join is usually the fastest join algorithm, it is only so when it can be performed in memory. When Postgres thinks the hash table needed for a hash join will exceed memory, Postgres will typically use a merge join instead. A merge join winds up using disk much more effectively than a hash join can.

Similar to a hash join, a merge join only works on joins where the join condition is an equality filter. Let’s use the example we’ve been using with a people table (with fields id, name, and age) and a pets table (with fields id and owner_id), and the query:

SELECT *
FROM people, pets
WHERE people.id = pets.owner_id
AND people.age > 30;

If Postgres decides to use a merge join, execution could proceed as follows. First find all rows for people over 30 and then sort those rows by people.id. Then fetch all rows from the pets table and sort them by owner_id. With the two input relations sorted, Postgres then performs a “merge”, similar to the merge step in merge sort.

The merge step works by comparing the first rows from each input (the person with the smallest id and the pet with the smallest owner_id). If the id of the person is equal to the owner_id of the pet, Postgres will add a row to the result of the join for that pair of rows and remove those two rows from the input. If the fields are not equal and the person has an id smaller than the owner_id of the pet, Postgres will remove the person with the smallest id and continue the join with the person that has the next smallest id. The pet is removed instead if the owner_id of the pet is smaller than the id of the person. This process continues until one of the inputs runs out, at which point Postgres returns all of the rows it generated as the result of the join.

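Here is a minimal Python sketch of that merge step, assuming the two inputs are lists of dicts already sorted by id and owner_id. After a match it only advances the pets side, which is enough here because people.id is unique; Postgres’s real implementation needs extra bookkeeping to handle duplicate values on either side:

def merge_join(people_by_id, pets_by_owner_id):
    results = []
    i, j = 0, 0
    while i < len(people_by_id) and j < len(pets_by_owner_id):
        person, pet = people_by_id[i], pets_by_owner_id[j]
        if person["id"] == pet["owner_id"]:
            results.append((person, pet))
            j += 1   # keep the person: they may own more than one pet
        elif person["id"] < pet["owner_id"]:
            i += 1   # this person owns no (more) pets: advance the people side
        else:
            j += 1   # no remaining person matches this pet: advance the pets side
    return results
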
Although a merge join is O(N*log(N) + M*log(M)) due to the need to sort the inputs, it can still be faster than a hash join even though it is asymptotically worse. If the sorting algorithm used is merge sort (it typically is), almost the entirety of the merge join is based on merging two inputs together. Postgres can perform a merge on disk very efficiently by keeping a small subset of the data in memory, performing a merge on that data, writing the result of that merge out to disk, and then reading more of the inputs into memory from disk. This reads and writes large amounts of data to/from disk at a time, which is relatively efficient compared to an on-disk hash table, which would be performing random I/O all over the place.

In addition to a merge join being faster when performed on disk, there is a case where a merge join is faster than a hash join, even when the hash join is performed in memory. If one or both of the inputs to the merge join are already sorted by the field being joined on, Postgres can skip the sort step and go straight to the merge step which, under some circumstances, makes the merge join faster than the hash join.

Postgres Hash Joins

This is part two of a three part series examining the Postgres join algorithms.

Of the join algorithms Postgres has available, the hash join is usually the fastest. The main downside is that hash joins only work when the join condition is an equality condition. Let’s say we have the tables people (with fields id, name, and age) and pets (with fields id and owner_id), and the query:

SELECT *
FROM people, pets
WHERE people.id = pets.owner_id
  AND people.age > 30;

If Postgres were to use a hash join to execute this query, one way execution could proceed is as follows. First fetch all rows from the people table with an age over 30. Then construct a hash table containing all of those rows, keyed by people.id. Then iterate through all of the pets and check whether the owner of each pet is in the hash table. If the owner is in the hash table, Postgres knows the owner of the pet is over 30 and will add a row to the output of the join for that (person, pet) pair.

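As a rough Python sketch of that process, with the two tables represented as in-memory lists of dicts (a simplification of what Postgres does with on-disk rows and its own hash table implementation):

def hash_join(people, pets):
    # Build phase: hash the filtered people rows, keyed by the join column.
    people_over_30 = {person["id"]: person for person in people if person["age"] > 30}
    # Probe phase: scan the pets and look each pet's owner up in the hash table.
    results = []
    for pet in pets:
        owner = people_over_30.get(pet["owner_id"])
        if owner is not None:
            results.append((owner, pet))
    return results
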
A hash join is only O(M+N) in the sizes of the inputs to the join. Although a hash join is linear in the size of the inputs, there are some cases where it won’t be as fast as the other join algorithms. If it turns out that M is really small in comparison to N, an index join will likely wind up being faster, since an index join is O(M*log(N)).

The other major case where a hash join won’t be the preferred join algorithm is when Postgres thinks the hash table needed for the hash join won’t fit in memory. If the hash join runs out of memory and spills over to disk, it will become much slower. There just isn’t a good way to perform a hash join on disk. The issue is that the time it takes to access disk is so much greater than the time it takes to access memory that it usually winds up being better to use an asymptotically worse algorithm that is designed to be performed on disk. When Postgres thinks there isn’t enough memory to perform the hash join, Postgres will usually use the last join algorithm instead: the merge join.

Postgres Nested Loop Joins

This is the first part of a three part series examining the Postgres join algorithms.

For those who need a reminder, a join between two tables results in every pair of rows where some condition is true. Given a query of the form:

SELECT *
FROM table1, table2
WHERE <filter>;

Postgres will return every pair of rows from table1 and table2 where the join condition is true. Similar to how a sequential scan is the most basic way to retrieve rows from a table, a nested loop is the most basic way for Postgres to perform a join. If Postgres were to execute the query with a nested loop, it could do so by iterating through all of the entries in table1 and, for each one, iterating through all of the entries in table2, emitting a row whenever the pair of rows from table1 and table2 satisfies the filter condition. That’s literally all there is to a nested loop!

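In Python, a naive nested loop join over two in-memory lists of rows might look like the sketch below (purely illustrative; Postgres operates on its own on-disk row format):

def nested_loop_join(table1, table2, condition):
    results = []
    for row1 in table1:          # outer loop: every row of table1
        for row2 in table2:      # inner loop: every row of table2
            if condition(row1, row2):
                results.append((row1, row2))
    return results

# e.g. nested_loop_join(people, pets, lambda person, pet: person["id"] == pet["owner_id"])
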
Now the truth is, this way of executing a join can be extremely slow. A naive nested loop join is O(M*N), where M and N are the sizes of the inputs to the join. Although nested loops are relatively inefficient, they do have one major advantage: a nested loop is the only join algorithm Postgres has that can be used to process any join! No matter what the join condition is and no matter what indexes exist, Postgres always has the option of executing a nested loop (analogous to how Postgres always has the option of executing a sequential scan).

There is a slight variation of the nested loop, sometimes called an “index join”, that is much more efficient but isn’t as general. Let’s say we have two tables, people (with fields id and age) and pets (with fields id and owner_id), where the owner_id field in the pets table is the id of the owner of the pet. If we wanted to get all (person, pet) pairs where the owner of the pet is over 30 years old, we could use the following query:

SELECT *
FROM people, pets
WHERE people.id = pets.owner_id
  AND people.age > 30;

If there is an index on the owner_id field, Postgres can execute an index join. Postgres can first retrieve all rows from the people table for people over the age of 30 (using either a sequential scan, an index scan, or a bitmap index scan). Then Postgres can iterate through each person over 30 and use the index on owner_id to quickly look up all pets that belong to that person, adding each (person, pet) pair of rows to the result. This is only O(M*log(N)) and is usually much more efficient than the O(M*N) naive nested loop.

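Here is the same idea as a Python sketch, using a dict keyed by owner_id as a stand-in for the index on pets.owner_id. A dict lookup is O(1) rather than the O(log N) of a b-tree, but the shape of the algorithm is the same:

def index_join(people, pets_by_owner_id):
    results = []
    for person in people:
        if person["age"] > 30:                  # filter the outer side first
            # "index" lookup: only this person's pets are ever touched
            for pet in pets_by_owner_id.get(person["id"], []):
                results.append((person, pet))
    return results
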
Postgres Bitmap Scans

This is the final part of a three part series exploring the different ways Postgres can retrieve rows from a table.

One major restriction of regular index scans is that they can only use a single index. Sometimes a query will have multiple filters on different columns, each of which is indexed by a different index. For example, we may have a people table with age and weight fields, both with indexes on them, and the following query:

SELECT *
FROM people
WHERE age > 30
  AND weight > 200;

Using an index scan, Postgres can either use an index to fetch all rows for people over 30 and post-filter for people with a weight over 200, or vice versa. This means Postgres would have to fetch from the table either all rows for people over 30 or all rows for people over 200 pounds, even though the number of rows that satisfy both filters may be relatively small. What we really want here is a way to locate all rows that satisfy both filters and then fetch only those rows from the table. That is exactly why a bitmap scan is useful.

A bitmap index scan is a way of combining multiple indexes. A bitmap index scan works by using the first index to locate all of the rows that satisfy the first filter, then using the second index to locate all of the rows that satisfy the second filter, and then intersecting the results to get the locations of all rows in the table that satisfy both filters. From there, Postgres will fetch the rows from the physical table in the order the rows are stored in the table.

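The idea can be sketched in Python as follows, with plain sets of row locations standing in for the bitmaps Postgres actually builds. Each set represents the output of one index scan, and only the rows surviving the intersection are fetched, in physical order:

def bitmap_scan(table, age_matches, weight_matches):
    # age_matches / weight_matches: sets of row locations produced by scanning
    # the age index and the weight index respectively.
    matching_locations = age_matches & weight_matches   # intersect the two "bitmaps"
    # Fetch only the matching rows, in the order they sit in the table.
    return [table[location] for location in sorted(matching_locations)]
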
Fetching the rows from the table in the physical order they are stored in has several advantages. If it turns out a large portion of the table is returned, a bitmap scan becomes very similar to a sequential scan. Under certain circumstances, this can even lead to a bitmap index scan being better than an ordinary index scan even when only a single index is used.

One downside to fetching rows in the order of their physical location in the table is that Postgres loses any information about the sort order of the indexed columns. This prevents Postgres from using a bitmap index scan to handle the ORDER BY part of a query, even when the field being sorted by is indexed.

Postgres Index Scans

This is part two of a three part series examining the different ways Postgres can fetch rows from a table.

Usually a sequential scan winds up being pretty slow. A sequential scan needs to read the entire table in order to execute the query which can require a large amount of I/O. In many cases, Postgres can instead make use of an index scan in order to read only the portion of the table that is relevant to the query.

Most of the time when people talk about indexes in Postgres, they are referring to b-tree indexes. A b-tree is a kind of search tree that is optimized for databases. When a b-tree index is created, the user specifies what column (or columns) the index should be “on”. For each row in the table, the b-tree contains a leaf node with the value of the indexed column and a pointer to the physical row in the table. The internal nodes don’t contain any data themselves and only partition out the child nodes. If we had a people table and created an index on age, the result may look like the following:

[Diagram: a b-tree index on people.age, with internal nodes partitioning the range of ages and leaf nodes pointing to the rows in the people table.]

Since the nodes in the b-tree are sorted by the indexed column, Postgres can use the index to speed up the performance of many different queries. There are two main types of queries that can be sped up by an index scan.

The first type is queries that contain either a range filter or an equality filter on the indexed column. These queries can use an index scan to quickly locate all rows that match the filter. For a range query, Postgres can scan over the relevant portion of the b-tree, looking for all entries where the indexed column falls into the range specified by the query. For each entry it finds, Postgres will fetch the corresponding row from the table and add it to the result of the query. As an example, the query:

SELECT *
FROM people
WHERE 30 <= age AND age <= 40;

can scan over all of the index entries where age is between 30 and 40 and fetch only the rows for those entries. A similar search happens for queries with an equality filter.

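As a rough sketch of that range scan, we can model the index as a Python list of (age, row_location) pairs kept in sorted order. This glosses over the actual b-tree structure, but the access pattern is the same: jump to the start of the range, walk forward until the range ends, and fetch each row the index points to:

import bisect

def index_range_scan(age_index, table, low, high):
    # Jump to the first index entry with age >= low.
    start = bisect.bisect_left(age_index, (low,))
    results = []
    for age, location in age_index[start:]:
        if age > high:
            break                           # past the end of the range: stop scanning
        results.append(table[location])     # fetch the row the index entry points to
    return results

# e.g. index_range_scan(age_index, people, 30, 40)
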
The other main type of query sped up by an index scan is one that sorts by the indexed field. Such a query may look like:

SELECT *
FROM people
ORDER BY age DESC
LIMIT 10;

Since the index is sorted by the indexed field, Postgres can just fetch the rows in the order they appear in the index. That said, depending on the exact circumstances, it can be faster for Postgres to perform a sequential scan and sort the rows afterwards. Usually, if only a small portion of the table is returned by the query, an index scan will be faster than a sequential scan.

Although indexes are useful because they allow Postgres to perform index scans, there are two main reasons why you might not want to create an index. First of all, indexes take up additional space. Since an index contains a complete copy of the column it was created on, if you were to index every single column of a table, you could find the table taking up roughly twice as much space as it did before.

Additionally, indexes increase the amount of I/O that has to be performed in order to insert a row into the table. In order to insert a row, Postgres has to write the row into the table, and also insert an entry into each of the table’s indexes.

Postgres Sequential Scans

This is part one of a three part series on the main ways Postgres has of fetching rows from a table. 

Of the three main ways Postgres has of fetching rows from a table, a sequential scan is the most basic. To execute a sequential scan, Postgres literally iterates through a table a row at a time and returns the rows requested by the query. As an example, let’s say we have a table people that looks like the following:

 Id | Name  | Age
----+-------+-----
  1 | Alice |  32
  2 | Bob   |  40
 ...| ...   | ...

And the query:

SELECT *
FROM people
WHERE age > 35;

If Postgres were to execute the query with a sequential scan, Postgres would iterate through each person in the people table, check whether that person is older than 35, and include that person in the result if they pass the filter.

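In Python terms, a sequential scan is nothing more than a loop over a list of rows (the people table here being a hypothetical list of dicts):

def seq_scan(people):
    results = []
    for person in people:        # read every row of the table, in order
        if person["age"] > 35:
            results.append(person)
    return results
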
There are two main reasons why Postgres will execute sequential scans. The first is that a sequential scan is always possible. No matter what the schema of the table is, or what indexes exist on the table, Postgres always has the option of executing a sequential scan.

The other main reason is that in some cases a sequential scan is actually faster than the other options available. When reading data from disk, reading data sequentially is usually faster than reading the data in a random order. If a large portion of the table will be returned by a query, a sequential scan usually winds up being the best way to execute it. This is because a sequential scan performs sequential I/O whereas the other options available mostly perform random I/O.