Postgres CTEs

CTEs, also know as Common Table Expressions, are a way of refactoring a query. They let you give the result of a subquery a name and reuse the result multiple times. The general syntax of a CTE looks like the following:

WITH <name1> AS (<subquery1>),
     <name2> AS (<subquery2>),
     ...
     <nameN> AS (<subqueryN>)
  query

This will first evaluate each of the subqueries and assign the results of each query to the corresponding name. Then within the final query, you can use each name to refer to the results of the corresponding query.

As an example, I frequently do experiments where I try different combinations of indexes. To test which configuration is better, I’ll run many different queries against both configuration. To analyze the results I create a table runs which contains result of each run of each query. The table has the columns query_id which is the id of the query being ran, config which specifies which version of indexes were used, and duration which is the duration of that run of the query. To get the ten queries where which sped up the most going from one configuration to another, I’ll use a query similar to the following:

WITH a_runs AS (SELECT * FROM runs WHERE config = 'a'),
     b_runs AS (SELECT * FROM runs WHERE config = 'b')
  SELECT a_runs.query_id, a_runs.duration, b_runs.duration
  FROM a_runs, b_runs
  WHERE a_runs.query_id = b_runs.query_id
  ORDER BY a_runs.duration - b_runs.duration DESC
  LIMIT 10;

This query uses a CTE to define a_runs and b_runs as the runs of queries under configuration a, and the runs under configuration b respectively. From then on the query uses a_runs and b_runs to join the results of each configuration together.

There are two main differences between a CTE and just substituting the subqueries into the final query. First of all, a CTE can usually make a query easier to read. This alone is a pretty good reason for using CTEs. Second, using a CTE is currently equivalent equivalent to creating new tables with the results of each subquery, and then using the corresponding table every time you use the name corresponding to that query. This behavior has both some pros and some cons.

This can be advantageous, because it guarantees that each subquery is only evaluated once. This allows you to reuse the results of each subquery without reevaluating the subquery. The main downside of this behavior of CTEs is that it prevents Postgres from making use of certain query plans. There are some cases where Postgres can use information outside of the subquery to optimize the subquery and avoid doing some work. CTEs prevent Postgres from doing any of this1. If you want to read more about CTEs as an optimization fence, 2nd Quadrant has a blog post that goes into more detail.

  1. Some Postgres users take advantage of this behavior and use a CTE to force a specific desired plan. Since there are a decent number of Postgres users that depend on this behavior, the Postgres team is hesitant to change the behavior of CTEs.

Leave a Reply

Your email address will not be published. Required fields are marked *