Understanding Postgres Recursive CTEs

Unlike ordinary CTEs, which can be thought of as syntactic sugar, recursive CTEs make SQL a more powerful language. As the name suggests a recursive CTE makes it possible to express recursion in SQL. The ability to express recursion makes recursive CTEs useful for querying any hierarchical data. Additionally, recursive CTEs even make SQL Turing complete!

In general, most queries that make use of a recursive CTEs look similar to the following template1:

WITH RECURSIVE <name> (<columns>) AS (
    <initial query>
  UNION ALL
    <recursive query>
)
<query>

As an example, here is a query that makes use of a recursive CTE to generate the integers from 1 to 100:

WITH RECURSIVE ints (n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM ints WHERE n+1<= 100
)
SELECT n FROM ints;

A recursive CTE executes according to the following steps:

  1. Evaluate the initial query and store the results in the “working table”. The working table is simply a temporary table that is used during the evaluation of the recursive CTE.
  2. Exit if the working table is empty.
  3. Add all of the rows in the working table to the result of the recursive CTE2.
  4. Evaluate the recursive query on the working table and replace the working table with the results of the recursive query. In the example above, that means the working table is substituted in for ints in the recursive query.
  5. Goto step 2.

Once the recursive CTE has been evaluated, the results of it can be used in the rest of the query like an ordinary CTE3.

Let’s walk through the example I gave above. First the recursive CTE evaluates the initial query which is just SELECT 1, and stores the results, a single row with the value 1, in the working table. Then since the working table is not empty, the recursive CTE proceeds to step 3 which adds the row with value 1 to the result of the recursive CTE. Next, the recursive query:

SELECT n+1 FROM ints WHERE n+1 <= 100

Is ran on the working table. That means the recursive query is ran with ints referring to the current working table. In this case, since the working table is only a single row with the value 1, the recursive query returns a single row with the value 2. This new row becomes the new working table. Steps 2-4 then repeat with the working table containing only a single row with the value 2. This process repeats with the values 3, 4, … until the working table has a single row with the value 100.

This time, the recursive query returns no rows, due to the filter condition in the recursive query. It’s at this point the recursive CTE terminates. During the execution of the CTE, each of the values 1 to 100 was added to the result set, so the recursive CTE terminates with the values 1-100 as the result. I also wrote a Python snippet equivalent to above recursive CTE which you can find here. Looking at it may help with understanding how to think about recursive CTEs.

Let’s now look at a more complicated example. An example that actually includes hierarchical data. Let’s say we have an employees table that models the management hierarchy at a business. The table has three columns: an id column, a name column, and a manager_id column. Let’s say we want to find everyone who works under Alice (someone who reports to someone who reports to someone … who reports to Alice), including Alice herself. That can be done with the following query:

WITH RECURSIVE temp (name, id) AS (
    SELECT name, id
    FROM employees 
    WHERE name = 'Alice'
  UNION ALL
    SELECT employees.name, employees.id 
    FROM temp, employees
    WHERE employees.manager_id = temp.id
)
SELECT name FROM temp;

Let’s look at the individual queries that make up the recursive CTE in the example. The initial query finds the row for Alice and adds it to the working table. The recursive query takes all people in the working table and uses a join to find all people who report to someone in the working table. Going through the steps above for the execution of a recursive CTE, what this query does is first find the row for Alice and puts it in the working table. Then the recursive query finds all people who report to Alice. Next the recursive query finds all people who report to someone who reports to Alice. This process repeats until the query has found all people who work under Alice.

There are all sorts of surprising computations you can perform with recursive CTEs. One of my favorite examples is a SQL query that generates the Mandelbrot Set. This is mainly possible recursive CTEs make SQL turing complete. With recursive CTEs, it is possible to perform any computation you want with SQL!

  1. You can also use UNION instead of UNION ALL, which has slightly different behavior. I didn’t mention it for the sake of simplicity.
  2. Alternatively, you can use UNION instead of UNION ALL in the recursive CTE. This will union the working table with the results table, removing all of the duplicates.
  3. A recursive CTE is actually evaluated lazily. This makes it possible to have a potentially infinite recursive CTE where only part of it is used.

Leave a Reply

Your email address will not be published. Required fields are marked *