Postgres Window Functions

Postgres window functions are an extremely useful feature. They let you calculate a value for each row based on the values of other rows. For our examples, let’s say we have a table ints full of random numbers:

CREATE TABLE ints (n bigint);

INSERT INTO ints 
SELECT floor(random() * 1000000) 
FROM generate_series(1,1000);

The most basic window functions behave similarly to regular aggregations. The difference is that with a window function, the result of the aggregation is included in every row instead of the rows being collapsed into a single result row. Since the result is included in each row, you can also select regular columns out of each row. To change a regular aggregation function into a window function, you just add “OVER ()” after the aggregation function. For example, here is a query that includes the sum across all rows in each row, along with the value in each row:

> SELECT n, sum(n) OVER () FROM ints;

   n    |    sum    
--------+-----------
 481023 | 498397678
 772520 | 498397678
 709081 | 498397678
 292436 | 498397678
...
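For contrast, here is a small sketch of the same sum written as a regular aggregate, assuming the ints table created above. It is only meant to show why the window form is needed when you want n and the total side by side:

```sql
-- Regular aggregate: all rows collapse into a single result row.
SELECT sum(n) FROM ints;

-- Selecting n next to a regular aggregate is rejected, because n is
-- neither aggregated nor listed in a GROUP BY clause:
-- SELECT n, sum(n) FROM ints;
```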

This allows you to easily express certain calculations. For example, you can calculate what fraction of the total sum each row is:

> SELECT n, n / (sum(n) OVER ()) AS fraction
 FROM ints;

   n    |          fraction          
--------+----------------------------
 481023 |     0.00096513892667052113
 772520 |     0.00155000722134182977
 709081 |     0.00142272131532683425
 292436 |     0.00058675233234132363
...

Window functions also have an equivalent to GROUP BY. By adding “PARTITION BY <expression>” inside the parentheses after OVER, you can calculate the window function over different subsets of the data. As an example, here is the above query, but it instead tells you what fraction each value is of the total sum of all numbers with the same last digit:

> SELECT n, n / (sum(n) OVER (PARTITION BY n % 10)) AS fraction 
FROM ints;

   n    |          fraction
--------+----------------------------
 457940 |     0.00951268915957674204
 595290 |     0.01236583117832999688
 111670 |     0.00231969690013961389
 830300 |     0.01724764337947453579
...

From this query, we can tell that 595290 is ~1.2% of the total sum of all numbers in the table that end in 0.
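As a rough sketch of what the PARTITION BY version is doing, the same fractions can be computed without window functions by grouping once and joining the totals back onto each row. The aliases i, t, last_digit and total are names chosen here for illustration:

```sql
-- Per-digit totals are computed once with GROUP BY,
-- then joined back so every row sees the total for its own last digit.
SELECT i.n, i.n / t.total AS fraction
FROM ints AS i
JOIN (
    SELECT n % 10 AS last_digit, sum(n) AS total
    FROM ints
    GROUP BY n % 10
) AS t ON t.last_digit = i.n % 10;
```

The window function version says the same thing in a single expression and avoids the self-join.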

So far we have seen examples that aren’t too difficult to replicate with ordinary SQL. Window functions become harder to replicate when you introduce ORDER BY into the OVER clause. First of all, introducing ORDER BY changes the behavior of all regular aggregates. Now, instead of including the total aggregation in each row, a rolling aggregation is included: each row gets the aggregate over the rows from the start of the ordering up to that row. For example, using ORDER BY with sum gives you a rolling sum:

> SELECT n, SUM(n) OVER (ORDER BY n ASC) FROM ints;

   n    |    sum    
--------+-----------
    689 |       689
   1197 |      1886
   1201 |      3087
   3405 |      6492
...
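The rolling behavior comes from the window frame. When ORDER BY is present and no frame is given, Postgres uses a frame that runs from the start of the partition through the current row. The sketch below spells that default out and, for comparison, uses a narrower frame; the column aliases running_sum and last_three_sum are just illustrative names:

```sql
SELECT n,
       -- Spelled-out equivalent of SUM(n) OVER (ORDER BY n ASC).
       sum(n) OVER (ORDER BY n ASC
                    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum,
       -- A different frame: the current row plus the two rows before it.
       sum(n) OVER (ORDER BY n ASC
                    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS last_three_sum
FROM ints;
```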

You can also combine ORDER BY and PARTITION BY to get a separate rolling sum for each group of numbers with a given last digit:

> SELECT n, SUM(n) OVER (PARTITION BY n % 10 ORDER BY n ASC) FROM ints;

   n    |   sum    
--------+----------
  16900 |    16900
  23230 |    40130
  26540 |    66670
  42310 |   108980
...
   1201 |     1201
  18371 |    19572
  19221 |    38793
  36371 |    75164
...

I haven’t mentioned it yet, but there are also many functions designed specifically to work as window functions. They aren’t available as regular aggregation functions, and they return values based on the row’s position relative to the other rows. Two that deserve special attention are row_number and lag. The function row_number tells you each row’s position among the other rows based on the ordering used. For example, for the first row according to the ordering, row_number will return 1. For the second it will return 2, and so on:

> SELECT n, row_number() OVER (ORDER BY n ASC) FROM ints;

   n    | row_number 
--------+------------
    689 |          1
   1197 |          2
   1201 |          3
   3405 |          4
...
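One common use of row_number is picking the first few rows of each group. Window functions can’t be used directly in a WHERE clause, so the numbering goes in a subquery and the filter goes outside it. This is only a sketch; the names ranked, last_digit and rn are made up for the example:

```sql
-- The three smallest numbers for each last digit.
SELECT n, last_digit, rn
FROM (
    SELECT n,
           n % 10 AS last_digit,
           row_number() OVER (PARTITION BY n % 10 ORDER BY n ASC) AS rn
    FROM ints
) AS ranked
WHERE rn <= 3
ORDER BY last_digit, rn;
```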

The function lag, on the other hand, evaluates an expression on the row that comes n rows before the current row according to the ordering. A simple calculation you can perform with lag is the difference between each row and the previous row. The first row has no previous row, so lag returns NULL there and the diff is empty:

> SELECT n, n - lag(n, 1) OVER (ORDER BY n ASC) AS diff FROM ints;

   n    | diff 
--------+------
    689 |     
   1197 |  508
   1201 |    4
   3405 | 2204
...
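Two variations are worth sketching here. lag accepts an optional third argument that is used when there is no previous row, and lead is the mirror image of lag that looks at following rows. The example also uses a WINDOW clause so the ordering is written only once; the window name w and the column aliases are arbitrary:

```sql
SELECT n,
       -- The third argument replaces the NULL on the first row, so diff is n - 0 there.
       n - lag(n, 1, 0::bigint) OVER w AS diff_from_prev,
       -- lead looks one row ahead; the last row has no next row, so this is NULL there.
       lead(n, 1) OVER w - n AS diff_to_next
FROM ints
WINDOW w AS (ORDER BY n ASC);
```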

Overall, window functions are a pretty awesome feature. They dramatically increase the querying power of SQL and make many queries much easier to express. Ultimately, window functions are another tool in the toolbox that makes Postgres so great.
