track_io_timing

The parameter track_io_timing is a relatively unknown, but super helpful parameter when optimizing queries. As the name suggests, when the parameter is turned on, Postgres will track how long I/O takes. Then, when you run a query with EXPLAIN (ANALYZE, BUFFERS), Postgres will display how much time was spent just performing I/O.

You normally don’t want to have track_io_timing always on since it incurs a significant amount of overhead. To get around this, when you want to time how long a query is spending performing I/O, you can use a transaction with SET LOCAL track_io_timing = on;. This will enable track_io_timing only during the transaction. As a specific example of track_io_timing, here’s a simple query over a table I have laying around:

> BEGIN; SET track_io_timing = ON; EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pets; COMMIT;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Seq Scan on pets  (cost=0.00..607.08 rows=40008 width=330) (actual time=8.318..38.126 rows=40009 loops=1)
   Buffers: shared read=207
   I/O Timings: read=30.927
 Planning time: 161.577 ms
 Execution time: 42.104 ms

The I/O Timings field shows us that of the 42ms spent executing the query, ~31ms was spent performing I/O. Now if we perform the query again when the data is cached:

> BEGIN; SET track_io_timing = ON; EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pets; COMMIT;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Seq Scan on pets  (cost=0.00..607.08 rows=40008 width=330) (actual time=0.004..7.504 rows=40009 loops=1)
   Buffers: shared hit=207
 Planning time: 0.367 ms
 Execution time: 11.478 ms

We can see the query is just about 31ms faster! This time the query does not show any information about the I/O timing since no time was spent performing I/O, due to the data being cached.

When benchmarking queries, I also make sure to make use of track_io_timing so I can see whether the expensive part of the query is performing I/O, or if the expensive part is something else entirely.

Leave a Reply

Your email address will not be published. Required fields are marked *