The Postgres Vacuum

As mentioned in my post on MVCC, when a row is deleted from Postgres, the physical row is not actually deleted. All that really happens is the physical row is marked as deleted. That way, queries that started before the row was deleted can pretend the row was not deleted, which provides a simpler model for understanding how queries behave when multiple queries are executing concurrently (if you want to understand the reasoning for this behavior, you should read my posts on transactions and MVCC). One additional step necessarily is the physical removal of the row from disk. This is done by the Postgres vacuum.

The vacuum is a fairly simple idea. Once a certain number of rows are updated or deleted from a table1, the vacuum will automatically run on the table. The vacuum works by iterating through all of the rows of the table and freeing the space for rows that are no longer needed. The vacuum determines if a row can be freed by checking that there are no queries that started before the row was deleted. Once every query started before the row was deleted has finished, it is impossible for any query in the future to see the row, so Postgres is free to reclaim the space. Once the vacuum has freed the space for a row, Postgres is allowed to insert a new row into that space.

In order to reduce the amount of work needed to be performed by the vacuum, Postgres uses something called the “visibility map”, which also happens to be used for other purposes such as index-only scans. The visibility map stores an array of bits, one for each page in each table2. (For context, every row is stored on a page with multiple rows being stored on each page.) The bit for a given page is set if Postgres knows, for certain, that every row on the page is visible to every currently running query, and will be to every new query. The vacuum is able to skip any pages that have their bit set to one in the visibility map, since that implies there are no rows on the page that can be freed.

Currently, the only way bits in the visibility map are flipped on is through the vacuum if the vacuum determines every row on the page is visible. The bit for a page is flipped off if a new row is put on the page, or an existing row is deleted. In either of these cases, it is possible there are some transactions for which the row is not visible.

  1. The default is 50 rows + 20% of the number of rows in the table.
  2. This was changed in Postgres 9.6. The visibility map now has two bits per page, the other being used for wraparound vacuums.

Leave a Reply

Your email address will not be published. Required fields are marked *