In the last post, I discussed MVCC and how it allows Postgres to create snapshots of the current state of the database. One piece I didn’t mention is how Postgres handles failed transactions. The answer is through an object called the commit log (commonly abbreviated as “clog”). Use of the commit log also helps Postgres implement atomicity. In other words, through the commit log, Postgres is able to guarantee a transaction will either complete successfully or have no effect on the database at all.
As mentioned in the last post, MVCC is a technique Postgres uses to determine whether a row should be visible to a query. A requisite for MVCC is the capability to determine whether a transaction completed successfully before a given query started. MVCC as described in the last post only checks whether, for a given query, the transaction had started before the query and was not running when the query started. Based on that information alone, the query either could have either finished successfully or failed. By adding the capability to check whether a finished transaction had completed successfully or not, MVCC as previously described works. Postgres supports this capability through the commit log.
The commit log itself is fairly simple, it’s just a log of the txids (transaction ids) of all transactions that successfully completed. The last step a transaction must accomplish before successfully completing is writing its own txid to the commit log. Once a transaction has written it’s txid to the commit log, that transaction is considered to have successfully completed. If a transaction fails for any reason (manually rolled back, power outage, etc), it’s txid is never written to the commit log. If a transaction is no longer running, Postgres can determine whether the query succeeded or failed by looking for it’s txid in the commit log. If the txid of the transaction is in the commit log, the transaction succeeded, otherwise the transaction failed.
In addition to performing the checks for MVCC, by adding a check to the commit log, Postgres now handles failed transactions properly. The process for determining the visibility of a row with respect to a given query now looks like the following. First check that the transaction that inserted the row started before the query, was not running when the query started, and wrote its txid to the commit log. If all of these checks succeed, the row is visible unless all of the checks also hold true for the transaction that deleted the row. If the insert transaction started before the query, was not running when the query started, but did not insert its txid in the commit log, Postgres knows the transaction failed and the row should not be visible. If the insert transaction succeeded, but the deletion transaction failed, the row should still be visible.
One issue with this scheme is that it is inefficient. It’s hugely inefficient for every query to check the commit log multiple times for each row it wants to determine the visibility of. To get around this inefficiency, Postgres introduces the concepts of “hint bits”. Every row has some additional bits for keeping track of the success or failure of the transactions that inserted the row and deleted the row. Whenever Postgres looks up the status of a transaction that inserted or deleted a row in the commit log, it will then set a bit in the row corresponding to the result of the lookup. That way, any future query can just look at the hint bit and avoid a lookup to the commit log. There are a total of four bits, one for each pair of success/failure and insertion/deletion.
Personally, I find the commit log to be an extremely simple, but clever idea. It not only allows Postgres to handle failed transactions, but allows it to handle transactions that failed as a result of a crash! As we’ll see, there are many more clever ideas throughout the rest of Postgres.