This post is part 1 of a three-part series examining transaction isolation in Postgres. It covers the issues that come up under the default Postgres settings; parts 2 and 3 will cover the different ways to eliminate those problems.
Isolation is one of the main tenets of ACID. A database’s transactions are considered fully isolated if two transactions performed at the same time do not interact with each other in any way. More formally, a database provides full isolation if, whenever multiple transactions are executed at the same time, there exists some ordering of those transactions such that executing them concurrently produces the same results as executing them one by one in that order. This property is also known as serializability.
By default, Postgres does not provide full transaction isolation (the same holds true for most SQL databases under their default settings). That means there are a variety of cases where two transactions can interact with each other. These cases are usually unexpected by Postgres users and commonly result in bugs in those users’ code. Let’s first take a look at the exact isolation semantics Postgres provides. Then we can take a look at the different interactions and use the isolation semantics to explain why each interaction occurs.
For the isolation semantics, you only need to understand how Postgres handles SELECT statements and how it handles statements that modify data, such as UPDATE and DELETE.
SELECT

Whenever a SELECT statement is run in Postgres, Postgres takes a snapshot of the current state of the database and runs the SELECT against that snapshot. The effects of all transactions that committed before the snapshot are visible to the SELECT statement. On the other hand, the effects of all transactions that either failed or committed after the snapshot was taken are completely invisible to the SELECT statement. It’s important to note that a new snapshot is created for each SELECT statement and not for each transaction. That means two SELECT statements in the same transaction will each have their own snapshot of the database.
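The per-statement snapshot behavior can be sketched with a toy model. This is purely an illustration of the visibility rule, with made-up names; it is not how Postgres’s MVCC machinery is actually implemented:

```python
# Toy model of per-statement snapshots (illustration only; the names
# take_snapshot/select_all are invented, not Postgres internals).

committed = []            # rows whose inserting transactions have committed

def take_snapshot():
    # Each SELECT copies the committed state at the moment it starts.
    return list(committed)

def select_all(snapshot):
    # The SELECT only ever sees its own snapshot, never live state.
    return snapshot

# A transaction runs its first SELECT: a fresh snapshot is taken.
snap1 = take_snapshot()
print(select_all(snap1))        # [] -- nothing committed yet

# Meanwhile, another transaction inserts a row and commits.
committed.append(1)

# The same transaction's second SELECT takes a *new* snapshot,
# so it sees the newly committed row.
snap2 = take_snapshot()
print(select_all(snap2))        # [1]
```

If the snapshot were taken once per transaction instead of once per statement, both calls would return the same result.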
UPDATE and DELETE
UPDATE and DELETE statements first create a snapshot of the database and use that snapshot to find all rows that match the WHERE clause associated with the statement. Once a row is found, the statement will try to lock it. If any of the rows found are already locked by a different UPDATE or DELETE statement in another transaction, the UPDATE or DELETE statement will wait for the other transaction to commit or abort.
If the other transaction aborts, the UPDATE or DELETE statement will act as if the other transaction had never existed. Otherwise, if the other transaction commits, the UPDATE or DELETE will reevaluate the WHERE clause on the new version of the row and update/delete the new version if it satisfies the predicate. If the new version does not satisfy the WHERE clause, it will be left unmodified. This behavior is used because it is a reasonable way to handle multiple concurrent modifications to the same row.
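The lock-wait-recheck rule above can be sketched for a single row. Again, this is a simplified simulation with illustrative names, not Postgres internals:

```python
# Sketch of the recheck rule for a DELETE that finds a row already
# locked by a concurrent UPDATE (simplified single-row model).

def run_delete(row, where, blocker_commits, blocker_new_value):
    """Simulate DELETE ... WHERE on a row locked by another transaction."""
    if not where(row["n"]):
        return False                  # the row never matched the WHERE clause
    # The row is locked: wait for the blocking transaction to finish.
    if blocker_commits:
        # The UPDATE committed, so a new version of the row exists.
        row["n"] = blocker_new_value
        # Recheck the WHERE clause against the *new* version.
        if not where(row["n"]):
            return False              # no longer matches: leave it alone
    # Either the blocker aborted (row unchanged) or the new version matches.
    row["deleted"] = True
    return True

# DELETE ... WHERE n = 2 while a concurrent UPDATE sets n to 3 and commits:
row = {"n": 2, "deleted": False}
deleted = run_delete(row, lambda n: n == 2,
                     blocker_commits=True, blocker_new_value=3)
print(deleted)   # False -- the row matched before the UPDATE, but not after
```

Had the blocking transaction aborted instead (`blocker_commits=False`), the DELETE would have proceeded against the original row version and removed it.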
Now that we’ve got an idea of the semantics behind SELECT, UPDATE, and DELETE, let’s take a look at a few of the different cases where two transactions can interact with each other. In the examples below, I use S0, S1, … to refer to different sessions running concurrently. All of the examples use a table ints with a single bigint column n, which starts out empty at the beginning of each example.
Non-Repeatable Reads

A non-repeatable read occurs when a transaction reads the exact same row twice but finds the row to be different the second time it is read. As an example:
S0> BEGIN;
S0> INSERT INTO ints SELECT 1;
S0> COMMIT;
S1> BEGIN;
S1> SELECT * FROM ints;
 n
---
 1
S2> BEGIN;
S2> UPDATE ints SET n = 2;
S2> COMMIT;
S1> SELECT * FROM ints;
 n
---
 2
S1> COMMIT;
In this case, even though both of the SELECT *‘s were run in the same transaction, they returned different results. This is due to each SELECT statement in the transaction creating a new snapshot, instead of there being a single snapshot per transaction. The first SELECT creates a snapshot where n=1 and returns that row. Then a second transaction runs an UPDATE command that changes n to 2. Then the second SELECT creates a new snapshot where n=2 and returns that.
Lost Updates

It is a somewhat common pattern (one that should be avoided if possible) for a server to retrieve a value from the database, compute a new value based on the old one, and write the new value back into the database. If two of these operations happen simultaneously, it can result in what’s known as a “lost update”. As a simple example, let’s say two different servers try to increment the value in the ints table at the same time:
S0> BEGIN;
S0> INSERT INTO ints SELECT 1;
S0> COMMIT;
S1> BEGIN;
S1> SELECT * FROM ints;
 n
---
 1
S2> BEGIN;
S2> SELECT * FROM ints;
 n
---
 1
S2> UPDATE ints SET n = 2; -- Computed server side.
S2> COMMIT;
S1> UPDATE ints SET n = 2; -- Computed server side.
S1> COMMIT;
S3> BEGIN;
S3> SELECT * FROM ints;
 n
---
 2
S3> COMMIT;
In the example above, session S1 read the value 1. Then S2 incremented the value in ints, setting it to 2. When S1 went to write its new value, it had no way of knowing that the value had been incremented by S2 after S1 read it, so when S1 set the value in ints to 2, the increment done by S2 was completely lost.
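The interleaving that loses the update can be reproduced without a database at all. The sketch below simulates the two sessions’ read-compute-write steps in plain Python:

```python
# Simulation of the lost-update interleaving: both "sessions" read
# the old value before either one writes its result back.

value = 1                      # the single row in ints

# Both sessions read the current value first ...
read_s1 = value
read_s2 = value

# ... then each writes back old_value + 1, computed server side.
value = read_s2 + 1            # S2 commits: value is now 2
value = read_s1 + 1            # S1 commits: still 2 -- S2's increment is gone

print(value)   # 2, not the 3 you might expect from two increments
```

The second write simply overwrites the first, because it was computed from a stale read.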
Phantom Reads

A phantom read occurs when the set of rows found by a statement changes due to another transaction inserting new rows. A simple example is the following:
S0> BEGIN;
S0> SELECT count(*) FROM ints;
 count
-------
     0
S1> BEGIN;
S1> INSERT INTO ints SELECT 1;
S1> COMMIT;
S0> SELECT count(*) FROM ints;
 count
-------
     1
S0> COMMIT;
In Postgres, phantom reads have the same cause as non-repeatable reads: snapshots are created per statement instead of per transaction. Historically, phantom reads have been considered distinct from non-repeatable reads because in some databases the two have completely different causes. Phantom reads are also a slightly more difficult problem to solve, including in Postgres.
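The same toy snapshot model used for non-repeatable reads shows the phantom: the second statement’s snapshot includes the newly committed row, so the count changes mid-transaction (illustration only, not Postgres internals):

```python
# Toy model of a phantom read: each count(*) runs against a fresh
# snapshot of the committed rows.

committed_rows = []                     # table ints, initially empty

count1 = len(list(committed_rows))      # first SELECT count(*): snapshot sees 0 rows
committed_rows.append(1)                # another transaction inserts a row and commits
count2 = len(list(committed_rows))      # second SELECT count(*): new snapshot sees 1 row

print(count1, count2)   # 0 1
```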
Skipped Modifications

(I couldn’t find the actual name for this kind of interaction, so I’m just calling it a skipped modification. If you know the actual name, I would appreciate you leaving a comment at the bottom of this page.)
A skipped modification can happen due to how Postgres handles concurrent modifications of a single row. It’s easiest if we start with an example and examine what exactly is going on. This example was taken from the Postgres docs:
S0> BEGIN;
S0> INSERT INTO ints SELECT 1;
S0> INSERT INTO ints SELECT 2;
S0> SELECT * FROM ints;
 n
---
 1
 2
S0> COMMIT;
S1> BEGIN;
S1> UPDATE ints SET n = n+1;
S2> BEGIN;
S2> DELETE FROM ints WHERE n = 2;
-- S2 blocks since the DELETE is trying to modify a row
-- currently being updated.
S1> COMMIT;
-- S2 unblocks.
S2> COMMIT;
S3> BEGIN;
S3> SELECT * FROM ints;
 n
---
 2
 3
S3> COMMIT;
When I first saw this, I couldn’t believe that this was possible! The session S0 inserts two rows, one with the value 1 and one with the value 2. Then session S1 runs an UPDATE that increments the value of all of the rows by 1. Then session S2 runs a DELETE and deletes all rows with the value 2. Intuitively, the DELETE statement should delete at least one row. It should either run before the UPDATE statement and delete the row that started with 2, or it should run after the UPDATE statement and delete the row that started with 1. Once both transactions commit and S3 displays the results, it appears the DELETE statement deleted neither row!
This is due to how concurrent UPDATEs and DELETEs are handled. The DELETE first looks for rows with n=2. It finds the one row that started with n=2 and attempts to lock it. Since that row is currently being updated by another transaction and is already locked, the DELETE waits for the transaction containing the UPDATE to finish. Once that transaction commits, the row has n=3. The DELETE statement then rechecks the WHERE clause against the new version of the row, sees that it no longer matches, and ignores the row. Because of the rules for concurrent UPDATEs and DELETEs, a row is only modified if it satisfies the WHERE clause both before and after the concurrent update!
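The whole two-row scenario can be simulated end to end. This is a toy model of the recheck rule described above, with invented structure, not Postgres internals:

```python
# End-to-end simulation of the skipped modification: an UPDATE that
# increments every row commits while a DELETE on n = 2 is waiting;
# the DELETE then rechecks the locked rows against their new versions.

rows = [{"n": 1}, {"n": 2}]

# S2's DELETE plans against its snapshot: only the row with n = 2 matches.
targets = [row for row in rows if row["n"] == 2]

# S1's UPDATE holds the locks and commits first: every n becomes n + 1.
for row in rows:
    row["n"] += 1

# S2 unblocks and rechecks WHERE n = 2 against the *new* row versions,
# deleting only rows it targeted that still match.
rows = [row for row in rows
        if not (any(row is t for t in targets) and row["n"] == 2)]

print([row["n"] for row in rows])   # [2, 3] -- neither row was deleted
```

The row that now has n = 2 was never a target of the DELETE, and the targeted row no longer matches, so nothing is removed.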
Serialization Anomalies

A serialization anomaly occurs when the results of two transactions are each correct on their own, but it’s impossible to execute the transactions one by one in any order and achieve the same results. As an example:
S0> BEGIN;
S0> SELECT count(*) FROM ints;
 count
-------
     0
S1> BEGIN;
S1> SELECT count(*) FROM ints;
 count
-------
     0
S1> INSERT INTO ints SELECT 1;
S1> COMMIT;
S0> INSERT INTO ints SELECT 1;
S0> COMMIT;
There is no possible way to serialize S0 and S1. There are only two possible orderings: either S0 ran first and then S1, or S1 ran first and then S0. If S0 ran first, the INSERT at the end of S0 came before the count(*) at the beginning of S1, which returned zero rows. If S1 ran first, the INSERT in S1 came before the count(*) in S0, which returned zero rows. Either way, a count(*) reported zero rows after a row had already been inserted into the table.
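This impossibility argument can be checked mechanically. The sketch below (a hypothetical checker, not anything Postgres provides) runs the two count-then-insert transactions in both serial orders and compares against the counts the concurrent execution actually returned:

```python
# Check that no serial order of S0 and S1 reproduces the observed results.
# Each transaction counts the rows, then inserts one row.

def run_serially(first, second):
    """Run the two transactions one after the other and return the
    count(*) each one observed."""
    table = []
    counts = {}
    for name in (first, second):
        counts[name] = len(table)   # SELECT count(*) FROM ints
        table.append(1)             # INSERT INTO ints SELECT 1
    return counts

# In the concurrent execution above, *both* counts came back 0.
observed = {"S0": 0, "S1": 0}

# In every serial order, the second transaction sees the first one's row.
print(run_serially("S0", "S1"))     # {'S0': 0, 'S1': 1}
print(run_serially("S1", "S0"))     # {'S1': 0, 'S0': 1}
print(run_serially("S0", "S1") == observed or
      run_serially("S1", "S0") == observed)   # False -- not serializable
```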
When I first saw these examples, I was astounded at how unexpected some of the results were. In my next posts, we’ll take a look at some of the tools for dealing with each of these cases, including both isolation levels stronger than the Postgres default and explicit row-level locking.