This post is part 1 of a three-part series examining transaction isolation in Postgres. It covers the issues that come up under the default Postgres settings. Parts 2 and 3 will cover the different ways to eliminate these problems.
Isolation is one of the main tenets of ACID. Informally, a database’s transactions are fully isolated if two transactions performed at the same time do not interact with each other in any way. More formally, a database’s transactions are fully isolated if, whenever multiple transactions are performed at the same time, there exists an ordering of those transactions such that executing them concurrently behaves the same as executing them one by one in that order.
By default, Postgres does not provide full transaction isolation (the same holds true for most SQL databases under their default settings). That means there are a variety of cases where two transactions can interact with each other. These cases are usually unexpected by Postgres users and commonly result in bugs in those users’ code. Let’s first take a look at the exact isolation semantics Postgres provides. Then we can take a look at the different interactions and use the isolation semantics to explain why each interaction occurs.
For the isolation semantics, you will only need to understand how Postgres handles SELECT statements and how Postgres handles statements that modify data such as UPDATE and DELETE.
SELECT

Whenever a SELECT statement is run in Postgres, Postgres takes a snapshot of the current state of the database and runs the SELECT against that snapshot. The effects of all transactions that committed before the snapshot are visible to the SELECT statement. On the other hand, the effects of all transactions that either failed or committed after the snapshot was taken are completely invisible to the SELECT statement. It’s important to note that a new snapshot is created for each SELECT statement and not for each transaction. That means two SELECT statements in the same transaction will each have their own snapshot of the database.
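As a rough illustration of the per-statement snapshot rule, here is a toy model in Python (an assumption-laden sketch of the behavior described above, not Postgres internals): each SELECT captures the set of transactions committed at that moment, and versions written by transactions that commit later are invisible to it.

```python
# Toy model of per-statement snapshots. Each row version remembers which
# transaction wrote it; a SELECT only sees versions whose writing
# transaction was already committed when the SELECT's snapshot was taken.

class ToyDB:
    def __init__(self):
        self.committed = set()   # ids of committed transactions
        self.versions = []       # (txn_id, value) row versions, newest last

    def commit(self, txn_id, value):
        self.versions.append((txn_id, value))
        self.committed.add(txn_id)

    def select(self):
        snapshot = frozenset(self.committed)  # fresh snapshot per statement
        visible = [v for t, v in self.versions if t in snapshot]
        return visible[-1] if visible else None

db = ToyDB()
db.commit(txn_id=1, value=1)  # S0 inserts 1 and commits
first = db.select()           # a SELECT in S1 sees 1
db.commit(txn_id=2, value=2)  # S2 updates the row to 2 and commits
second = db.select()          # a later SELECT in S1 takes a NEW snapshot: sees 2
print(first, second)          # 1 2
```

Because `select` builds a new snapshot each time it is called, two calls inside what would be one transaction can disagree, which is exactly the non-repeatable read shown later in this post.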
UPDATE and DELETE
UPDATE and DELETE statements first create a snapshot of the database and use that snapshot to find all rows that match the WHERE clause associated with the statement. Once a row is found, the statement will try to lock it. If any of the rows found are already locked by a different UPDATE or DELETE statement in another transaction, the UPDATE or DELETE statement will wait for the other transaction to commit or abort.
If the other transaction aborts, the UPDATE or DELETE statement will act as if the other transaction had never existed. Otherwise, if the other transaction commits, the UPDATE or DELETE will reevaluate the WHERE clause on the new version of the row and update/delete the new version if it satisfies the predicate. If the new version does not satisfy the WHERE clause, it will be left unmodified. This behavior is used because it is a reasonable way to handle multiple concurrent modifications to the same row.
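The decision a blocked UPDATE or DELETE makes after the lock holder commits can be sketched as a small Python function (a simplified model of the recheck rule described above; the function name and row representation are illustrative, not anything from Postgres):

```python
# Sketch of the recheck rule for a blocked UPDATE/DELETE: the statement
# matched the OLD version of a row via its snapshot, waited for the locking
# transaction to commit, then re-evaluates its WHERE clause against the NEW
# version of the row and only acts if it still matches.

def blocked_delete_outcome(old_row, new_row, where):
    if not where(old_row):
        return "row never matched"  # snapshot scan would not have found it
    # The locking transaction committed; recheck against the new version.
    return "deleted" if where(new_row) else "skipped"

# DELETE ... WHERE n = 2, racing an UPDATE that changed n from 2 to 3:
outcome = blocked_delete_outcome({"n": 2}, {"n": 3}, lambda r: r["n"] == 2)
print(outcome)  # skipped
```

The row is acted on only when the predicate holds for both the old and the new version, which is the behavior behind the "skipped modification" example later in this post.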
Now that we’ve got an idea of the semantics behind SELECT, UPDATE, and DELETE, let’s take a look at a few of the different cases where two transactions can interact with each other. In the examples below, I use S0, S1, … to refer to different sessions running concurrently. All of the examples make use of a table ints of bigints that starts out empty at the beginning of each example.
Non-repeatable reads

A non-repeatable read is when a transaction reads the exact same row twice, but finds the row to be different the second time it is read. As an example:
S0> BEGIN;
S0> INSERT INTO ints SELECT 1;
S0> COMMIT;
S1> BEGIN;
S1> SELECT * FROM ints;
 n
---
 1
S2> BEGIN;
S2> UPDATE ints SET n = 2;
S2> COMMIT;
S1> SELECT * FROM ints;
 n
---
 2
S1> COMMIT;
In this case, even though both of the SELECT *‘s were run in the same transaction, they returned different results. This is because each SELECT statement in the transaction creates its own snapshot instead of there being a single snapshot for the whole transaction. The first SELECT creates a snapshot in which n=1 and returns that row. Then a second transaction runs an UPDATE that changes the row to n=2. Finally, the second SELECT creates a new snapshot in which n=2 and returns that.
Lost updates

It is a somewhat common pattern (which should be avoided if possible) for a server to retrieve a value from the database, compute a new value based on the old one, and write that new value back into the database. If two of these operations happen simultaneously, it can result in what’s known as a “lost update”1. As a simple example, let’s say two different servers try to increment the value in the ints table at the same time:
S0> BEGIN;
S0> INSERT INTO ints SELECT 1;
S0> COMMIT;
S1> BEGIN;
S1> SELECT * FROM ints;
 n
---
 1
S2> BEGIN;
S2> SELECT * FROM ints;
 n
---
 1
S2> UPDATE ints SET n = 2; -- Computed server side.
S2> COMMIT;
S1> UPDATE ints SET n = 2; -- Computed server side.
S1> COMMIT;
S3> BEGIN;
S3> SELECT * FROM ints;
 n
---
 2
S3> COMMIT;
In the example above, session S1 read the value 1. Then S2 incremented the value in ints, setting it to 2. When S1 goes to write its own value, it has no idea the value was incremented by S2 after S1 read it, so when S1 sets the value in ints to 2, the increment performed by S2 is completely lost.
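The race reduces to a plain read-modify-write interleaving, which can be shown deterministically in a few lines of Python (a hedged simulation of the scenario above, with the "database" stood in for by a dict):

```python
# Deterministic interleaving of the read-modify-write race: both "servers"
# read the old value before either writes, so one increment is lost.

db = {"n": 1}

s1_read = db["n"]       # S1 reads 1
s2_read = db["n"]       # S2 reads 1
db["n"] = s2_read + 1   # S2 writes 2 and commits
db["n"] = s1_read + 1   # S1 also writes 2: S2's increment is overwritten
print(db["n"])          # 2, even though two increments ran
```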
Phantom reads

A phantom read occurs when the set of rows found by a statement changes due to another transaction inserting new rows. A simple example is the following:
S0> BEGIN;
S0> SELECT count(*) FROM ints;
 count
-------
     0
S1> BEGIN;
S1> INSERT INTO ints SELECT 1;
S1> COMMIT;
S0> SELECT count(*) FROM ints;
 count
-------
     1
S0> COMMIT;
In Postgres, phantom reads have the same cause as non-repeatable reads, namely snapshots being created per query instead of per transaction. Historically phantom reads are considered different from non-repeatable reads because in some databases the causes between the two are completely different. Phantom reads are also a slightly more difficult problem to solve, including in Postgres.
Skipped modifications

(I couldn’t find the actual name for this kind of interaction, so I’m just calling it a skipped modification. If you know the actual name, I would appreciate you leaving a comment at the bottom of this page.)
A skipped modification can happen due to how Postgres handles concurrent modifications of a single row. It’s easiest if we start with an example and examine what exactly is going on. This example was taken from the Postgres docs:
S0> BEGIN;
S0> INSERT INTO ints SELECT 1;
S0> INSERT INTO ints SELECT 2;
S0> SELECT * FROM ints;
 n
---
 1
 2
S0> COMMIT;
S1> BEGIN;
S1> UPDATE ints SET n = n+1;
S2> BEGIN;
S2> DELETE FROM ints WHERE n = 2;
-- S2 blocks since the DELETE is trying to modify a row
-- currently being updated.
S1> COMMIT;
-- S2 unblocks.
S2> COMMIT;
S3> BEGIN;
S3> SELECT * FROM ints;
 n
---
 2
 3
S3> COMMIT;
When I first saw this, I couldn’t believe that this was possible! The session S0 inserts two rows, one with the value 1 and one with the value 2. Then session S1 runs an UPDATE that increments the value of all of the rows by 1. Then session S2 runs a DELETE and deletes all rows with the value 2. Intuitively, the DELETE statement should delete at least one row. It should either run before the UPDATE statement and delete the row that started with 2, or it should run after the UPDATE statement and delete the row that started with 1. Once both transactions commit and S3 displays the results, it appears the DELETE statement deleted neither row!
This is due to how concurrent UPDATEs and DELETEs are handled. The DELETE first looks for rows with n=2. It finds the one row that started with n=2 and attempts to lock it. Since that row is currently being updated by another transaction and is already locked, the DELETE waits for the transaction containing the UPDATE to finish. Once that transaction commits, the row has n=3. The DELETE statement then rechecks the WHERE clause against the new version of the row, sees that it no longer matches, and ignores the row. Because of the rules for concurrent UPDATEs and DELETEs, a row is only touched if it satisfies the condition both before and after it’s updated!
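The whole walkthrough can be traced step by step in Python (a simplified model of the recheck behavior described above, with the table as a plain list): the DELETE's snapshot matches only the row that was 2, and after the UPDATE commits that row is 3, so the recheck fails and nothing is deleted.

```python
# Trace of the skipped modification: snapshot scan, then UPDATE commits,
# then the DELETE rechecks its predicate against the new row versions.

rows = [1, 2]

# DELETE ... WHERE n = 2 scans its snapshot: only index 1 matches.
snapshot_matches = [i for i, n in enumerate(rows) if n == 2]

# The concurrent UPDATE ints SET n = n+1 commits first: rows become [2, 3].
rows = [n + 1 for n in rows]

# Recheck: the matched row is now 3, so the predicate fails and it is skipped.
# The row that is NOW 2 was never matched, since the snapshot saw it as 1.
deleted = [i for i in snapshot_matches if rows[i] == 2]
rows = [n for i, n in enumerate(rows) if i not in deleted]
print(rows)  # [2, 3] -- neither row was deleted
```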
Serialization anomalies

A serialization anomaly occurs when the results of two transactions are each correct on their own, but it’s impossible to execute the transactions one by one in any order and achieve the same results. As an example:
S0> BEGIN;
S0> SELECT count(*) FROM ints;
 count
-------
     0
(1 row)
S1> BEGIN;
S1> SELECT count(*) FROM ints;
 count
-------
     0
(1 row)
S1> INSERT INTO ints SELECT 1;
S1> COMMIT;
S0> INSERT INTO ints SELECT 1;
S0> COMMIT;
There is no possible way to serialize S0 and S1. There are only two possible orderings: either S0 ran first and then S1, or S1 ran first and then S0. If S0 ran first, the INSERT at the end of S0 came before the count(*) at the beginning of S1, which returned zero rows. If S1 ran first, the INSERT in S1 came before the count(*) in S0, which returned zero rows. Either way, a count(*) reported zero rows after a row had already been inserted into the table.
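Exhausting both serial orders mechanically makes the contradiction concrete. Below is a small Python check (a sketch of the argument above; the session names and dict shape are just for illustration), where each transaction counts the table and then inserts one row:

```python
# Run the two transactions one by one in a given order and record what
# each one's count(*) would have returned.

def run_serially(order):
    table = []
    counts = {}
    for txn in order:
        counts[txn] = len(table)  # SELECT count(*) FROM ints;
        table.append(1)           # INSERT INTO ints SELECT 1;
    return counts

observed = {"S0": 0, "S1": 0}     # what the concurrent run actually returned
print(run_serially(["S0", "S1"]))  # {'S0': 0, 'S1': 1}
print(run_serially(["S1", "S0"]))  # {'S1': 0, 'S0': 1}
# Whichever transaction runs second must see the other's row, so neither
# serial order matches the observed {'S0': 0, 'S1': 0}.
```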
When I first saw these examples, I was astounded at how unexpected some of the results were. In the next posts, we’ll take a look at some of the different tools for dealing with each of these cases, including both the stronger isolation levels Postgres provides beyond the default and explicit row-level locking.
12 thoughts on “Postgres Transactions Aren’t Fully Isolated”
I discovered this blog recently and it is amazing. I’ve been using PG for 15 years and have learned something with every post. Thank you!
This is a fascinating post! I knew some of these scenarios, but definitely not all of them!
For the skipped modification problem, I think this sounds a lot like the ABA problem: https://en.wikipedia.org/wiki/ABA_problem, except inverted: The delete is reading the same location twice and first it’s not actionable and then it’s the wrong value.
Thanks for the great writeup!
I think the session numbers don’t match between the code and the prose in the Serialization Anomalies section. The code uses S0 (zero) and S1 (one) and doesn’t have an S2 (two) and the prose uses S1 (one) and S2 (two), without ever mentioning S0 (zero).
This seems like over-statement. SQL supports several isolation levels and at the lower ones, Postgres is not fully isolated — but that is not anything specific to Postgres.
It’s not anything specific to Postgres, but I don’t expect your typical Postgres user to be familiar with this behavior at all.
But that is as per the standard. I would say “Postgres transactions aren’t fully isolated” only if it didn’t do what it says (which is not the case)… or if it didn’t support an isolation level from the standard (which is not true either, since it supports all relevant levels).
The article title is akin to saying “MySQL isn’t transactional”…. but you should say that only if you’re using MyISAM, in which case this is expected.
Except most of your typical Postgres users are running Postgres under the default transaction isolation level. If MyISAM was still the default MySQL storage engine and was the one most people used, I would feel comfortable saying “MySQL isn’t transactional”.
MySQL was just an example here. The defaults in MySQL silently corrupt data ☺️ so let’s just not go to scary land. My point is that the title is misleading given that this is the norm in the industry. Because as far as I know no other DBMS has serialisation as its default, which I hope you would agree is the isolation level you actually want here.
I agree with you on this one.
Unfortunately, I have to agree that the title of this is blatant click-bait. Postgres users who are concerned with transaction isolation are certainly going to set the transaction isolation level correctly and I would suggest that most users actually do because they’re using existing frameworks which set it for them correctly. I’m sure that there are users who aren’t familiar with SQL or with transactional systems who don’t realize that they need to set the isolation level, of course, but then the thing to do is to educate them about the different transaction isolation levels, not to tell them why one particular transaction isolation level has issues, even if that happens to be the default.
TBH, this is a little misleading. Basically what is happening here is that the default isolation level is ‘read committed’, not ‘serializable’. When set to serializable, all of the isolation cases listed would either work properly or result in a serialization error.
This is all clearly documented here: