This post is part 2 in a three part series exploring transaction isolation in Postgres. Part 1 introduced some of the problems that come up under the default isolation settings. This part and the next one introduce common ways of eliminating those problems.
As mentioned in part 1, there are many different cases where concurrent transactions can interact with each other. One of the most basic ways to avoid a few of the interactions is to use row level locks. Row level locks are a way of preventing concurrent modification to a row. Let’s take a look at how row-level locks change the behavior of each of the examples in the part 1:
Non-Repeatable Reads
To recap, a non-repeatable read is when a single transaction rereads a single row twice and finds it to be different the second time. Here’s the example from part 1 demonstrating non-repeatable reads:
S0> BEGIN; S0> INSERT INTO ints SELECT 1; S0> COMMIT; S1> BEGIN; S1> SELECT * FROM ints; n --- 1 S2> BEGIN; S2> UPDATE ints SET n = 2; S2> COMMIT; S1> SELECT * FROM ints; n --- 2 S1> COMMIT;
If S1 were to acquire a row level lock on the row as soon as it first read the row, S2 would unable from updating the rows as long as S1 holds the row level lock. To acquire row level locks with a SELECT statement, you just add FOR SHARE to the end of the SELECT. The lock will be released once the transaction commits. Rewriting the example with FOR SHARE, it now looks like:
S0> BEGIN; S0> INSERT INTO ints SELECT 1; S0> COMMIT; S1> BEGIN; S1> SELECT * FROM ints FOR SHARE; n --- 1 S2> BEGIN; S2> UPDATE ints SET n = n+1; -- Blocks because the row being updated is locked by -- another transaction. S1> SELECT * FROM ints; n --- 1 S1> COMMIT; -- The UPDATE in S2 completes since S1 released its lock -- on the row. S2> COMMIT;
What FOR SHARE does is acquire a “read lock” on each of the rows returned by the SELECT statement. Once the read locks are acquired, no other transaction will be able to update or delete the rows until the transaction holding the read locks commits and releases the locks. Multiple transactions can possess read locks on the same row, and it is impossible for any transaction to update a row as long as at least one other transaction has a read lock on the row.
Lost Updates
Since multiple transactions can acquire read locks on a single row at the same time, FOR SHARE doesn’t exactly handle lost updates well. Here’s the example of a lost update from part 1 rewritten to use FOR SHARE:
S0> BEGIN; S0> INSERT INTO ints SELECT 1; S0> COMMIT; S1> BEGIN; S1> SELECT * FROM ints FOR SHARE; n --- 1 S2> BEGIN; S2> SELECT * FROM ints FOR SHARE; n --- 1 S2> UPDATE ints SET n = 2; -- Computed server side. -- Blocks because S1 has a read lock on the row. S1> UPDATE ints SET n = 2; -- Computed server side. ERROR: deadlock detected S1> ROLLBACK; S3> BEGIN; S3> SELECT * FROM ints; n --- 2 S3> COMMIT;
What happened here is the read lock acquired by S1 is preventing the UPDATE in S2 from running and the read lock acquired by S2 is preventing the UPDATE in S1 from running. Postgres detects that this is a deadlock and aborts the transaction in S1.
To get around this issue, you’ll want to use a variation of FOR SHARE called FOR UPDATE. FOR UPDATE also acquires a locks on the rows being selected, but instead of acquiring read locks on each row, it acquires write locks on each row. A write lock is similar to a read lock, but as long as a transaction has a write lock on the row, no other transaction can have a read or write lock on the same row. In fact, a write lock is what UPDATE and DELETE grab before they modify a row. Let’s take a look at what happens when we instead use FOR UPDATE instead of FOR SHARE:
S0> BEGIN; S0> INSERT INTO ints SELECT 1; S0> COMMIT; S1> BEGIN; S1> SELECT * FROM ints FOR UPDATE; n --- 1 S2> SELECT * FROM ints FOR UPDATE; -- Blocks because S1 has a write lock. S1> UPDATE ints SET n = 2; -- Computed server side. S1> COMMIT; -- S1 releases the write lock and S2 unblocks. -- The SELECT in S2 returns: n --- 2 S2> UPDATE ints SET n = 3; -- Computed server side. S2> COMMIT; S3> BEGIN; S3> SELECT * FROM ints; n --- 3 S3> COMMIT;
By using FOR UPDATE, S1 signals it’s about to modify the rows it is selecting. S2 also wants to modify the row, so when it sees S1 is about to modify the row, it waits for S1 to complete before reading the row. This guarantees that S2 sees an up to date value. In effect, by using FOR UPDATE, S1 makes the read and write performed on the row happen in a single step.
Phantom Reads
Although row level locks are able to prevent non-repeatable reads, they are unable to prevent phantom reads, even though they seem to be similar issues. If we add FOR SHARE to the example of a phantom read in the part 1:
S0> BEGIN; S0> SELECT count(*) S0> FROM (SELECT * FROM ints FOR SHARE) sub; count ------- 0 S1> BEGIN; S1> INSERT INTO ints SELECT 1; S1> COMMIT; S0> SELECT count(*) FROM ints; count ------- 1 S0> COMMIT;
The phantom read still happens because FOR SHARE only acquires locks on the rows already in the table. It does not prevent new rows from being inserted into the table.
Skipped Modification
Row level locks don’t help here. The UPDATE and DELETE statements already grab row level write locks on the rows being modified. FOR SHARE and FOR UPDATE cannot be used since no SELECT statement is involved in a skipped modification.
Serialization Anomalies
Row level locks do not help here for the same reason they don’t help phantom reads. There’s nothing to prevent the INSERT statements from inserting the new rows.
Overall row level locks are a easy way of handling two of the isolation issues that come up. Fortunately, those two are the ones that most commonly come up. The downside of row level locks, is that while easy, they do add a fair amount of complexity and overhead. You now have to worry about one transaction holding a lock for too long, which you mostly wouldn’t have to worry about otherwise. Next, we’ll take a look at a different approach to solving some of the interactions – the other transaction isolation levels.
This blog is great. Have you considered adding the RSS feed to https://planet.postgresql.org/ ?
I’ve thought about doing so. The reason I have not is that the volume of posts on this blog is much higher than other blogs. It looks like Bruce Momjian currently has the most posts on Planet Postgres with 16. For comparison I’ve written 28 posts in the last 6 weeks. I have considered creating a separate category just for Planet Postgres and posting one post there a week, but I haven’t bothered setting it up.
You forgot to add FOR SHARE in the select statement for Lost Updates example.
Fixed. Thanks!