Taming SQL Isolation Levels (Part 1: Read Anomalies & Fixes)
When discussing SQL database transactions, ACID properties are typically the first concept that comes to mind (or ORM, for those seeking abstraction).
Transactions and ACID properties together provide developers with guarantees and handle much of the complexity, such as enabling rollbacks for sets of operations (Atomicity), ensuring data persistence (Durability), and maintaining database constraints (Consistency).
And as with every abstraction, we need to understand how it works, its quirks, and most importantly the cost that comes with it.
In my view, Isolation (I) is the most interesting of the four, especially for data manipulation in concurrent environments. It is also up to us as developers to choose the isolation level we need, which is why it deserves closer attention than the other letters (no discrimination intended; without the other letters I wouldn’t be able to write this article).
What Are Database Isolation Levels? A Quick Guide
Isolation guarantees apply to concurrent transactions, particularly those that manipulate the same database resources (tables or rows in SQL databases). Isolation dictates what’s visible and what’s not from other ongoing transactions.
Unlike Atomicity, Consistency, and Durability, Isolation is not binary; hence the levels of isolation.
In the SQL Standard (ISO/IEC 9075) we find four isolation levels, ordered from loose to strict:
- READ UNCOMMITTED
- READ COMMITTED
- REPEATABLE READ
- SERIALIZABLE
These are the standard levels, but database vendors don’t implement all of them. For example, SQLite offers only READ UNCOMMITTED and SERIALIZABLE, and in PostgreSQL READ UNCOMMITTED behaves like READ COMMITTED. To explore every level, we therefore need two different databases.
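For reference, here is how an isolation level is typically selected in PostgreSQL, either per transaction or as a session default (a minimal sketch; the exact syntax varies by vendor, and we will see SQLite’s equivalent later):

-- Per transaction
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- ... queries ...
COMMIT;

-- As a session default
SET default_transaction_isolation = 'read committed';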
Real-World Example: Building a Room Booking System
To demonstrate isolation issues, let’s consider a room booking service with two primary use cases:
- Use case 1: A user can rent a room and be invoiced.
- Use case 2: Get an inventory of the rooms booked and the invoices issued.
The Database Schema
The database schema is pretty simple:
Four tables: one to hold user data, one to hold rooms, one to hold invoices, and the last to hold bookings.
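The actual DDL lives in the repository; a minimal sketch of what such a schema could look like (the table and column names below are assumptions for illustration) is:

CREATE TABLE Users    (Id UUID PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Rooms    (Id UUID PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Bookings (Id UUID PRIMARY KEY,
                       UserId UUID REFERENCES Users(Id),
                       RoomId UUID REFERENCES Rooms(Id),
                       BookingDate DATE NOT NULL,
                       StartTime TIME NOT NULL,
                       EndTime TIME NOT NULL);
CREATE TABLE Invoices (Id UUID PRIMARY KEY,
                       UserId UUID REFERENCES Users(Id),
                       BookingId UUID REFERENCES Bookings(Id),
                       Amount NUMERIC NOT NULL);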
How Bookings Work
To book a room, we need three pieces of information:
- The userId
- The roomId
- The date, start time, and end time of the booking.
The booking process would be:
- Start a transaction
- Check if the user exists
- Check if the room exists
- Book the room
- Invoice the user
- End Transaction
The interaction with the database is summarized in the following diagram:
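In SQL terms, the booking transaction roughly takes this shape (a simplified sketch: the placeholders stand for application-supplied values, and the table and column names are the assumed ones from the schema sketch above):

BEGIN;

-- Check that the user and the room exist
SELECT 1 FROM Users WHERE Id = :userId;
SELECT 1 FROM Rooms WHERE Id = :roomId;

-- Book the room
INSERT INTO Bookings (Id, UserId, RoomId, BookingDate, StartTime, EndTime)
VALUES (:bookingId, :userId, :roomId, :date, :start, :end);

-- Invoice the user
INSERT INTO Invoices (Id, UserId, BookingId, Amount)
VALUES (:invoiceId, :userId, :bookingId, :amount);

COMMIT;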
The Inventory Problem
Now, to build the inventory, we need to get all Bookings and all Invoices and return a result. The interaction is the following:
The inventory is simply a check on the current state of the business, obtained by reading all the bookings and all the invoices.
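Conceptually, the inventory endpoint runs something like the following two reads inside a single transaction (again a sketch using the assumed table names):

BEGIN;
SELECT * FROM Bookings;
SELECT * FROM Invoices;
COMMIT;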
Investigating Isolation Issues
Problem 1: Dirty Reads with READ UNCOMMITTED
For this first scenario, I will use the loosest isolation level, READ UNCOMMITTED. We will try to identify the problems that can occur and their solutions.
READ UNCOMMITTED transactions, as the name suggests, will read values written by other transactions even if those transactions haven’t committed yet. For example, in the following interaction, the inventory transaction reads an intermediary state:
In the scenario above, the inventory transaction ends up with a single invoice and no booking. It observes the system in an inconsistent state; this phenomenon is called a “dirty read”.
In some scenarios this might be acceptable, but in others it may cause trouble and false positives for the business: here, we invoiced a user without booking their room.
Theoretically, there is another issue we might encounter with READ UNCOMMITTED, called a “dirty write”. Similar to the dirty read, dirty writes happen when transactions step on each other and overwrite each other’s values before either has committed. At the end of the transactions, we can’t know the state of the database.
I say theoretically because some vendors prevent it even when you run the transaction with READ UNCOMMITTED isolation.
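To make the dirty read concrete, here is one possible interleaving of the booking transaction (A) and the inventory transaction (B), written out as a timeline of two SQL sessions (a sketch; column lists are shortened, and B runs at READ UNCOMMITTED):

-- [B] BEGIN;
-- [B] SELECT COUNT(*) FROM Bookings;   -- returns 0: A has not written anything yet
-- [A] BEGIN;
-- [A] INSERT INTO Bookings (...) VALUES (...);
-- [A] INSERT INTO Invoices (...) VALUES (...);
-- [B] SELECT COUNT(*) FROM Invoices;   -- returns 1: B reads A's uncommitted invoice (dirty read)
-- [B] COMMIT;                          -- the inventory reports 1 invoice and 0 bookings
-- [A] COMMIT;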
Testing READ UNCOMMITTED in Action
To run the transactions at READ UNCOMMITTED, I had to use SQLite and enable it with:
PRAGMA read_uncommitted = ON;
In the response to the inventory, I added a boolean to indicate if the inventory is consistent (number of bookings equals number of invoices) to detect dirty reads.
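The consistency flag itself boils down to a count comparison, something along these lines (a sketch; in the project the check is done in the service layer):

SELECT (SELECT COUNT(*) FROM Bookings) = (SELECT COUNT(*) FROM Invoices) AS IsConsistent;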
To simulate parallel requests, I used k6 and its checks feature to verify inventory consistency.
The script to book rooms simulates 100 users booking rooms for 70 seconds:
import http from 'k6/http';

// k6 run .\book_lt.js
export const options = {
  stages: [
    { duration: '20s', target: 100 },
    { duration: '30s', target: 100 },
    { duration: '20s', target: 0 },
  ],
};

export default function () {
  const url = 'http://localhost:5130/v1/rooms/bookings';
  const body = JSON.stringify({
    userId: '9A841211-47E1-4DE1-8628-9EFB9E811162',
    roomId: 'CEE3BBCD-BD83-4175-B30B-8233D26FFDDF',
    date: '2024-10-26',
    start: '08:00:00',
    end: '10:00:00',
  });
  const params = {
    headers: {
      'Content-Type': 'application/json',
    },
  };
  http.post(url, body, params);
}
The script to get the inventory simulates a single user querying the inventory for 70 seconds:
import http from 'k6/http';
import { check } from 'k6';

// k6 run --address "localhost:3000" .\inventory_lt.js
export const options = {
  stages: [
    { duration: '1s', target: 1 },
    { duration: '69s', target: 1 },
  ],
};

export default function () {
  const res = http.get('https://localhost:7191/v1/inventory');
  check(res, {
    'is valid inventory': (r) => r.body.includes('true'),
  });
}
While this benchmark doesn’t follow all standard benchmarking practices, the results provide valuable insights:
- The booking load test results:
| Number of requests | Failed requests | Avg request duration | p(90) request duration | Median duration | Number of users | Total test duration |
|---|---|---|---|---|---|---|
| 23288 | 0 | 217.2 ms | 590.29 ms | 48.55 ms | 100 | 1 min 10 s |
- The inventory load test results:
| Number of requests | Failed requests | Avg request duration | p(90) request duration | Median duration | Number of users | Total test duration |
|---|---|---|---|---|---|---|
| 71 | 0 | 983.28 ms | 2.5 s | 681.24 ms | 1 | 1 min 10 s |
Of the 71 inventory requests, only 33% showed valid inventory (24 correct vs 47 incorrect).
Solving Isolation Problems
Solution 1: Moving to READ COMMITTED
From a business perspective, intermediate inconsistent results may be acceptable in some cases. Alternatively, inventory operations can be scheduled outside business hours to minimize concurrent write operations.
For technical solutions, we can upgrade our isolation level to READ COMMITTED.
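In PostgreSQL this can be done per transaction (READ COMMITTED also happens to be PostgreSQL’s default level); a minimal sketch for the inventory transaction would be:

BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM Bookings;
SELECT * FROM Invoices;
COMMIT;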
Why READ COMMITTED Isn’t Enough
Since SQLite doesn’t support READ COMMITTED, we’ll switch to PostgreSQL. Note that this means we cannot directly compare results with the previous run due to the different environment.
After switching the isolation level and running the same scenario:
- The booking load test results:
| Number of requests | Failed requests | Avg request duration | p(90) request duration | Median duration | Number of users | Total test duration |
|---|---|---|---|---|---|---|
| 36902 | 0 | 135.54 ms | 223.51 ms | 130.23 ms | 100 | 1 min 10 s |
- The inventory load test results:
| Number of requests | Failed requests | Avg request duration | p(90) request duration | Median duration | Number of users | Total test duration |
|---|---|---|---|---|---|---|
| 200 | 0 | 340.98 ms | 661.54 ms | 267.27 ms | 1 | 1 min 10 s |
Surprisingly, of the 200 requests, only 5% showed valid inventory (10 correct vs 190 incorrect).
What happened?
READ COMMITTED doesn’t protect us from another concurrent read anomaly called a non-repeatable read. In our case, this occurs when the inventory transaction reads one part of the data before a booking transaction commits (seeing the old committed values) and reads the other part after the booking transaction commits (seeing the newly committed values).
Let’s suppose the database has 1 invoice and 1 booking before the transactions in the figure occur:
At moments (1) and (2), the committed values differ: at point (1) we read only 1 invoice, while at point (2) we read 2 bookings, creating a discrepancy.
The more concurrent transactions on the same resources, the more likely we are to encounter this inconsistency.
This phenomenon is called read skew or a “non-repeatable read”, because repeating the same reads just after point (2) would return different values and show the correct state.
While this might not be problematic for user interfaces where a refresh can resolve the inconsistency, it can cause serious issues for operations like backups or report generation.
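Written out as an interleaving (both transactions at READ COMMITTED, starting from 1 booking and 1 invoice), the read skew looks roughly like this; every value read is committed data, yet the two reads come from different points in time:

-- [Inventory] BEGIN;
-- [Inventory] SELECT COUNT(*) FROM Invoices;   -- (1) returns 1, the value committed so far
-- [Booking]   BEGIN;
-- [Booking]   INSERT INTO Bookings (...) VALUES (...);
-- [Booking]   INSERT INTO Invoices (...) VALUES (...);
-- [Booking]   COMMIT;                          -- 2 bookings and 2 invoices are now committed
-- [Inventory] SELECT COUNT(*) FROM Bookings;   -- (2) returns 2, the newly committed value
-- [Inventory] COMMIT;                          -- the inventory reports 1 invoice vs 2 bookings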
Solution 2: Using REPEATABLE READ for Consistency
The solution is to upgrade to the next isolation level: REPEATABLE READ or SNAPSHOT isolation (terminology varies by vendor).
With snapshot isolation, the database captures a snapshot of the data as it was when the transaction started (in PostgreSQL, as of the transaction’s first query), and all reads within that transaction use that snapshot. Rather than locking tables, PostgreSQL implements this with Multi-Version Concurrency Control (MVCC). This approach is advantageous since reading operations never block writing operations and vice versa.
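Under REPEATABLE READ, the inventory transaction from above could look like this; both reads now come from the same snapshot, no matter which bookings commit in between (a sketch):

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- The snapshot is established by the first query below; bookings committed
-- by other transactions after this point stay invisible to this transaction.
SELECT COUNT(*) FROM Invoices;
SELECT COUNT(*) FROM Bookings;
COMMIT;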
Using snapshot isolation, we get:
- The booking load test results:
| Number of requests | Failed requests | Avg request duration | p(90) request duration | Median duration | Number of users | Total test duration |
|---|---|---|---|---|---|---|
| 37241 | 0 | 134.28 ms | 249.13 ms | 122.83 ms | 100 | 1 min 10 s |
- The inventory load test results:
| Number of requests | Failed requests | Avg request duration | p(90) request duration | Median duration | Number of users | Total test duration |
|---|---|---|---|---|---|---|
| 193 | 0 | 354.25 ms | 645.43 ms | 332.06 ms | 1 | 1 min 10 s |
Of the 193 requests, 100% showed valid inventory.
Next Steps: Write Conflicts and SERIALIZABLE Isolation
While this resolves our immediate concerns, it’s worth noting that our booking system doesn’t check for overlapping bookings. In a real-world scenario, we would need to implement such verification, which introduces additional challenges.
The problems we’ve solved here arise from write transactions interfering with read transactions. In the next article, we’ll examine issues that occur when write transactions interfere with each other.
Conclusion and Resources
We’ve explored various issues that can arise from concurrent access to database resources, particularly how read transactions can receive inconsistent results due to parallel write transactions. We analyzed their frequency of occurrence and potential business impact. Finally, we examined how different isolation levels in SQLite and PostgreSQL can help solve these problems.
In the next part, we will discuss write/write transaction concurrency issues and explore solutions using additional isolation levels.
All code used for this article is available in my GitHub repository