How Not to Go Bankrupt (and Look Foolish): Mastering Transactions in MySQL
How do we know that data in a database can be safely accessed and modified, and that concurrent actions won't break our data's integrity and introduce corruption? This question is fundamental to building safe applications—and, as we’ll see, to achieving high performance.
Take this example: John and Mary operate a store for the underserved hero community. Mary wants to apply a senior discount to eligible clients and does so inside a transaction, expecting it will prevent interference from concurrent activity. Meanwhile, John realizes Batman is only 30 and updates his age.
Mary> BEGIN;
Mary> SELECT * FROM heroes;
name | age | comment
'Batman' | 65 | ''
John> SELECT * FROM HEROES;
'Batman' | 65 | ''
John> UPDATE heroes SET age = 30 WHERE name = 'Batman';
1 row affected
Mary is a bit anxious, so she double-checks Batman's age:
Mary> SELECT * FROM heroes;
'Batman' | 65
Mary> UPDATE heroes SET comment = 'senior discount' WHERE name = 'Batman';
1 row affected
Mary> COMMIT;
Now, a question to you, reader: what is the final state of the heroes table? Assuming the default MySQL parameters, the table now looks like this:
name | age | comment
'Batman' | 30 | 'senior discount'
This is not at all the result that Mary wanted. What went wrong here? This is what this presentation is all about. We will see how to identify problems that can occur with concurrent transactions, why they occur, how to prevent them, and even how to allow some specific interference as a tradeoff for better performance if our application-specific data integrity constraints can allow it.
One of the main roles of databases is to ensure data integrity in the face of concurrent data access. In the simplest case, a simplistic database could use a single readers-writer lock, queuing all writers until all readers have finished, and vice versa. But this is hardly sufficient for many applications, where multiple clients might be updating and reading account balances, plane reservations, or other critical data, all at once.
MySQL (and its default storage engine, InnoDB) handles this tradeoff by making use of configurable transaction isolation levels. To understand how these work and can be used to balance concurrency guarantees with performance, it's important to know how they're implemented. This is why we'll look at locking, Multi-Version Concurrency Control (MVCC), consistent nonlocking reads, read phenomena, and the four transaction isolation levels, with the help of numerous examples.
By the end, you'll be able to identify and fix concurrency issues in MySQL queries, prevent data corruption in critical operations, and make informed decisions about performance tradeoffs in transactions.