Description
Symfony version(s) affected
5.4.13 (but I guess it happens with 6.* as well as the code is the same)
Description
We have the following setup
- MySQL 8.0.15 with an InnoDb engine and
READ-COMMITTED
as transaction isolation level - 4 Ubuntu-based workers running supervisor which runs 2 processes
- Command to consume:
php bin/console messenger:consume async_email --limit=100 --time-limit=300 --memory-limit=480M
We experienced a lag of up to ~80k messages in the queue because workers spend most of their time waiting to lock a message to process.
A colleague tried to increase the number of processes from 8 to 80 but it made things worst.
Our processing tops at 3 or 4 consumed messages per seconds on this specific queue
Here are my findings so far:
The MySQL command SHOW FULL PROCESSLIST
shows a lot of the SELECT ... FOR UPDATE
requests are in the Sending data
state.
The SELECT ... FOR UPDATE
uses the index to find and lock a message:
EXPLAIN SELECT m.id FROM messenger_email m WHERE (m.delivered_at is null OR m.delivered_at < '2022-10-10 20:16:29') AND (m.available_at <= '2022-10-10 21:16:29') AND (m.queue_name = 'email') ORDER BY available_at ASC LIMIT 1 FOR UPDATE
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | m | NULL | range | IDX_DC4C2A4016BA31DB,IDX_DC4C2A40E3BD61CE,IDX_DC4C2A40FB7336F0 | IDX_DC4C2A40E3BD61CE | 5 | NULL | 37451 | 25.00 | Using index condition; Using where |
IDX_DC4C2A40E3BD61CE is on the available_at
column
Because it uses an index and we are using READ-COMMITTED
as the transaction isolation level, then the following happens:
If the WHERE condition includes an indexed column, and InnoDB uses the index, only the indexed column is considered when taking and retaining record locks. In the following example, the first UPDATE takes and retains an x-lock on each row where b = 2. The second UPDATE blocks when it tries to acquire x-locks on the same records, as it also uses the index defined on column b.
CREATE TABLE t (a INT NOT NULL, b INT, c INT, INDEX (b)) ENGINE = InnoDB;
INSERT INTO t VALUES (1,2,3),(2,2,4);
COMMIT;
# Session A
START TRANSACTION;
UPDATE t SET b = 3 WHERE b = 2 AND c = 3;
# Session B
UPDATE t SET b = 4 WHERE b = 2 AND c = 4;
Source: https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html
So the bottleneck in my case happens when workers search for a message to lock because the query is blocking.
Once locked, the message is very very quick to handle.
How to reproduce
I guess it's quite difficult to reproduce as I think there is a balance between:
- The number of processes consuming the queue
- The time spend to consume one message
- The load on the MySQL server
Possible Solution
Solution 1 - Remove the indexes
As suggested by #42868 removing the indexes actually removes the blockings issue as per MySQL documentation because without indexes:
If READ COMMITTED is used instead, the first UPDATE acquires an x-lock on each row that it reads and releases those for rows that it does not modify:
x-lock(1,2); unlock(1,2)
x-lock(2,3); update(2,3) to (2,5); retain x-lock
x-lock(3,2); unlock(3,2)
x-lock(4,3); update(4,3) to (4,5); retain x-lock
x-lock(5,2); unlock(5,2)
For the second UPDATE, InnoDB does a “semi-consistent” read, returning the latest committed version of each row that it reads to MySQL so that MySQL can determine whether the row matches the WHERE condition of the UPDATE:
x-lock(1,2); update(1,2) to (1,4); retain x-lock
x-lock(2,3); unlock(2,3)
x-lock(3,2); update(3,2) to (3,4); retain x-lock
x-lock(4,3); unlock(4,3)
x-lock(5,2); update(5,2) to (5,4); retain x-lock
However, when there are a lot of messages, it leads to slow queries because MySQL uses a "filesort" because of the ORDER BY available_at
clause.
I think this clause could be removed if losing the FIFO-like strategy is acceptable
Solution 2 - Use SKIP LOCKED
MySQL has a SKIP LOCKED
clause that could be used here: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html
A locking read that uses SKIP LOCKED never waits to acquire a row lock. The query executes immediately, removing locked rows from the result set.
I measured these performances for the SELECT query:
- Without
FOR UPDATE
&SKIP LOCKED
: about 0.002 seconds to return a row - With
FOR UPDATE
& withoutSKIP LOCKED
: about 2 seconds to return a row - With
FOR UPDATE
& withSKIP LOCKED
: about 0.2 seconds to return a row
=> x10 faster with the SKIP LOCKED
option
As far as I know, doctrine/dbal
doesn't support the SKIP LOCKED
option.
An issue has been around for years about it with doctrine/orm
: doctrine/orm#7746
I suggest here that the Doctrine transport "manually" adds the SKIP LOCKED
option when the database platform is MySQL 8.0 or later (maybe others that support it)
This is my preferred solution
Additional Context
No response