[Messenger] Bad performances with the Doctrine/MySQL transport

Symfony version(s) affected

5.4.13 (but I guess it happens with 6.* as well as the code is the same)

Description

We have the following setup

MySQL 8.0.15 with an InnoDb engine and READ-COMMITTED as transaction isolation level
4 Ubuntu-based workers running supervisor which runs 2 processes
Command to consume: php bin/console messenger:consume async_email --limit=100 --time-limit=300 --memory-limit=480M

We experienced a lag of up to ~80k messages in the queue because workers spend most of their time waiting to lock a message to process.
A colleague tried to increase the number of processes from 8 to 80 but it made things worst.
Our processing tops at 3 or 4 consumed messages per seconds on this specific queue

Here are my findings so far:

The MySQL command SHOW FULL PROCESSLIST shows a lot of the SELECT ... FOR UPDATE requests are in the Sending data state.

The SELECT ... FOR UPDATE uses the index to find and lock a message:

EXPLAIN SELECT m.id FROM messenger_email m WHERE (m.delivered_at is null OR m.delivered_at < '2022-10-10 20:16:29') AND (m.available_at <= '2022-10-10 21:16:29') AND (m.queue_name = 'email') ORDER BY available_at ASC LIMIT 1 FOR UPDATE

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	m	NULL	range	IDX_DC4C2A4016BA31DB,IDX_DC4C2A40E3BD61CE,IDX_DC4C2A40FB7336F0	IDX_DC4C2A40E3BD61CE	5	NULL	37451	25.00	Using index condition; Using where

IDX_DC4C2A40E3BD61CE is on the available_at column

Because it uses an index and we are using READ-COMMITTED as the transaction isolation level, then the following happens:

If the WHERE condition includes an indexed column, and InnoDB uses the index, only the indexed column is considered when taking and retaining record locks. In the following example, the first UPDATE takes and retains an x-lock on each row where b = 2. The second UPDATE blocks when it tries to acquire x-locks on the same records, as it also uses the index defined on column b.

CREATE TABLE t (a INT NOT NULL, b INT, c INT, INDEX (b)) ENGINE = InnoDB;
INSERT INTO t VALUES (1,2,3),(2,2,4);
COMMIT;

# Session A
START TRANSACTION;
UPDATE t SET b = 3 WHERE b = 2 AND c = 3;

# Session B
UPDATE t SET b = 4 WHERE b = 2 AND c = 4;

Source: https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html

So the bottleneck in my case happens when workers search for a message to lock because the query is blocking.
Once locked, the message is very very quick to handle.

How to reproduce

I guess it's quite difficult to reproduce as I think there is a balance between:

The number of processes consuming the queue
The time spend to consume one message
The load on the MySQL server

Possible Solution

Solution 1 - Remove the indexes

As suggested by #42868 removing the indexes actually removes the blockings issue as per MySQL documentation because without indexes:

If READ COMMITTED is used instead, the first UPDATE acquires an x-lock on each row that it reads and releases those for rows that it does not modify:

x-lock(1,2); unlock(1,2)
x-lock(2,3); update(2,3) to (2,5); retain x-lock
x-lock(3,2); unlock(3,2)
x-lock(4,3); update(4,3) to (4,5); retain x-lock
x-lock(5,2); unlock(5,2)

For the second UPDATE, InnoDB does a “semi-consistent” read, returning the latest committed version of each row that it reads to MySQL so that MySQL can determine whether the row matches the WHERE condition of the UPDATE:

x-lock(1,2); update(1,2) to (1,4); retain x-lock
x-lock(2,3); unlock(2,3)
x-lock(3,2); update(3,2) to (3,4); retain x-lock
x-lock(4,3); unlock(4,3)
x-lock(5,2); update(5,2) to (5,4); retain x-lock

However, when there are a lot of messages, it leads to slow queries because MySQL uses a "filesort" because of the ORDER BY available_at clause.
I think this clause could be removed if losing the FIFO-like strategy is acceptable

Solution 2 - Use SKIP LOCKED

MySQL has a SKIP LOCKED clause that could be used here: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html

A locking read that uses SKIP LOCKED never waits to acquire a row lock. The query executes immediately, removing locked rows from the result set.

I measured these performances for the SELECT query:

Without FOR UPDATE & SKIP LOCKED: about 0.002 seconds to return a row
With FOR UPDATE & without SKIP LOCKED: about 2 seconds to return a row
With FOR UPDATE& with SKIP LOCKED: about 0.2 seconds to return a row

=> x10 faster with the SKIP LOCKED option

As far as I know, doctrine/dbal doesn't support the SKIP LOCKED option.
An issue has been around for years about it with doctrine/orm: doctrine/orm#7746

I suggest here that the Doctrine transport "manually" adds the SKIP LOCKED option when the database platform is MySQL 8.0 or later (maybe others that support it)

This is my preferred solution

Additional Context

No response

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Messenger] Bad performances with the Doctrine/MySQL transport #47828

Symfony version(s) affected

Description

How to reproduce

Possible Solution

Solution 1 - Remove the indexes

Solution 2 - Use SKIP LOCKED

Additional Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Search code, repositories, users, issues, pull requests...

Uh oh!

[Messenger] Bad performances with the Doctrine/MySQL transport #47828

Description

Symfony version(s) affected

Description

How to reproduce

Possible Solution

Solution 1 - Remove the indexes

Solution 2 - Use SKIP LOCKED

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions