Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

psihius
Copy link

@psihius psihius commented Oct 5, 2025

Q A
Branch? 7.4
Bug fix? yes
New feature? no
Deprecations? no
Issues #47633 (and many others since closed), abandoned PR #60207 and so on
License MIT

We run over 3 million queue items a day, we had run into major issues with current implementation deadlocking regularly, no amount of adjusting the purge threads and other settings did fix the root case - the messenger_messages table not having a proper covering index for the SELECT FOR UPDATE query.
Because MySQL implementation has been special cased to batch delete's by delivered_at having a special value, at least in MySQL 8.0.* and up (we run 8.0.42 and now running 8.4.6) this results in row range locks that basically lock the whole table due to delivered_at index being of extremely low cardinality, resulting in locking of all the rows that delivered_at is at null value.
Then UPDATE queries try to update delivered_at and delete is run by delivered_at condition, resulting in eventual deadlock.

At out scale this lead to deadlocks completelly overwhelming the server within an hour and hard-locking it to a point we had to kill -9 <mysql pid>, even running very agressive deadlock timeouts doesn't help.
Our machine for the database has plenty of resources and ram free, so it never was a CPU, RAM or I/O issue - server barelly uses over 15% of the CPU, innodb buffer is only 40% full so everything fits into memory. I/O never rose above 3%, mostly sitting bellow 1% (we have InnoDB io capacity set at 6000 baseline and 12000 peak, which is only a fraction of what the storage layer is capable of).

Adding covering index delivered_at, id does help to aliviate the onset of the issue, but still resulted in hard dealocks, just took about 14-16 hours under our workloads.
I was unable to find the original reasons why delete batching was added, but I suspect that's some MySQL 4/5 era schenanigans that are outdated and not true any more.

So this PR is what I have deployed 6 days ago to our production enviroment and it has been running trouble free since then without a single deadlock recorded against messenger compoment table. Collecting statistics also shows that this is the correct way to solve this, here are performance schema queries that show before and after:
I removed all batched handling and let MySQL run the same way all other databases do it, which works like a charm if we also add a proper index of queue_name + avaiable_at + delivered_at + id - this allows MySQL to lock only the specificly required row by it's primary id, removing all lock contention issues (the id field in the index is need, that's what gives index the cardinality to do the job right).

Before, notice average lock ms column, it is bad.

mysql> SELECT DIGEST_TEXT,
    ->        COUNT_STAR,
    ->        ROUND(SUM_TIMER_WAIT/1e12,3)  AS total_sec,
    ->        ROUND(SUM_LOCK_TIME/1e12,3)   AS lock_sec,
    ->        ROUND((SUM_LOCK_TIME/1e12)/NULLIF(COUNT_STAR,0)*1000,3) AS avg_lock_ms
    -> FROM performance_schema.events_statements_summary_by_digest
    -> WHERE DIGEST_TEXT LIKE '%MESSENGER_MESSAGES%'
    -> ORDER BY SUM_TIMER_WAIT DESC
    -> LIMIT 10;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+------------+------------+-------------+
| DIGEST_TEXT                                                                                                                                                                                                                                 | COUNT_STAR | total_sec  | lock_sec   | avg_lock_ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+------------+------------+-------------+
| DELETE FROM `messenger_messages` WHERE `delivered_at` = ?                                                                                                                                                                                   |    2699821 | 126790.694 | 120946.017 |      44.798 |
| UPDATE `messenger_messages` SET `delivered_at` = ? WHERE `id` = ?                                                                                                                                                                           |    3098328 |  43760.777 |  25541.015 |       8.243 |
| SELECT `m` . * FROM `messenger_messages` `m` WHERE ( `m` . `queue_name` = ? ) AND ( `m` . `delivered_at` IS NULL OR `m` . `delivered_at` < ? ) AND ( `m` . `available_at` <= ? ) ORDER BY `available_at` ASC LIMIT ? FOR UPDATE SKIP LOCKED |    2696084 |   4204.948 |      2.202 |       0.001 |
| INSERT INTO `messenger_messages` ( `body` , `headers` , `queue_name` , `created_at` , `available_at` ) VALUES (...)                                                                                                                         |    1552710 |   2470.059 |   1069.126 |       0.689 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+------------+------------+-------------+

After

mysql> SELECT DIGEST_TEXT,
    ->        COUNT_STAR,
    ->        ROUND(SUM_TIMER_WAIT/1e12,3)  AS total_sec,
    ->        ROUND(SUM_LOCK_TIME/1e12,3)   AS lock_sec,
    ->        ROUND((SUM_LOCK_TIME/1e12)/NULLIF(COUNT_STAR,0)*1000,3) AS avg_lock_ms
    -> FROM performance_schema.events_statements_summary_by_digest
    -> WHERE DIGEST_TEXT LIKE '%MESSENGER_MESSAGES%'
    -> ORDER BY SUM_TIMER_WAIT DESC
    -> LIMIT 10;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+-----------+----------+-------------+
| DIGEST_TEXT                                                                                                                                                                                                                                 | COUNT_STAR | total_sec | lock_sec | avg_lock_ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+-----------+----------+-------------+
| SELECT `m` . * FROM `messenger_messages` `m` WHERE ( `m` . `queue_name` = ? ) AND ( `m` . `delivered_at` IS NULL OR `m` . `delivered_at` < ? ) AND ( `m` . `available_at` <= ? ) ORDER BY `available_at` ASC LIMIT ? FOR UPDATE SKIP LOCKED |   19002450 | 29151.318 |   22.938 |       0.001 |
| DELETE FROM `messenger_messages` WHERE `id` = ?                                                                                                                                                                                             |   12677551 | 12511.529 |   66.584 |       0.005 |
| INSERT INTO `messenger_messages` ( `body` , `headers` , `queue_name` , `created_at` , `available_at` ) VALUES (...)                                                                                                                         |   12786292 |  2260.588 |   18.044 |       0.001 |
| UPDATE `messenger_messages` SET `delivered_at` = ? WHERE `id` = ?                                                                                                                                                                           |   12865570 |  1689.881 |    7.368 |       0.001 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+-----------+----------+-------------+

I imagine that the same covering index for the select query should have similar results for other databases, as this goes down to basics of indexing columns for database performance, but obviousuly some help with validating would be appriciated.

I also belive this should be backported all the way down to 6.4 branch, as this is an issue I have seen a lot of people running into and common advice being "just use RabbitMQ instead", while the root cause isn't investigated properly. I had the envrioment and authority to dig into root cause and this is the result of that investigation.

@carsonbot carsonbot added this to the 7.4 milestone Oct 5, 2025
@carsonbot carsonbot changed the title [Messenger][Doctrine] Remove old MySQL special handling [Doctrine][Messenger] Remove old MySQL special handling Oct 5, 2025
@psihius psihius force-pushed the messenge-doctrine-transport-mysql-deadlocking-resolution branch from 0df7f25 to 0680360 Compare October 5, 2025 18:30
@psihius psihius force-pushed the messenge-doctrine-transport-mysql-deadlocking-resolution branch from 0680360 to aca998a Compare October 5, 2025 18:31
@psihius psihius changed the title [Doctrine][Messenger] Remove old MySQL special handling [Doctrine][Messenger] Remove old MySQL special handling that causes deadlocks Oct 5, 2025
@symfony symfony deleted a comment from carsonbot Oct 6, 2025
$table->addIndex(['queue_name']);
$table->addIndex(['available_at']);
$table->addIndex(['delivered_at']);
$table->addIndex(['queue_name', 'available_at', 'delivered_at', 'id']);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@psihius are you sure the ID field is needed?

The select query doesn't use the ID field in the where condition.

Copy link
Author

@psihius psihius Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query does not, but the primary key is used within the index to lock the column, this makes locking more efficient.
Also btree indexes, at least in MySQL/MariaDB, always contain a primary key part as that points to the row, so I just put it there explictly.
Might not needed for MSSQL/Oracle/PosgtreSQL, but I've decided for the moment not to do any platform dependent code, as it might not be needed. Lets see what the maintainers want to do here and adjust accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.