@vaimer

Describe the bug

RabbitMQ version 4.1.3

raft.wal_max_size_bytes = 64000000
raft.segment_max_entries = 4096

Set-up:

fair.retryTest.1 is a durable quorum queue with a message TTL of 16000 ms; after expiration, messages are dead-lettered to the deadletter-fair-test durable quorum queue via the routing key fair-test.
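
A minimal sketch of this set-up, assuming the classic rabbitmqadmin (v1) syntax; the dlx exchange name and its direct type are assumptions, since the original set-up does not name the dead-letter exchange:

# declare the dead-letter target first, then the TTL queue that dead-letters into it
rabbitmqadmin declare exchange name=dlx type=direct durable=true
rabbitmqadmin declare queue name=deadletter-fair-test durable=true arguments='{"x-queue-type":"quorum"}'
rabbitmqadmin declare binding source=dlx destination=deadletter-fair-test routing_key=fair-test
rabbitmqadmin declare queue name=fair.retryTest.1 durable=true arguments='{"x-queue-type":"quorum","x-message-ttl":16000,"x-dead-letter-exchange":"dlx","x-dead-letter-routing-key":"fair-test"}'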

Issue:

Segments for fair.retryTest.1 aren't cleaned up after the messages are moved, which causes a disk space leak.

In some cases, segments can only be cleaned up by removing the queue.

Purging messages from deadletter-fair-test doesn't free all of the consumed space.

The same behaviour is not reproducible on version 3.11.3.

Reproduction steps

  1. Create a queue with a TTL and set up dead-letter routing to another queue
  2. Publish a message to the queue with the TTL (see the sketch after this list)
  3. Segments for the queue with the TTL aren't cleaned up after the message moves to the other queue (they can persist for weeks)
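
For step 2, a single test message can be published to the TTL queue through the default exchange, for example (again assuming rabbitmqadmin v1 syntax; the payload is an arbitrary placeholder, around 300 KB in the original test):

rabbitmqadmin publish exchange=amq.default routing_key=fair.retryTest.1 payload="test message"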

Expected behavior

Segments are cleaned up after messages are moved to another queue (the same behaviour as acknowledgement).

Additional context

First test case

Publishing messages, approximate message size 300 KB


Memory consumption of the fair.retryTest.1 leader before and after the publishing process


Almost 150 MB is still in use by the queue and its segments

Second test case

Publishing a message, approximate message size more than 15 MB


The same situation as in the first case, with higher memory consumption.



@michaelklishin

@vaimer TTL is a massive can of worms and a set of completely different code paths. One major difference, documented for years, is that expiration is not an equivalent of acknowledgement: messages are discarded due to expiration only when they reach the head of the queue, which implies an active consumer.

There is also documentation on quorum queue tuning for large messages.

4.1.0 made segment cleanup significantly more aggressive for the typical case where messages are consumed. I can see how slow segment cleanup in that case could be considered a bug but message expiration per TTL is a completely different beast and I somewhat doubt that our team will get to it before 4.4 or so. It's simply not a common enough operational problem (the common one was addressed in 4.1.0).

That said, RabbitMQ is open source software and you are welcome to investigate what can be done, convince our team that a specific approach is worth its downsides, and submit a PR.

@vaimer

Hi @michaelklishin, thank you for the answer!

We noticed the issue once we migrated from 3.11.3 to 4.1.3. It was unexpected behaviour for us and wasn't mentioned anywhere.

Could you please give some advice on the following questions? It would help us investigate further and find an optimal solution.

What I found in 3.11.3 was a configuration option, queue_explicit_gc_run_operation_threshold, and I suspect that is why segments were cleaned up well in the previous version. I haven't found it in the docs for the new version. Is there something similar in the new version to force GC?

> 4.1.0 made segment cleanup significantly more aggressive for the typical case where messages are consumed.

As you said, acknowledgement of the message works better. We could replace the message-TTL queue configuration with a consumer that moves messages according to their time in the queue: acknowledge them and move them to another queue. What bothers me is the delivery limit for messages. Did I get it right that we can still set up such queues with an unlimited delivery limit? The concern is that this may consume disk space, but since the messages will be acknowledged in the end, it should be cleaned up.
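
Something like this policy is what I have in mind; the policy name and pattern here are placeholders, and I'm assuming that setting delivery-limit to -1 means unlimited:

rabbitmqctl set_policy unlimited-redelivery "^fair\.retryTest\." '{"delivery-limit": -1}' --apply-to queues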

> There is also documentation on quorum queue tuning for large messages.

Yep, I am currently testing different configurations. Just for clarification, did I get it right that a lower segment_max_entries value causes more segment files to be created, and a larger number of segment files also triggers clean-up more frequently? But in general, it should consume more than wal_max_size_bytes.

Thank you very much for your help!

@vaimer

Just a bit more context.
We use queues with TTL for a retry mechanism. Once a message is discarded by an ordinary consumer, it goes to a retry queue with a TTL. After the TTL expires, it gets another chance to be acknowledged in another queue.
And there is a chain of queues with different TTLs.

So there is a clear memory-leak trend.
[Screenshots from 2025-09-10 showing the memory usage trend]

@kjnilsson

@vaimer

Could you run rabbitmqctl eval "persistent_term:put(quorum_queue_checkpoint_config, {500, 256, 666667})." on each of your RabbitMQ nodes and rerun your tests?

@vaimer

@kjnilsson Hi!

Thank you for the suggestion.
I ran a somewhat bigger test: 2k messages and a chain of queues with different TTLs.

It is better with your suggestion, but memory consumption still grows quickly. Is it possible to set such values through the RabbitMQ configuration?

Configuration | Start processing | End processing | Returning memory (memory after all messages come to deadletter in the end)
--- | --- | --- | ---
Default settings | 9.46 GB | 4.63 GB | 8.56 GB
rabbitmqctl eval "persistent_term:put(quorum_queue_checkpoint_config, {500, 256, 666667})." | 8.35 GB | 5.52 GB | 7.83 GB

Screenshots from the second test

@michaelklishin

@vaimer quorum_queue_checkpoint_config is not exposed to rabbitmq.conf and is not really configurable at all (without rabbitmqctl eval); the default values it uses are constants:

-define(CHECK_MIN_INTERVAL_MS, 1000).
-define(CHECK_MIN_INDEXES, 4096 * 2).
-define(CHECK_MAX_INDEXES, 666_667).
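
For reference, the tuple passed in the rabbitmqctl eval above presumably maps onto these constants in the same order; this ordering is an inference from the constant names, not a documented contract:

# {CHECK_MIN_INTERVAL_MS, CHECK_MIN_INDEXES, CHECK_MAX_INDEXES} -- presumed order
rabbitmqctl eval "persistent_term:put(quorum_queue_checkpoint_config, {500, 256, 666667})."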

They can potentially be exposed (this is not a promise of delivery) to rabbitmq.conf but there hasn't been any need for that in over seven years of quorum queue existence.

If you want the minimum memory footprint, use a stream or superstream. A minimum memory footprint with this TTL-based workload and large messages of at least 15 MiB is not what quorum queues are optimized for, and that is very unlikely to change. However, streams keep significantly less data in memory, and TTL is not an edge case scenario but rather a fundamental feature of streams.

I'm afraid I do not understand what the metrics you are comparing mean specifically. The first two probably mean the memory footprint at the beginning and the end of each test but I'm not sure what "Returning memory" is.

So, give a stream with a comparable data retention policy a try. Or store those messages in a blob store with a TTL and pass around their blob store keys in the messages flowing through the queues (or streams), which is a standard recommendation for messages of tens of MiBs in size or more.
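
For example, a stream with an age-based retention policy could be declared like this (rabbitmqadmin v1 syntax; the name and the 7-day retention value are placeholders, and note that stream retention is age-based rather than per-message TTL):

rabbitmqadmin declare queue name=fair.retryStream.1 durable=true arguments='{"x-queue-type":"stream","x-max-age":"7D"}'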

@vaimer

Sorry for the confusion, I missed it while copying from my internal notes.

It is the memory once all messages end up in the dead-letter queue. You can see this step in the provided graph: memory jumped to this value right after processing was done.

I will check out your recommendation, thank you!

@kjnilsson

If you keep publishing some more messages, will the space eventually be reclaimed?

This discussion was converted from issue #14522 on September 09, 2025 21:46.
