@vaimer

Describe the bug

RabbitMQ version 4.1.3

raft.wal_max_size_bytes = 64000000
raft.segment_max_entries = 4096

Set-up:

fair.retryTest.1 is a durable quorum queue with a message TTL of 16000 ms; after expiration, messages are dead-lettered to the deadletter-fair-test durable quorum queue via the routing key fair-test.
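
A minimal sketch of this set-up, assuming the classic rabbitmqadmin (v1) syntax; the dlx exchange name and its direct type are assumptions, since the original set-up does not name the dead-letter exchange:

# declare the dead-letter target first, then the TTL queue that dead-letters into it
rabbitmqadmin declare exchange name=dlx type=direct durable=true
rabbitmqadmin declare queue name=deadletter-fair-test durable=true arguments='{"x-queue-type":"quorum"}'
rabbitmqadmin declare binding source=dlx destination=deadletter-fair-test routing_key=fair-test
rabbitmqadmin declare queue name=fair.retryTest.1 durable=true arguments='{"x-queue-type":"quorum","x-message-ttl":16000,"x-dead-letter-exchange":"dlx","x-dead-letter-routing-key":"fair-test"}'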

Issue:

Segments for fair.retryTest.1 aren't cleaned up after the messages are moved, which causes a disk space leak.

In some cases, segments can only be cleaned up by removing the queue.

Purging messages from deadletter-fair-test doesn't free all of the consumed space.

The same behaviour is not reproducible on version 3.11.3.

Reproduction steps

  1. Create a queue with a TTL and set up dead-letter routing to another queue
  2. Publish a message to the queue with the TTL (see the sketch after this list)
  3. Segments for the queue with the TTL aren't cleaned up after the message moves to the other queue (they can persist for weeks)
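
For step 2, a single test message can be published to the TTL queue through the default exchange, for example (again assuming rabbitmqadmin v1 syntax; the payload is an arbitrary placeholder, around 300 KB in the original test):

rabbitmqadmin publish exchange=amq.default routing_key=fair.retryTest.1 payload="test message"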

Expected behavior

Segments are cleaned up after messages are moved to another queue (the same behaviour as acknowledgement).

Additional context

First test case

Publishing messages, approximate message size 300 KB


Memory consumption of the fair.retryTest.1 leader before and after the publishing process


Almost 150 MB is still in use by the queue and its segments

Second test case

Publishing a message, approximate message size more than 15 MB


The same situation as in the first case, with higher memory consumption.



@michaelklishin

@vaimer TTL is a massive can of worms and a set of completely different code paths. One major difference, documented for years, is that expiration is not an equivalent of acknowledgement: messages are discarded due to expiration only when they reach the head of the queue, which implies an active consumer.

There is also documentation on quorum queue tuning for large messages.

4.1.0 made segment cleanup significantly more aggressive for the typical case where messages are consumed. I can see how slow segment cleanup in that case could be considered a bug but message expiration per TTL is a completely different beast and I somewhat doubt that our team will get to it before 4.4 or so. It's simply not a common enough operational problem (the common one was addressed in 4.1.0).

That said, RabbitMQ is open source software and you are welcome to investigate what can be done, convince our team that a specific approach is worth its downsides, and submit a PR.

@vaimer

Hi @michaelklishin, thank you for the answer!

We noticed the issue once we migrated from 3.11.3 to 4.1.3. It was unexpected behaviour for us and wasn't mentioned anywhere.

Could you please give some advice on the following questions? It would help us investigate further and find an optimal solution.

What I found in 3.11.3 was a configuration option, queue_explicit_gc_run_operation_threshold, and I suspect that is why segments were cleaned up well in the previous version. I haven't found it in the docs for the new version. Is there something similar in the new version to force GC?

> 4.1.0 made segment cleanup significantly more aggressive for the typical case where messages are consumed.

As you said, acknowledgement of the message works better. We could replace the message-TTL queue configuration with a consumer that moves messages according to their time in the queue: acknowledge them and move them to another queue. What bothers me is the delivery limit for messages. Did I get it right that we can still set up such queues with an unlimited delivery limit? The concern is that this may consume disk space, but since the messages will be acknowledged in the end, it should be cleaned up.
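
Something like this policy is what I have in mind; the policy name and pattern here are placeholders, and I'm assuming that setting delivery-limit to -1 means unlimited:

rabbitmqctl set_policy unlimited-redelivery "^fair\.retryTest\." '{"delivery-limit": -1}' --apply-to queues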

> There is also documentation on quorum queue tuning for large messages.

Yep, I am currently testing different configurations. Just for clarification, did I get it right that a lower segment_max_entries value causes more segment files to be created, and a larger number of segment files also triggers clean-up more frequently? But in general, it should consume more than wal_max_size_bytes.

Thank you very much for your help!

@vaimer

Just a bit more context.
We use queues with TTL for a retry mechanism. Once a message is discarded by an ordinary consumer, it goes to a retry queue with a TTL. After the TTL expires, it gets another chance to be acknowledged in another queue.
And there is a chain of queues with different TTLs.

So there is a clear memory-leak trend.
[Screenshots from 2025-09-10 showing the memory usage trend]

@kjnilsson

@vaimer

Could you run rabbitmqctl eval "persistent_term:put(quorum_queue_checkpoint_config, {500, 256, 666667})." on each of your RabbitMQ nodes and rerun your tests?

@vaimer

@kjnilsson Hi!

Thank you for the suggestion.
I ran a somewhat bigger test: 2k messages and a chain of queues with different TTLs.

It is better with your suggestion, but memory consumption still grows quickly. Is it possible to set such values through the RabbitMQ configuration?

Configuration | Start processing | End processing | Returning memory (memory after all messages come to deadletter in the end)
--- | --- | --- | ---
Default settings | 9.46 GB | 4.63 GB | 8.56 GB
rabbitmqctl eval "persistent_term:put(quorum_queue_checkpoint_config, {500, 256, 666667})." | 8.35 GB | 5.52 GB | 7.83 GB

Screenshots from the second test

@michaelklishin

@vaimer quorum_queue_checkpoint_config is not exposed to rabbitmq.conf and is not really configurable at all (without rabbitmqctl eval); the default values it uses are constants:

-define(CHECK_MIN_INTERVAL_MS, 1000).
-define(CHECK_MIN_INDEXES, 4096 * 2).
-define(CHECK_MAX_INDEXES, 666_667).
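
For reference, the tuple passed in the rabbitmqctl eval above presumably maps onto these constants in the same order; this ordering is an inference from the constant names, not a documented contract:

# {CHECK_MIN_INTERVAL_MS, CHECK_MIN_INDEXES, CHECK_MAX_INDEXES} -- presumed order
rabbitmqctl eval "persistent_term:put(quorum_queue_checkpoint_config, {500, 256, 666667})."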

They can potentially be exposed (this is not a promise of delivery) to rabbitmq.conf but there hasn't been any need for that in over seven years of quorum queue existence.

If you want the minimum memory footprint, use a stream or superstream. A minimum memory footprint with this TTL-based workload and large messages of at least 15 MiB is not what quorum queues are optimized for, and that is very unlikely to change. However, streams keep significantly less data in memory, and TTL is not an edge case scenario but rather a fundamental feature of streams.

I'm afraid I do not understand what the metrics you are comparing mean specifically. The first two probably mean the memory footprint at the beginning and the end of each test but I'm not sure what "Returning memory" is.

So, give a stream with a comparable data retention policy a try. Or store those messages in a blob store with a TTL and pass around their blob store keys in the messages flowing through the queues (or streams), which is a standard recommendation for messages of tens of MiBs in size or more.
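
For example, a stream with an age-based retention policy could be declared like this (rabbitmqadmin v1 syntax; the name and the 7-day retention value are placeholders, and note that stream retention is age-based rather than per-message TTL):

rabbitmqadmin declare queue name=fair.retryStream.1 durable=true arguments='{"x-queue-type":"stream","x-max-age":"7D"}'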

@vaimer

Sorry for the confusion, I missed it while copying from my internal notes.

It is the memory once all messages end up in the dead-letter queue. You can see this step in the provided graph: memory jumped to this value right after processing was done.

I will check out your recommendation, thank you!

@kjnilsson

If you keep publishing some more messages, will the space eventually be reclaimed?

This discussion was converted from issue #14522 on September 09, 2025 21:46.
