[FrameworkBundle] Removed RejectRedeliveredMessageMiddleware
to avoid message to be lost if it cannot be processed by both handler and failed transport
#40249
base: 7.3
Conversation
Hey @theravel! I understand your opinion on this - it makes sense to me. However, I believe there IS a use case for this middleware. A) A message is received and passed to the handler. For a very long-running message, eventually all of your workers could get caught trying to handle the same message, leaving no workers available to handle other messages (this is especially a problem because when Rabbit redelivers, it puts the message at the front of the queue). So if I understand the situation fully, we have 2 different scenarios:
Do you have a thought about how to handle this? Is there a way that could satisfy everyone? Thanks!
@weaverryan thank you for looking.
So the use case of redelivery happening while another consumer is still processing the message should not be valid. Looking at that page, it is still not clear why messages should be lost on purpose.
Hey @theravel!
We need to be really sure on this. From the original issue about this #32055:
But perhaps the problem is not quite what I was describing... but that once a long message is done, the connection to Rabbit is already lost, so it fails to ack.
Hmm. So one original issue that … To track if the message was actually redelivered, we could have the … Cheers!
IMHO, losing messages is unacceptable. People can deal with the long-running process issue, either at the infra level (defining the right TTL before AMQP re-sends the message) or at the application level (with a Lock, or by making the application resilient to multiple deliveries). But they have no way to work around a lost message.
Hi @weaverryan
In my opinion, if dispatching of the failed event fails (so we are not sure that the failed message has been delivered to the failed transport), we should not reject the message from the original queue. A compromise has to be made: we cannot make the two actions transactional (remove the original message AND put it into the failed transport, simply because two different storage solutions might be involved), so we would rather duplicate messages in the failed transport than lose data. WDYT?
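The ordering being proposed here can be sketched in a few lines of plain PHP (the function and callable names below are hypothetical stand-ins, not Messenger's actual API): the original message is acknowledged only after the copy has been written to the failed transport, so a crash between the two steps can at worst duplicate the message, never lose it.

```php
<?php
// Hypothetical sketch: ack the original message only after the
// failure-transport write succeeds. $handle, $sendToFailureTransport
// and $ack are stand-ins for the real worker collaborators.
function consume(callable $handle, callable $sendToFailureTransport, callable $ack, object $message): string
{
    try {
        $handle($message);
        $ack($message);                        // happy path: handled, safe to drop
        return 'handled';
    } catch (\Throwable $handlerError) {
        try {
            $sendToFailureTransport($message); // must succeed first...
            $ack($message);                    // ...only then remove the original
            return 'moved-to-failure-transport';
        } catch (\Throwable $failureTransportError) {
            // Neither handler nor failure transport worked: do NOT ack.
            // The broker will redeliver; a duplicate in the failure
            // transport is possible, a lost message is not.
            return 'left-in-queue';
        }
    }
}
```

The duplicate-on-crash case is exactly the at-least-once trade-off described later in this thread.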
@jderusse Ok then, let's get this taken care of. Unfortunately, I think (for some very good reasons) @Tobion is less available these days to jump in with his thoughts.
@theravel That is what I was thinking. But with one addition:

```php
$failedEvent = new WorkerMessageFailedEvent($envelope, $transportName, $throwable);
try {
    $this->dispatchEvent($failedEvent);
} catch (\Throwable $e) {
    if (!$failedEvent->wasMessageRedelivered()) {
        // we would do nothing, allow Rabbit to naturally redeliver
        // hmm... unless there is no "retry" configured. In that case, this is expected
        // we would need to differentiate that
        // perhaps we combine this with $failedEvent->willRetry()
        // or we call the method $failedEvent->wasMessageRedeliveredIfDesired();
        // SendFailedMessageForRetryListener would set this IF it was redelivered or shouldn't have been
    } else {
        // here is where we would "reject" the original, because we know the message was properly redelivered
    }
}
```

I'm also trying not to make things "too fancy". The best scenario is to be as close to "industry standard" as possible. With my proposed setup, would we basically be saying:
Thoughts?
@weaverryan why do you think that we can reject a message if it was redelivered? Redelivery doesn't mean that it was successfully handled, so rejecting this message would cause data loss.
I don't see any alternative suggestion for how to solve the problem that the current solution fixes.
Currently this PR just removes everything, which is not acceptable.
If you don't agree with this solution, you could just remove the `messenger.middleware.reject_redelivered_message_middleware` middleware in your app.
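For readers looking for that workaround: the middleware is only registered as part of the default middleware stack, so one way to drop it is to disable that stack and re-list what you still need. A configuration sketch — the bus name and middleware shortcut names are illustrative and should be verified against the FrameworkBundle version in use:

```yaml
# config/packages/messenger.yaml (sketch, not a verified recipe)
framework:
    messenger:
        buses:
            messenger.bus.default:
                # Disabling the default stack also drops
                # messenger.middleware.reject_redelivered_message_middleware,
                # but the default middleware you still want must then
                # be listed explicitly.
                default_middleware: false
                middleware:
                    - 'send_message'
                    - 'handle_message'
```

Note the downside discussed below: because the middleware is on by default, most applications never know it exists, so an opt-out like this only helps those who have already been bitten.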
@Tobion can you please provide more information about the original problem which the middleware is supposed to solve? This would help a lot with the fix. Removing the middleware from a specific application will not solve the problem: since the middleware is enabled by default, engineers (users of the library) simply do not know about it, and they do not expect a reliable library like Symfony to lose messages. The middleware should be either removed or disabled by default (so that enabling it is an educated decision).
@Tobion would the following solution work from your perspective?
… to be lost if it cannot be processed by both handler and failed transport" This reverts commit 354ba2a.
…is not configured. Otherwise redelivery can happen only when failed transport is not available and message cannot be lost
@Tobion please check one more version; I implemented it as described in the previous comment. If this works, I will squash the commits (for now it might be easier for review to keep them separate).
Correct me if I'm wrong. We are discussing the scenario where you can lose messages if your database is down AND your failure queue is down. These are two catastrophic things that should never happen, much less at the same time. It seems like you have an infrastructure problem and not a problem with the application logic. I vote 👎 to make any changes in 4.4. If this problem should be handled, it will be a new feature that goes in 5.x. But I'm currently very hesitant to make any changes for this scenario.
@Nyholm not entirely so. The scenario is:
The current behaviour is: the message will be forcefully removed from the queue and completely lost. So a single component (the DB) goes down, and you lose potentially critical data. This is not intended behaviour when you build resilient software. What are we supposed to do if a message cannot be handled by either the handler or the failure transport? My opinion is that it should stay in the queue until the infrastructure is back to normal. Whether it should go to 4.4: I agree, it would be safer to make it in 5.x.
Any updates on this? It would be great to have a reliable messaging layer in Symfony 5.3
I still don't agree with your scenario. If you use RabbitMQ/SQS as the failure queue, then this problem is non-existent. You are trying to recover from the scenario where both your failure queue and your database are down. The fact that you are putting your failure queue in the database is an infrastructure decision you have made. If we were to merge this PR, how would we solve the problem of infinite retries blocking valid messages? I.e., MessageA always fails but MessageB would pass, but it never gets a chance because the worker keeps retrying MessageA. See the class doc of
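For context, the infinite-retry concern is normally bounded in Messenger by a finite per-transport retry policy rather than by dropping redelivered messages. A sketch of that configuration (transport name, DSN and numbers are illustrative; after `max_retries` is exhausted the message goes to the failure transport, if one is configured):

```yaml
# config/packages/messenger.yaml (illustrative values)
framework:
    messenger:
        failure_transport: failed
        transports:
            async:
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
                retry_strategy:
                    max_retries: 3
                    delay: 1000      # ms before the first retry
                    multiplier: 2    # exponential backoff
            failed:
                dsn: '%env(MESSENGER_FAILURE_TRANSPORT_DSN)%'
```

With such a cap in place, MessageA can block the queue for at most a bounded number of attempts before being moved aside.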
What I am trying to make clear is that the failure transport is currently not reliable in Symfony Messenger because of this middleware. It is up to me as a developer to decide where I want to store my failed messages: in the same RabbitMQ/SQS/whatever instance, or in external storage. This is a general-purpose framework library which should be applicable in a variety of configurations. Not to mention that the documentation shows a default example with the main transport as AMQP and the failure transport as the DB. Otherwise we should clearly declare in the documentation that reliability is not guaranteed except in specific configurations (the default example configuration does not guarantee it, for example).

As an example, let's consider a scenario with any storage other than the DB. Say my application consumes messages from different sources (PubSub, SQS, RabbitMQ, etc.), but I want a unified way of working with failed messages, so I declare my failure transport as a RabbitMQ queue (the DB is not involved at all). If this PR were to be merged, the problem of infinite redelivery loops should be solved the following way. A developer would have a choice:
This problem is similar to at-least-once vs at-most-once delivery: you have to make a compromise.
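The at-least-once compromise mentioned above is usually paired with idempotent consumers: since the same message may be delivered twice, the handler itself refuses to repeat its side effect. A small self-contained PHP sketch (in-memory and purely illustrative; a real system would track processed IDs in shared storage such as the database or Redis):

```php
<?php
// Sketch of an idempotent consumer for at-least-once delivery:
// the handler remembers which message IDs it has already processed
// and safely ignores duplicate deliveries.
final class IdempotentHandler
{
    /** @var array<string, true> */
    private array $seen = [];
    private int $processed = 0;

    public function handle(string $messageId): bool
    {
        if (isset($this->seen[$messageId])) {
            return false;           // duplicate delivery: skip, no side effect
        }
        $this->seen[$messageId] = true;
        $this->processed++;         // side effect happens once per unique ID
        return true;
    }

    public function processedCount(): int
    {
        return $this->processed;
    }
}
```

With handlers built this way, duplicating a message in the failure transport (the worst case of the proposal in this PR) is harmless, whereas losing one is not recoverable at any layer.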
Hm. What you fail to understand is that your database being down is an extreme thing. That should be your biggest issue. You will lose data if your database is down. The same goes if any other critical part of your infrastructure is unavailable. I am 👎 on the current patch in this PR. Could this be solved in a different way? I.e., could we ack the message only after we have successfully written it to the failure queue?
Sorry, but when you live in the world of distributed applications, outages of databases, networks, microservices, etc. are a normal thing. One should not hope for everything to be okay; one should design the application in a resilient way.
I understand your point of view. Do you have any comments on this question:
Hi @Nyholm I have taken a few days to think, but unfortunately I do not see alternative solutions. When a message should be written to the failure transport (e.g. it exceeded max retries, or retries were disabled), we should definitely acknowledge it only after the write has been successful. This is exactly what my pull request is doing: when the middleware is disabled, this block is disabled https://github.com/symfony/symfony/blob/5.x/src/Symfony/Component/Messenger/Worker.php#L133-L138
What should we do to get this reliability fix into Symfony Messenger?
Hi, I'm also considering this issue. We are moving from a Perl daemon to a PHP daemon with Messenger, and this issue is a big problem. Is there any news about it? Thank you.
I agree with @theravel. I also encountered a similar problem. Example: I need to set up an integration with an EXTERNAL queue that does not use PHP serialization but exchanges messages in JSON format. For this I wrote my own serializer following the example of https://symfonycasts.com/screencast/messenger/transport-serializer This example is not working, because the native serializer cannot cope with the serialization task due to a Closure.
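For reference, a custom transport serializer along the lines of that tutorial can be sketched as below. `ExternalEvent` is an illustrative message class (not part of Messenger), and error handling is minimal; the two methods implement Messenger's `SerializerInterface` for transports:

```php
<?php
use Symfony\Component\Messenger\Envelope;
use Symfony\Component\Messenger\Transport\Serialization\SerializerInterface;

// Illustrative message class for payloads coming from the external system.
final class ExternalEvent
{
    public function __construct(public array $payload)
    {
    }
}

final class ExternalJsonSerializer implements SerializerInterface
{
    public function decode(array $encodedEnvelope): Envelope
    {
        // The body is plain JSON from the external system, not PHP-serialized.
        $payload = json_decode($encodedEnvelope['body'], true, 512, \JSON_THROW_ON_ERROR);

        return new Envelope(new ExternalEvent($payload));
    }

    public function encode(Envelope $envelope): array
    {
        /** @var ExternalEvent $message */
        $message = $envelope->getMessage();

        // No stamps are encoded, so nothing closure-based ends up in the body.
        return [
            'body' => json_encode($message->payload, \JSON_THROW_ON_ERROR),
            'headers' => ['Content-Type' => 'application/json'],
        ];
    }
}
```

Skipping stamp encoding, as here, is what the commenter below means by "discarding stamps": it keeps external JSON clean but also drops the metadata Messenger's retry and failure machinery relies on.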
I decided to discard stamps from the message to disable the error transport logic. But there remains a problem with … If a message with …
Then I'll choose option 1; I'd rather deal with the problem manually than silently lose data. And yes, if you use the RabbitMQ web UI to read a message from the queue, it will be returned to the queue with the redelivered flag set.
This pull request fixes the situation described in #36870