Before Creating the Bug Report
I found a bug, not just asking a question, which should be created in GitHub Discussions.
I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.
Runtime platform environment
linux
RocketMQ version
5.1.x
JDK Version
JDK8
Describe the Bug
After enabling the enableSlaveActingMaster switch, if the master node crashes and scheduled messages are triggered before the route information is updated, the slave may still choose the dead master node as the send target. Combined with the four-attempt retry mechanism, this can easily lead to message loss.


For example:
Code:
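To make the described failure mode concrete, here is a minimal, illustrative sketch (not RocketMQ's actual routing code; all class and method names are hypothetical) of the behavior being asked for: when picking a broker address from possibly stale route info, skip addresses the client already knows are unreachable instead of retrying against the crashed master.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only -- not RocketMQ's real implementation.
public class BrokerSelectSketch {
    // brokerName -> address, standing in for cached route information
    private final Map<String, String> routeTable = new LinkedHashMap<>();
    // addresses that recent sends have failed to reach
    private final Set<String> deadAddrs = new HashSet<>();

    public void addRoute(String brokerName, String addr) {
        routeTable.put(brokerName, addr);
    }

    public void markDead(String addr) {
        deadAddrs.add(addr);
    }

    // Return the first address not known to be dead, or null if none is left,
    // in which case the caller should wait for a route refresh rather than
    // burning all four retries against the dead master.
    public String chooseBrokerAddr() {
        for (String addr : routeTable.values()) {
            if (!deadAddrs.contains(addr)) {
                return addr;
            }
        }
        return null;
    }
}
```

The point of the sketch is only the exclusion step: without it, stale route data keeps returning the crashed master's address until the name server poll finally refreshes the route table.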
Steps to Reproduce
To increase the probability of reproducing the issue, it's necessary to increase the value of loadBalancePollNameServerInterval.
1. Set up a cluster with 3 masters and 3 slaves, and enable the enableSlaveActingMaster feature.
2. Send 100 scheduled messages to the cluster, with delay times ranging from 1 to 3 minutes.
3. Start consumption, and during the consumption process, shut down one of the master nodes.
By comparing the sent messages with the consumed messages, you may observe message loss along with Broker errors.
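Step 2 above can be reproduced with RocketMQ's built-in delay levels. This is a sketch under the assumption of the default messageDelayLevel table ("1s 5s 10s 30s 1m 2m 3m ..."), where 1, 2, and 3 minutes correspond to levels 5, 6, and 7; the group name, topic, and nameserver address below are placeholders.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch for reproduction step 2: scheduled messages with 1-3 minute delays.
public class DelayLevelSketch {
    // Map a whole-minute delay (1..3) to its level in the default
    // messageDelayLevel table: 1m -> 5, 2m -> 6, 3m -> 7.
    public static int delayLevelForMinutes(int minutes) {
        if (minutes < 1 || minutes > 3) {
            throw new IllegalArgumentException("expected 1..3 minutes");
        }
        return 4 + minutes;
    }

    public static int randomLevel() {
        return delayLevelForMinutes(ThreadLocalRandom.current().nextInt(1, 4));
    }

    // Usage sketch with the standard rocketmq-client (requires a running
    // cluster; names and address are placeholders):
    //
    // DefaultMQProducer producer = new DefaultMQProducer("repro_group");
    // producer.setNamesrvAddr("127.0.0.1:9876");
    // producer.start();
    // for (int i = 0; i < 100; i++) {
    //     Message msg = new Message("ReproTopic", ("m" + i).getBytes());
    //     msg.setDelayTimeLevel(randomLevel()); // 1-3 minute delay
    //     producer.send(msg);
    // }
    // producer.shutdown();
}
```

Recording the message keys at send time makes the final comparison against the consumed messages straightforward.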
What Did You Expect to See?
Choosing the correct master node ensures that scheduled messages are not lost.
What Did You See Instead?
The master that is down is not excluded when choosing the send target.
Additional Context
No response