Replies: 1 comment
Hey @twoholes, thanks for the write-up! The pipe read handler is used to read out the snapshot contents being written by the child process and send them over the wire to the target node. This code is essentially the same as what we do during the replication snapshot here: Lines 1778 to 1787 in a571ac5. We can consider adding a bound to the amount of data we read out of the pipe (for both replication and slot migration) in one event loop cycle. Although in the fullness of time we should really consolidate on the new dual-channel replication protocol; I just filed #2957 for that. With that model, no resources would be consumed on the main thread, since the child process would send directly to the target node. Let me know if this would resolve your concerns.
Summary
During atomic slot migration in Valkey cluster mode, the `slotMigrationPipeReadHandler()` function contains an unbounded `while(1)` loop that can monopolize the main thread for extended periods, potentially starving regular client requests. The issue becomes severe when migrating large slots (e.g., 10GB+), where the main thread can be occupied up to 65% of the time.

Environment
- Atomic slot migration (`CLUSTER MIGRATESLOTS`)

Problem Description
The `slotMigrationPipeReadHandler()` function (src/replication.c:1846) uses an unbounded `while(1)` loop to read data from a pipe and write it to the target node. While each individual invocation exits relatively quickly (after ~4 iterations, once the pipe is emptied), the event handler is triggered thousands of times during large data migrations, leading to cumulative main-thread monopolization.

Code Analysis
Loop Exit Conditions
The loop exits when:

- the pipe is empty (`read()` returns `EAGAIN`)
- the pipe is closed (`read()` returns 0)
- the TCP send buffer is full (`write()` returns less than requested)

However, on high-speed networks the TCP send buffer rarely fills up, so the loop primarily exits due to pipe emptiness. The child process continuously refills the pipe, causing the event handler to be triggered again and again.
Impact Analysis
Quantitative Analysis for 10GB Slot Migration
Impact Table
Root Cause
1. Lack of Iteration Limit
Unlike the replication code, which uses `REPL_MAX_READS_PER_IO_EVENT = 25` to prevent starvation (src/networking.c:4109), slot migration has no such protection. The comment at src/networking.c:4115 explicitly explains this starvation-prevention rationale; slot migration lacks the equivalent safeguard.
2. TCP Buffer Rarely Fills on High-Speed Networks
3. Repeated Event Triggering
Proposed Solutions
Solution 1: Add Iteration Limit (Recommended)
Following the replication code, add a maximum iteration limit:
Benefits:
Solution 2: Time-Based Limit
Solution 3: Configurable Limit
Add configuration options:
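One possible shape for such options, expressed as a valkey.conf fragment (the option names are invented for illustration and are not real Valkey settings):

```
# Hypothetical configuration knobs (names are illustrative only):
# maximum read() calls per event-loop invocation during slot migration
slot-migration-max-reads-per-event 25
# optional time budget per invocation in microseconds (0 = unbounded)
slot-migration-read-time-budget-us 1000
```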
Comparison with Replication Code
Code Locations
- `src/replication.c:1846` - `slotMigrationPipeReadHandler()` - unbounded loop
- `src/networking.c:4109` - `REPL_MAX_READS_PER_IO_EVENT` definition
- `src/networking.c:4115` - comment explaining starvation prevention
- `src/cluster_migrateslots.c:1596` - pipe creation with `O_NONBLOCK`

Reproduction Steps
Run `CLUSTER MIGRATESLOTS <node-id> <slots>` on a slot containing a large amount of data and observe client latency during the migration.

Expected Behavior
Client requests should maintain consistent latency during slot migration.
Actual Behavior
Client request latency increases by 1.4x - 2.8x during large slot migrations due to main thread monopolization.
Workarounds
Additional Context
This issue becomes more severe with:
Request
Could the maintainers consider adding an iteration limit similar to the replication code to prevent client starvation during slot migrations? I'm happy to submit a PR if this approach is acceptable.
Thank you for your time and for maintaining this excellent project!