fix(cluster): reconnect to nodes that restart without slot changes #2096
maxbronnikov10 wants to merge 10 commits into redis:main from maxbronnikov10:feat/cluster-node-retry-strategy
Conversation
When a cluster node (typically a replica) restarts without any slot changes, ioredis permanently removes it from the connection pool and never reconnects. This happens because `retryStrategy` is hardcoded to `null` for every node in the pool — by design, to force MOVED-based topology refresh. However, MOVED errors are never triggered when slots don't move, so the node is lost forever.
Pull request overview
Adds a new Cluster option to allow per-node reconnection retries so cluster nodes (notably replicas) that restart without slot changes aren’t permanently dropped from the pool—improving resiliency in rolling-restart environments.
Changes:
- Introduces `clusterNodeRetryStrategy` (Cluster option) to control `retryStrategy` for individual node connections in the connection pool.
- Updates cluster command dispatch to reject commands (when `enableOfflineQueue: false`) if the chosen node exists but is not ready (e.g., reconnecting), instead of silently queueing on the node's offline queue.
- Adds unit + functional tests covering default behavior, node retention vs. removal, and offline-queue rejection behavior.
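The rejection behavior in the second bullet can be sketched as a small decision helper. This is an illustration only — the function name, the status model, and the overall shape are invented here, not the PR's actual code:

```typescript
type NodeStatus = "ready" | "connecting" | "reconnecting" | "end";

// Hypothetical sketch of what cluster command dispatch decides for a
// chosen node. With the offline queue disabled, a node that exists but
// is not ready causes an immediate rejection instead of the command
// silently landing on that node's offline queue.
function dispatchDecision(
  nodeStatus: NodeStatus | undefined,
  enableOfflineQueue: boolean
): "send" | "queue" | "reject" {
  if (nodeStatus === "ready") {
    return "send";
  }
  // Node is missing from the pool, or present but not ready
  // (e.g. reconnecting after a restart).
  return enableOfflineQueue ? "queue" : "reject";
}
```

The key change described above is the middle case: previously a present-but-reconnecting node would accept the command into its own offline queue even when `enableOfflineQueue` was `false` at the cluster level.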
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `lib/cluster/ClusterOptions.ts` | Adds the new `clusterNodeRetryStrategy` option (and updates typing to allow `clusterRetryStrategy: null`). |
| `lib/cluster/ConnectionPool.ts` | Applies `clusterNodeRetryStrategy` (when provided) as the per-node `retryStrategy`; otherwise keeps `retryStrategy: null`. |
| `lib/cluster/index.ts` | Passes `clusterNodeRetryStrategy` into the pool and rejects commands to non-ready nodes when the cluster offline queue is disabled. |
| `test/unit/clusters/ConnectionPool.ts` | Unit tests verifying how `retryStrategy` is set based on `clusterNodeRetryStrategy`. |
| `test/functional/cluster/node_reconnect.ts` | Functional tests for node removal vs. retention, command rejection with `enableOfflineQueue: false`, and reconnect after restart. |
Comments suppressed due to low confidence (1)
lib/cluster/ConnectionPool.ts:86
`clusterNodeRetryStrategy` is being passed into `defaults(..., this.redisOptions, ...)`, which means it will also be copied onto the per-node `Redis` instance's options object (as an unknown option). Consider stripping `clusterNodeRetryStrategy` out before passing options to `new Redis(...)` to avoid leaking Cluster-only configuration into Redis options and to reduce confusion when inspecting `redis.options`.
```typescript
createRedisFromOptions(node: RedisOptions, readOnly: boolean) {
  const redis = new Redis(
    defaults(
      {
        // By default, never try to reconnect when a node is lost;
        // instead, wait for a `MOVED` error and fetch slots again.
        // When `clusterNodeRetryStrategy` is set, use it to allow
        // reconnection (e.g. for replica nodes that restart without
        // any slot changes).
        retryStrategy:
          typeof this.redisOptions.clusterNodeRetryStrategy === "function"
            ? this.redisOptions.clusterNodeRetryStrategy
            : null,
        // The offline queue should be enabled so that we don't need
        // to wait for the `ready` event before sending commands to
        // the node.
        enableOfflineQueue: true,
        readOnly: readOnly,
      },
      node,
      this.redisOptions,
      { lazyConnect: true }
    )
  );
  // …
}
```
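A minimal sketch of the stripping suggested in the review comment, using plain object destructuring rather than lodash. This is illustrative only — `stripClusterOnlyOptions` is not a function in the PR, and the options shape is simplified:

```typescript
// Remove Cluster-only keys before the options object is handed to
// `new Redis(...)`, so they don't surface as unknown options on
// `redis.options`. `clusterNodeRetryStrategy` is the option added
// by this PR; the helper name is hypothetical.
function stripClusterOnlyOptions<
  T extends { clusterNodeRetryStrategy?: unknown }
>(options: T): Omit<T, "clusterNodeRetryStrategy"> {
  const { clusterNodeRetryStrategy, ...nodeOptions } = options;
  return nodeOptions;
}
```

The returned object could then be passed through `defaults(...)` and into the `Redis` constructor in place of the raw `this.redisOptions`.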
Force-pushed from b5d8c27 to fac3eaa.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
@PavelPashov hello! Can you take a look at this PR, please?
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 31ebcd1.
Thanks for the PR. I will review it and follow up once I have gone through it.
run sharded pub sub tests
Hello! @uglide Thanks for running the tests. I believe these tests were already flaky before my changes — I've checked and confirmed that this PR doesn't touch any of the related logic. However, I'm happy to investigate the issue and submit a fix in a separate PR. @PavelPashov what do you think?
Closes:
- #587
- #1732
Problem
When a cluster node (typically a replica) restarts without any slot changes, ioredis permanently removes it from the connection pool and never reconnects. This happens because `retryStrategy` is hardcoded to `null` for every node in the pool — by design, to force MOVED-based topology refresh. However, MOVED errors are never triggered when slots don't move, so the node is lost forever.

This causes two downstream issues:
- `scaleReads: "slave"` silently drops replicas, degrading read throughput
- With `enableOfflineQueue: false`, commands fail with `"Cluster isn't ready and enableOfflineQueue options is false"` because `getInstanceByKey()` returns `undefined` for the removed node

This is common in Kubernetes environments during rolling restarts.
Changes
`clusterNodeRetryStrategy` — new option (mirrors `clusterRetryStrategy`) that controls the `retryStrategy` for individual cluster node connections. When set, a node that loses its connection stays in the pool and retries instead of being permanently removed. If the node is later removed from the cluster topology via `reset()` (triggered by MOVED or `slotsRefreshInterval`), it is still cleaned up correctly.
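Assuming `clusterNodeRetryStrategy` has the same shape as ioredis's other retry strategies — a function from the attempt count to a delay in milliseconds, or `null` to stop retrying — a caller might supply exponential backoff like this sketch (the base delay, cap, and attempt limit here are arbitrary choices, not values from the PR):

```typescript
// Hypothetical per-node retry strategy: exponential backoff starting at
// 100 ms, capped at 2 s, giving up (returning null, which removes the
// node) after 20 attempts.
function nodeRetryStrategy(times: number): number | null {
  if (times > 20) {
    return null;
  }
  return Math.min(100 * 2 ** times, 2000);
}
```

It would presumably be passed as `new Cluster(nodes, { clusterNodeRetryStrategy: nodeRetryStrategy })`, so a replica that restarts during a rolling deploy keeps retrying for a while instead of being dropped from the pool.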
Note
Medium Risk
Touches Redis Cluster connection lifecycle and command routing when nodes disconnect, which can affect availability and error behavior under failure conditions. Changes are scoped and covered by new functional/unit tests, but reconnection timing and edge cases warrant review.
Overview
Adds `clusterNodeRetryStrategy` to `ClusterOptions` (default `null`) to control the per-node `retryStrategy` so restarted nodes (often replicas) can reconnect without requiring slot changes. `ConnectionPool` now accepts this strategy and applies it when creating node `Redis` instances, and `Cluster` passes the option through while also tightening `sendCommand` rejection/queueing logic when `enableOfflineQueue: false` and a chosen node is not ready.

Includes new functional tests for node restart/reconnect and MOVED redirection scenarios plus unit tests for `ConnectionPool` strategy wiring and `nodeError` emission; also ignores `.history` in `.gitignore`.

Reviewed by Cursor Bugbot for commit 0e31bf3.