[https://nvbugs/5969206][fix] BREAKING: Setting default value of KV cache transfer timeout to 60s#12249

Merged
pcastonguay merged 3 commits into NVIDIA/TensorRT-LLM:main from pcastonguay:default_kv_transfer_timeout
Mar 18, 2026
Conversation

@pcastonguay
Collaborator

@pcastonguay pcastonguay commented Mar 16, 2026

Summary by CodeRabbit

  • Bug Fixes
    • KV cache transfer now defaults to a 60-second timeout, preventing indefinite waiting in transfer scenarios.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@pcastonguay pcastonguay requested a review from Tabrizian March 16, 2026 16:46
@pcastonguay pcastonguay requested a review from a team as a code owner March 16, 2026 16:46
@pcastonguay pcastonguay requested a review from hchings March 16, 2026 16:46
@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@coderabbitai
Contributor

coderabbitai Bot commented Mar 16, 2026

📝 Walkthrough

Walkthrough

The default value of kv_transfer_timeout_ms in CacheTransceiverConfig is changed from None to 60000 milliseconds, establishing an explicit 60-second default timeout for KV cache transfers when the parameter is not specified.

Changes

Cohort: KV Cache Transfer Timeout Default
File(s): tensorrt_llm/llmapi/llm_args.py
Summary: Modified the default value of the kv_transfer_timeout_ms field in the CacheTransceiverConfig class from None to 60000 (milliseconds), changing the implicit timeout behavior for KV cache transfer operations.
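The diff itself is a one-line default change. The sketch below is a simplified stand-in for the affected field; the real `CacheTransceiverConfig` lives in `tensorrt_llm/llmapi/llm_args.py` with more fields, so the class here is illustrative only and shows the before/after behavior of the default:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheTransceiverConfigSketch:
    """Illustrative stand-in for CacheTransceiverConfig (not the real class).

    Before this PR: kv_transfer_timeout_ms defaulted to None, so a stuck
    KV cache transfer could be waited on indefinitely.
    After this PR: it defaults to 60000 ms (60 s).
    """
    # New default introduced by this PR: 60 s instead of None.
    kv_transfer_timeout_ms: Optional[int] = 60000

default = CacheTransceiverConfigSketch()
# BREAKING: users who relied on the old unlimited wait must now opt in explicitly.
unlimited = CacheTransceiverConfigSketch(kv_transfer_timeout_ms=None)
print(default.kv_transfer_timeout_ms)    # 60000
print(unlimited.kv_transfer_timeout_ms)  # None
```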

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check — ⚠️ Warning: The PR description is entirely empty except for the template; no actual description, test coverage, or justification for the breaking change is provided. Resolution: fill in the Description section explaining why the default timeout was changed to 60 seconds and what impact this breaking change has, and document any test coverage validating this change.

✅ Passed checks (2 passed)

  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; skipping docstring coverage check.
  • Title check — ✅ Passed: The title clearly and specifically describes the main change (setting a default KV cache transfer timeout to 60 seconds), which is the only modification in the changeset.


@pcastonguay pcastonguay changed the title [None][chore] BREAKING: Setting default value of KV cache transfer timeout to 60s [https://nvbugs/5969206][fix] BREAKING: Setting default value of KV cache transfer timeout to 60s Mar 16, 2026
@tensorrt-cicd
Collaborator

PR_Github #39111 [ run ] triggered by Bot. Commit: f03a210 Link to invocation

Collaborator

@QiJune QiJune left a comment

LGTM

@tensorrt-cicd
Collaborator

PR_Github #39111 [ run ] completed with state FAILURE. Commit: f03a210
/LLM/main/L0_MergeRequest_PR pipeline #30370 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Tabrizian
Member

/bot run --disable-fail-fast

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
@Tabrizian Tabrizian force-pushed the default_kv_transfer_timeout branch from f03a210 to 94efdf6 on March 17, 2026 06:57
@tensorrt-cicd
Collaborator

PR_Github #39209 [ run ] triggered by Bot. Commit: 94efdf6 Link to invocation

@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39252 [ run ] triggered by Bot. Commit: 2d641b2 Link to invocation

@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39270 [ run ] triggered by Bot. Commit: 30960da Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39270 [ run ] completed with state SUCCESS. Commit: 30960da
/LLM/main/L0_MergeRequest_PR pipeline #30529 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39465 [ run ] triggered by Bot. Commit: 30960da Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39465 [ run ] completed with state SUCCESS. Commit: 30960da
/LLM/main/L0_MergeRequest_PR pipeline #30691 completed with status: 'SUCCESS'

CI Report

Link to invocation

@pcastonguay pcastonguay merged commit bd14845 into NVIDIA:main Mar 18, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
…ache transfer timeout to 60s (NVIDIA#12249)

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026
…ache transfer timeout to 60s (NVIDIA#12249)

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
yifjiang added a commit to yifjiang/TensorRT-LLM that referenced this pull request Apr 15, 2026
… side

checkContextTransferStatus retries stuck prefill-side KV cache transfers
indefinitely using only the per-iteration kv_transfer_sender_future_timeout_ms.
The per-request total timeout kv_transfer_timeout_ms is plumbed through config
(CacheTransceiverConfig::getKvTransferTimeoutMs) but never read in
batch_manager code — it is dead code.

Under concurrent load with constrained cache, stuck transfers hold KV blocks
forever, exhausting the pool. The prefill worker becomes permanently
unresponsive while health probes continue returning 200 OK.

Fix: After each per-iteration timeout in checkContextTransferStatus, check
total elapsed time (via LlmRequest::getKvCacheTransferStart, already set by
sendAsync) against kv_transfer_timeout_ms. When exceeded, mark the request
as DISAGG_TRANS_ERROR, best-effort cancel via CacheSender, and remove from
mSenderFutures so blocks can be freed.

Reproducer: 1P1D disagg with Qwen3-0.6B, free_gpu_memory_fraction=0.2,
NIXL over TCP, concurrency 16 with ISL 8000. Server hangs after ~2 minutes
and never recovers.

Related: NVIDIA#12249 (set default kv_transfer_timeout_ms=60s — config only)
Related: NVIDIA#12313, NVIDIA#12314 (Python-level fixes — cannot fire due to race
condition with C++ transfer completion removing requests from Python
tracking before the 60s timeout elapses)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
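The fix described in this commit can be sketched in Python; the actual implementation is C++ in the batch_manager, so the function and variable names below mirror the commit message but the code itself is an illustrative assumption, not the real implementation:

```python
import time

# Per-request total budget described by the commit (kv_transfer_timeout_ms = 60 s).
KV_TRANSFER_TIMEOUT_S = 60.0

def check_context_transfer_status(sender_futures, transfer_start, now=time.monotonic):
    """Poll pending KV cache sends; fail any request whose total transfer time
    exceeds the per-request budget so its KV blocks can be freed.

    sender_futures: dict request_id -> future-like object with done()/cancel()
                    (stands in for mSenderFutures)
    transfer_start: dict request_id -> monotonic start time
                    (stands in for LlmRequest::getKvCacheTransferStart, set by sendAsync)
    Returns the set of request ids to mark as DISAGG_TRANS_ERROR.
    """
    failed = set()
    for req_id, fut in list(sender_futures.items()):
        if fut.done():
            del sender_futures[req_id]   # transfer completed normally
            continue
        # Per-iteration wait elapsed without completion: check the total budget,
        # instead of retrying indefinitely as the old code did.
        if now() - transfer_start[req_id] > KV_TRANSFER_TIMEOUT_S:
            fut.cancel()                 # best-effort cancel (via CacheSender in C++)
            del sender_futures[req_id]   # remove so KV blocks can be freed
            failed.add(req_id)           # caller marks the request DISAGG_TRANS_ERROR
    return failed
```

The key design point from the commit is that the total-elapsed check runs after each per-iteration timeout, so a stuck transfer is bounded by kv_transfer_timeout_ms rather than holding KV blocks forever.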