[None][fix] Fix OOM issue/dummy request allocation/chunked prefill/pp for KV Cache Manager V2#11710

Merged
yizhang-nv merged 3 commits into NVIDIA/TensorRT-LLM:main from yizhang-nv:minor-fix-for-kv-cache-manager-v2
Mar 3, 2026
Conversation

@yizhang-nv
Member

@yizhang-nv yizhang-nv commented Feb 25, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved resource cleanup and error handling in KV cache management to prevent resource leaks during error conditions.
    • Enhanced shutdown behavior for more reliable resource deallocation.
  • Improvements

    • Refined KV cache statistics exposure for better resource monitoring.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@yizhang-nv yizhang-nv requested a review from a team as a code owner February 25, 2026 08:50
@yizhang-nv
Member Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #36767 [ run ] triggered by Bot. Commit: 9ccf1b3 Link to invocation

@coderabbitai
Contributor

coderabbitai Bot commented Feb 25, 2026

📝 Walkthrough

Walkthrough

The changes make the KVCacheStats.allocated_bytes property writable from Python and refactor resource management in KVCacheManager and KVCacheManagerV2, switching to a unified KvCacheStats return type, adding resource cleanup helpers, and adjusting shutdown and resource quota logic.

Changes

Cohort / File(s) Summary
C++ Property Binding
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
Changed allocated_bytes property from read-only (def_ro) to read-write (def_rw), enabling Python to write to this KvCacheStats field.
Python Resource Manager Refactoring
tensorrt_llm/_torch/pyexecutor/resource_manager.py
Imported KvCacheStats and updated both KVCacheManager and KVCacheManagerV2 to return unified KvCacheStats objects from get_kv_cache_stats(). Added release_resources() helper for error-path cleanup. Changed shutdown method calls and get_max_resource_count() return value from 0 to 1. Narrowed stop_committing() condition in update_resources().
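The release_resources() helper described in the summary can be sketched minimally as follows. This is a hedged illustration only: the class name, the kv_cache_map/draft_kv_cache_map attributes, and the free_draft_resources flag are assumptions drawn from the review discussion, not the actual resource_manager.py implementation.

```python
# Hypothetical sketch of an error-path cleanup helper like release_resources();
# the real implementation in resource_manager.py may differ.

class KVCacheManagerV2Sketch:
    def __init__(self):
        self.kv_cache_map = {}        # request id -> allocated KV cache
        self.draft_kv_cache_map = {}  # request id -> allocated draft KV cache

    def release_resources(self, req_id, free_draft_resources=False):
        # Pop whatever was already allocated for this request so that a
        # failure part-way through allocation does not leak cache pages.
        self.kv_cache_map.pop(req_id, None)
        if free_draft_resources:
            self.draft_kv_cache_map.pop(req_id, None)


mgr = KVCacheManagerV2Sketch()
mgr.kv_cache_map[7] = object()
mgr.draft_kv_cache_map[7] = object()
mgr.release_resources(7, free_draft_resources=True)
print(sorted(mgr.kv_cache_map), sorted(mgr.draft_kv_cache_map))  # [] []
```

The point of the pattern is that every failure path in allocation calls one shared cleanup routine instead of raising with partially allocated state left behind.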

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 11.11%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check ⚠️ Warning — The PR description is entirely empty, containing only the template structure without any actual content in the Description or Test Coverage sections. Resolution: add a clear description explaining what was fixed (OOM issues, dummy request allocation, chunked prefill), why these changes were needed, and list the relevant tests that validate these fixes.

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The title addresses specific issues (OOM, dummy request allocation, chunked prefill) and targets KV Cache Manager V2, aligning with the actual changes across kvCacheManager.cpp and resource_manager.py.



Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tensorrt_llm/_torch/pyexecutor/resource_manager.py (3)

2079-2091: ⚠️ Potential issue | 🟠 Major

Resource leak on resize() failure in is_gen block — inconsistent with other error paths.

When kv_cache.resize(new_capacity) fails (Line 2081) or draft_kv_cache.resize(new_capacity) fails (Line 2087), a ValueError is raised. Unlike every other failure path in this function, release_resources is not called before raising, leaving the already-allocated kv_cache (and draft_kv_cache, if applicable) in kv_cache_map without being freed.

🐛 Proposed fix
             if prepare_resource:
                 new_capacity = kv_cache.capacity + max_num_draft_tokens + 1
                 success = kv_cache.resize(new_capacity)
                 if not success:
-                    raise ValueError(
-                        f"Failed to resize capacity of KV cache for request {req.py_request_id} to {new_capacity} tokens for dummy request"
-                    )
+                    release_resources(req)
+                    return None
                 if draft_kv_cache is not None:
                     success = draft_kv_cache.resize(new_capacity)
                     if not success:
-                        raise ValueError(
-                            f"Failed to resize capacity of draft KV cache for request {req.py_request_id} to {new_capacity} tokens for dummy request"
-                        )
+                        release_resources(req, free_draft_resources=True)
+                        return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/resource_manager.py` around lines 2079 - 2091,
In the is_gen branch where prepare_resource is true, ensure you free
already-allocated resources before raising on resize failure: when
kv_cache.resize(new_capacity) or draft_kv_cache.resize(new_capacity) returns
False, call release_resources(req.py_request_id) (or the existing
release_resources function that cleans up entries in kv_cache_map for this
request) to remove and free kv_cache and draft_kv_cache, then raise the
ValueError; update the error paths in the block that contains prepare_resource,
kv_cache.resize, draft_kv_cache.resize and the existing raise statements so they
always call release_resources(req.py_request_id) prior to raising.

2264-2266: ⚠️ Potential issue | 🟡 Minor

get_max_resource_count() returning 1 is still a placeholder and may cause scheduling issues.

Returning 1 prevents division-by-zero but any scheduler that uses this to compute maximum capacity or admission thresholds will severely underestimate available resources for KVCacheManagerV2, potentially admitting far fewer requests than the hardware supports. The V1 equivalent returns self.impl.max_num_blocks.

Consider wiring through the actual page count, e.g.:

def get_max_resource_count(self) -> int:
    return max(
        self.impl.get_page_index_upper_bound(layer_id, Role.KEY)
        for layer_id in typed_range(LayerId(self.num_local_layers))
    ) // self.kv_factor
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/resource_manager.py` around lines 2264 - 2266,
get_max_resource_count currently returns a placeholder 1 which underestimates
capacity; replace it to compute the real max resource count by iterating local
layers (use typed_range(LayerId(self.num_local_layers))) and calling
self.impl.get_page_index_upper_bound(layer_id, Role.KEY) for each, take the
maximum of those values and divide by self.kv_factor (use integer division if
intended) to mirror the V1 behavior (self.impl.max_num_blocks) so schedulers get
an accurate capacity for KVCacheManagerV2.

2046-2067: ⚠️ Potential issue | 🔴 Critical

Replace ValueError with release_resources() pattern to prevent resource leak on resize failures.

At lines 2082–2091, when kv_cache.resize() or draft_kv_cache.resize() fails in the is_gen block, the code raises ValueError directly without freeing allocated resources. Unlike the error paths at lines 2051–2072 which correctly call release_resources(), these failures leak both the primary and draft KV cache objects. Use release_resources() followed by return None instead of raising.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/resource_manager.py` around lines 2046 - 2067,
In the is_gen branch, replace the direct raising of ValueError when
kv_cache.resize(...) or draft_kv_cache.resize(...) fails with the same cleanup
pattern used elsewhere: call release_resources(req) (and release_resources(req,
free_draft_resources=True) where the draft cache was allocated) and then return
None; update the failure paths around the functions kv_cache.resize and
draft_kv_cache.resize so both primary and draft KV cache objects are freed via
release_resources() instead of raising an exception.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/resource_manager.py`:
- Line 19: Remove the direct class import "KvCacheStats" and instead reference
it through the module namespace
(tensorrt_llm.bindings.internal.batch_manager.KvCacheStats) wherever it's used;
follow the existing alias/import pattern used for KVCacheManagerCpp and other
bindings (reuse the module-level access rather than adding a new class-level
import) so all usages call the class as
tensorrt_llm.bindings.internal.batch_manager.KvCacheStats().
- Around line 1971-1975: KVCacheManagerV2.get_kv_cache_stats() currently only
sets allocated_bytes, causing callers (py_executor.py and scheduler.py) to read
zeros/empty dicts; update KVCacheManagerV2.get_kv_cache_stats to fully populate
the KvCacheStats object (set max_num_blocks, free_num_blocks, used_num_blocks,
tokens_per_block, alloc_total_blocks, alloc_new_blocks, reused_blocks,
missed_blocks, cache_hit_rate and num_free_blocks_per_window_size) by querying
the underlying implementation (self.impl) or the manager's internal
counters/metrics (e.g., use existing impl methods like get_quota, get_usage,
get_metrics or maintain/update counters on allocation/free events) so the
returned KvCacheStats contains correct values for all fields expected by
py_executor.py and scheduler.py.

---

Outside diff comments:
In `@tensorrt_llm/_torch/pyexecutor/resource_manager.py`:
- Around line 2079-2091: In the is_gen branch where prepare_resource is true,
ensure you free already-allocated resources before raising on resize failure:
when kv_cache.resize(new_capacity) or draft_kv_cache.resize(new_capacity)
returns False, call release_resources(req.py_request_id) (or the existing
release_resources function that cleans up entries in kv_cache_map for this
request) to remove and free kv_cache and draft_kv_cache, then raise the
ValueError; update the error paths in the block that contains prepare_resource,
kv_cache.resize, draft_kv_cache.resize and the existing raise statements so they
always call release_resources(req.py_request_id) prior to raising.
- Around line 2264-2266: get_max_resource_count currently returns a placeholder
1 which underestimates capacity; replace it to compute the real max resource
count by iterating local layers (use
typed_range(LayerId(self.num_local_layers))) and calling
self.impl.get_page_index_upper_bound(layer_id, Role.KEY) for each, take the
maximum of those values and divide by self.kv_factor (use integer division if
intended) to mirror the V1 behavior (self.impl.max_num_blocks) so schedulers get
an accurate capacity for KVCacheManagerV2.
- Around line 2046-2067: In the is_gen branch, replace the direct raising of
ValueError when kv_cache.resize(...) or draft_kv_cache.resize(...) fails with
the same cleanup pattern used elsewhere: call release_resources(req) (and
release_resources(req, free_draft_resources=True) where the draft cache was
allocated) and then return None; update the failure paths around the functions
kv_cache.resize and draft_kv_cache.resize so both primary and draft KV cache
objects are freed via release_resources() instead of raising an exception.
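The first inline comment asks for get_kv_cache_stats() to populate every field callers read, not only allocated_bytes. A hedged sketch of what that might look like follows; KvCacheStats here is a plain dataclass stand-in for the nanobind binding (the real class lives in tensorrt_llm.bindings.internal.batch_manager), and the counter sources are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class KvCacheStats:
    """Stand-in for the bound C++ KvCacheStats (field names from the review comment)."""
    max_num_blocks: int = 0
    free_num_blocks: int = 0
    used_num_blocks: int = 0
    tokens_per_block: int = 0
    alloc_total_blocks: int = 0
    alloc_new_blocks: int = 0
    reused_blocks: int = 0
    missed_blocks: int = 0
    cache_hit_rate: float = 0.0
    allocated_bytes: int = 0
    num_free_blocks_per_window_size: dict = field(default_factory=dict)


def get_kv_cache_stats(max_blocks, used_blocks, tokens_per_block, allocated_bytes):
    # Populate every field downstream readers (py_executor.py, scheduler.py)
    # consume, instead of returning an object with only allocated_bytes set.
    stats = KvCacheStats()
    stats.max_num_blocks = max_blocks
    stats.free_num_blocks = max_blocks - used_blocks
    stats.used_num_blocks = used_blocks
    stats.tokens_per_block = tokens_per_block
    stats.allocated_bytes = allocated_bytes
    return stats


s = get_kv_cache_stats(max_blocks=1024, used_blocks=256,
                       tokens_per_block=32, allocated_bytes=1 << 30)
print(s.free_num_blocks)  # 768
```

In the real manager these values would come from the underlying implementation's quota/usage queries or allocation counters; the arithmetic above only illustrates the shape of a fully populated stats object.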

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc63612 and 9ccf1b3.

📒 Files selected for processing (2)
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
  • tensorrt_llm/_torch/pyexecutor/resource_manager.py

Comment thread tensorrt_llm/_torch/pyexecutor/resource_manager.py
Comment thread tensorrt_llm/_torch/pyexecutor/resource_manager.py
Collaborator

@jiaganc jiaganc left a comment


LGTM

@yizhang-nv yizhang-nv requested a review from a team as a code owner February 25, 2026 15:22
@yizhang-nv yizhang-nv requested a review from lfr-0531 February 25, 2026 15:22
@yizhang-nv yizhang-nv force-pushed the minor-fix-for-kv-cache-manager-v2 branch from 1f96043 to 461552e Compare February 25, 2026 15:23
@yizhang-nv
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36810 [ run ] triggered by Bot. Commit: dbba087 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36810 [ run ] completed with state SUCCESS. Commit: dbba087
/LLM/main/L0_MergeRequest_PR pipeline #28502 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Collaborator

@lfr-0531 lfr-0531 left a comment


LGTM~

@yizhang-nv
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36932 [ run ] triggered by Bot. Commit: db32d7b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36932 [ run ] completed with state SUCCESS. Commit: db32d7b
/LLM/main/L0_MergeRequest_PR pipeline #28596 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yizhang-nv yizhang-nv requested a review from a team as a code owner February 27, 2026 03:39
@yizhang-nv yizhang-nv force-pushed the minor-fix-for-kv-cache-manager-v2 branch from 8917c62 to ad9335f Compare February 27, 2026 03:43
@yizhang-nv
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37016 [ run ] triggered by Bot. Commit: ad9335f Link to invocation

@yizhang-nv
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37037 [ run ] triggered by Bot. Commit: 9b6b6a8 Link to invocation

@yizhang-nv yizhang-nv changed the title [None][fix] Fix OOM issue/dummy request allocation/chunked prefill for KV Cache Manager V2 [None][fix] Fix OOM issue/dummy request allocation/chunked prefill/pp for KV Cache Manager V2 Feb 27, 2026
@tensorrt-cicd
Collaborator

PR_Github #37037 [ run ] completed with state SUCCESS. Commit: 9b6b6a8
/LLM/main/L0_MergeRequest_PR pipeline #28677 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yizhang-nv
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37147 [ run ] triggered by Bot. Commit: 9b6b6a8 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37147 [ run ] completed with state SUCCESS. Commit: 9b6b6a8
/LLM/main/L0_MergeRequest_PR pipeline #28759 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yizhang-nv yizhang-nv force-pushed the minor-fix-for-kv-cache-manager-v2 branch 2 times, most recently from 444b782 to 455b8b4 Compare March 2, 2026 02:28
@nvpohanh
Collaborator

nvpohanh commented Mar 2, 2026

@lowsfer could you review this? thanks!

@yizhang-nv
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37268 [ run ] triggered by Bot. Commit: c1f1e35 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37268 [ run ] completed with state SUCCESS. Commit: c1f1e35
/LLM/main/L0_MergeRequest_PR pipeline #28842 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
@yizhang-nv yizhang-nv force-pushed the minor-fix-for-kv-cache-manager-v2 branch from c1f1e35 to 4f1bd5e Compare March 2, 2026 12:26
@yizhang-nv
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37334 [ run ] triggered by Bot. Commit: 4f1bd5e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37334 [ run ] completed with state SUCCESS. Commit: 4f1bd5e
/LLM/main/L0_MergeRequest_PR pipeline #28895 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yizhang-nv
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37409 [ run ] triggered by Bot. Commit: 4f1bd5e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37409 [ run ] completed with state SUCCESS. Commit: 4f1bd5e
/LLM/main/L0_MergeRequest_PR pipeline #28958 completed with status: 'SUCCESS'

Link to invocation

Collaborator

@QiJune QiJune left a comment


LGTM

@yizhang-nv yizhang-nv merged commit fd1738a into NVIDIA:main Mar 3, 2026
4 of 5 checks passed
@yizhang-nv yizhang-nv deleted the minor-fix-for-kv-cache-manager-v2 branch March 3, 2026 09:26
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Mar 9, 2026
… for KV Cache Manager V2 (NVIDIA#11710)

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
tianyuz-nv pushed a commit to wanqian-nv/TensorRT-LLM that referenced this pull request Mar 19, 2026
… for KV Cache Manager V2 (NVIDIA#11710)

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

7 participants
