[None][fix] Add more models to increase perf test coverage #12184

Merged Mar 17, 2026

chenfeiz0326 merged 4 commits into NVIDIA:main from chenfeiz0326:chenfeiz/increase-perf-test-coverage

Conversation

@chenfeiz0326
Collaborator

@chenfeiz0326 chenfeiz0326 commented Mar 13, 2026

Summary by CodeRabbit

  • New Features

    • Added multi-node test configurations for 5-node and 6-node GPU clusters.
    • Added support for Llama 3.3 70B Instruct FP4 model in performance testing.
    • Added DeepSeek R1 FP4 disaggregated performance configurations with multiple backend options.
  • Tests

    • New performance sanity test suite entries for multi-GPU and Blackwell architectures.
  • Documentation

    • Updated performance test configuration guidelines and naming conventions for improved clarity.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-1,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-2,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-3,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-4"

@coderabbitai
Contributor

coderabbitai Bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

The changes add new performance sanity test configurations for multi-node GB200 systems and Llama 3.3 FP4 models. Updates include Jenkins test job definitions, model path mappings, test lists, and new YAML configuration files for both aggregated and disaggregated benchmark scenarios.

Changes

  • Jenkins Test Configuration — jenkins/L0_Test.groovy
    Appends two new multi-node SBSA test configurations for 5 and 6 nodes to the launchTestJobs flow, specifying node counts, GPU allocations, and test lists.
  • Test Definitions and Configurations — tests/integration/defs/perf/test_perf_sanity.py, tests/integration/test_lists/test-db/l0_b200_multi_gpus_perf_sanity.yml, tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx2_node1_gpu4_gen1_node4_gpu16.yml
    Adds the Llama 3.3 FP4 model path mapping to MODEL_PATH_DICT; introduces two new test entries for B200 multi-GPU performance sanity; creates a new 6-node performance sanity test configuration with six test cases.
  • Documentation — tests/scripts/perf-sanity/README.md
    Clarifies GPU-target suffix naming conventions and multi-node filename adjustments; documents server_config rules requiring exactly one client_config per server_config; expands use cases to include multi-node scenarios.
  • Aggregated Performance Configuration — tests/scripts/perf-sanity/aggregated/llama_v3_3_70b_instruct_fp4_blackwell.yaml
    Defines a Llama 3.3 70B FP4 aggregated test configuration with two server configs specifying tensor parallelism, batch sizes, token limits, CUDA graph, and KV cache settings.
  • Disaggregated Performance Configurations — tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml, ...eplb0_mtp3_ccb-UCX.yaml, ...eplb288_mtp3_ccb-UCX.yaml
    Adds three new disaggregated DeepSeek R1 FP4 benchmark configurations with distinct load-balancer and cache-transceiver backend parameters; specifies worker configurations for the context and generation stages, including tensor/expert parallelism, cache settings, and speculative decoding parameters.
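The README's documented rule that every server_config must carry exactly one client_config can be pictured with a minimal fragment. This is a hypothetical illustration only; the field names and values below are assumed and are not taken from the actual perf-sanity schema.

```yaml
# Hypothetical sketch of the one-client_config-per-server_config rule.
# All field names here are assumed for illustration.
server_configs:
  - name: tp4_bs256            # assumed server variant name
    tensor_parallel_size: 4
    max_batch_size: 256
    client_config:             # exactly one per server_config
      concurrency: 2048
      isl: 1024
      osl: 1024
```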

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check ⚠️ Warning — The PR description is incomplete and provides only the template without substantive content in key sections like Description and Test Coverage. Resolution: add a detailed Description section explaining what models were added and why, and include a Test Coverage section listing the specific tests that validate these changes.

✅ Passed checks (2 passed)

  • Title check ✅ Passed — The title '[None][fix] Add more models to increase perf test coverage' clearly describes the main change: adding models to expand performance test coverage.
  • Docstring Coverage ✅ Passed — No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-UCX.yaml (1)

14-14: Use a variant-specific Slurm job name.

All three new configs keep job_name: unified-benchmark, which makes queued jobs and archived logs harder to tie back to a specific model/backend when perf-sanity runs fan out. A short model/backend suffix would make triage much easier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-UCX.yaml`
at line 14, The job_name field currently uses the generic value
"unified-benchmark"; update the job_name YAML key in each new config to include
a short variant-specific suffix (e.g., model and backend or a concise
model-backend tag) so queued jobs and archived logs can be tied to the exact
variant—locate the job_name entry in the new config (the "job_name" YAML key)
and replace the generic value with a descriptive variant-specific name for that
file's model/backend.
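As a sketch of the suggested fix, the generic job name could gain a short model/backend tag. The renamed value below is a hypothetical example, not the actual config contents.

```yaml
# Before: generic Slurm job name shared by all three new variants
job_name: unified-benchmark

# After (hypothetical): a variant-specific name that ties queued jobs
# and archived logs back to the exact model/backend combination
job_name: unified-benchmark-dsr1-fp4-eplb0-mtp3-ucx
```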
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml (1)

68-70: Generate backend permutations from one source.

Lines 68-70 and 95-97 are the only substantive delta from the UCX sibling. Keeping a full copy for each cache backend makes future tuning drift very likely across these perf-sanity variants.

Also applies to: 95-97

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml`
around lines 68 - 70, The two YAML variants differ only by the
cache_transceiver_config.backend value (NIXL vs UCX); replace the duplicated
file copies by generating backend permutations from a single source—e.g., in
this file change the single backend scalar under cache_transceiver_config to
either a list (backends: [NIXL, UCX]) or use a YAML anchor/alias/template
placeholder and update the test generator to iterate over backends, ensuring the
generator or test harness populates cache_transceiver_config.backend for each
permutation rather than maintaining separate files.
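One way to express the single-source approach the comment suggests is with YAML anchors and merge keys, so the shared worker settings live in one place and only the cache-transceiver backend varies per permutation. This is a sketch with assumed field names and values; the real config layout may differ.

```yaml
# Shared worker settings defined once via an anchor
# (values here are assumed for illustration).
base_worker: &base_worker
  tensor_parallel_size: 4
  expert_parallel_size: 4
  speculative_decoding:
    mtp_layers: 3

# Each backend permutation reuses the anchor via a merge key,
# overriding only cache_transceiver_config.backend.
variants:
  - <<: *base_worker
    cache_transceiver_config:
      backend: NIXL
  - <<: *base_worker
    cache_transceiver_config:
      backend: UCX
```

A test generator that iterates over `variants` would then replace the two near-identical files, removing the drift risk the review flags.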

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ae971ddb-3d1b-4a40-b04a-95f3002650c8

📥 Commits

Reviewing files that changed from the base of the PR and between 0fc0cbd and 7423814.

📒 Files selected for processing (9)
  • jenkins/L0_Test.groovy
  • tests/integration/defs/perf/test_perf_sanity.py
  • tests/integration/test_lists/test-db/l0_b200_multi_gpus_perf_sanity.yml
  • tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx2_node1_gpu4_gen1_node4_gpu16.yml
  • tests/scripts/perf-sanity/README.md
  • tests/scripts/perf-sanity/aggregated/llama_v3_3_70b_instruct_fp4_blackwell.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-UCX.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb288_mtp3_ccb-UCX.yaml

@tensorrt-cicd
Collaborator

PR_Github #38829 [ run ] triggered by Bot. Commit: 7423814 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38829 [ run ] completed with state SUCCESS. Commit: 7423814
/LLM/main/L0_MergeRequest_PR pipeline #30140 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2"

@tensorrt-cicd
Collaborator

PR_Github #38982 [ run ] triggered by Bot. Commit: 7423814 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38982 [ run ] completed with state SUCCESS. Commit: 7423814
/LLM/main/L0_MergeRequest_PR pipeline #30264 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Chenfei Zhang added 2 commits March 15, 2026 23:18
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-4,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-5"

@tensorrt-cicd
Collaborator

PR_Github #39042 [ run ] triggered by Bot. Commit: cf5af63 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39042 [ run ] completed with state SUCCESS. Commit: cf5af63
/LLM/main/L0_MergeRequest_PR pipeline #30313 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-4"


@tensorrt-cicd
Collaborator

PR_Github #39090 [ run ] triggered by Bot. Commit: b0f74a9 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39090 [ run ] completed with state SUCCESS. Commit: b0f74a9
/LLM/main/L0_MergeRequest_PR pipeline #30352 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@chenfeiz0326
Collaborator Author

chenfeiz0326 commented Mar 17, 2026

/bot skip --comment "Only add new perf tests, no need to run the whole CI pipeline"

@tensorrt-cicd
Collaborator

PR_Github #39165 [ skip ] triggered by Bot. Commit: b0f74a9 Link to invocation

@chenfeiz0326 chenfeiz0326 enabled auto-merge (squash) March 17, 2026 03:11
@tensorrt-cicd
Collaborator

PR_Github #39165 [ skip ] completed with state SUCCESS. Commit: b0f74a9
Skipping testing for commit b0f74a9

Link to invocation

@chenfeiz0326 chenfeiz0326 merged commit 7dae8ff into NVIDIA:main Mar 17, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>

Labels

None yet

Projects

None yet


4 participants
