[None][fix] Add more models to increase perf test coverage #12184

Merged Mar 17, 2026

chenfeiz0326 merged 4 commits into NVIDIA:main from chenfeiz0326:chenfeiz/increase-perf-test-coverage

Conversation

@chenfeiz0326
Collaborator

@chenfeiz0326 chenfeiz0326 commented Mar 13, 2026

Summary by CodeRabbit

  • New Features

    • Added multi-node test configurations for 5-node and 6-node GPU clusters.
    • Added support for Llama 3.3 70B Instruct FP4 model in performance testing.
    • Added DeepSeek R1 FP4 disaggregated performance configurations with multiple backend options.
  • Tests

    • New performance sanity test suite entries for multi-GPU and Blackwell architectures.
  • Documentation

    • Updated performance test configuration guidelines and naming conventions for improved clarity.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-1,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-2,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-3,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-4"

@coderabbitai
Contributor

coderabbitai Bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

The changes add new performance sanity test configurations for multi-node GB200 systems and Llama 3.3 FP4 models. Updates include Jenkins test job definitions, model path mappings, test lists, and new YAML configuration files for both aggregated and disaggregated benchmark scenarios.

Changes

  • Jenkins Test Configuration — jenkins/L0_Test.groovy
    Appends two new multi-node SBSA test configurations for 5 and 6 nodes to the launchTestJobs flow, specifying node counts, GPU allocations, and test lists.
  • Test Definitions and Configurations — tests/integration/defs/perf/test_perf_sanity.py, tests/integration/test_lists/test-db/l0_b200_multi_gpus_perf_sanity.yml, tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx2_node1_gpu4_gen1_node4_gpu16.yml
    Adds the Llama 3.3 FP4 model path mapping to MODEL_PATH_DICT; introduces two new test entries for B200 multi-GPU performance sanity; creates a new 6-node performance sanity test configuration with six test cases.
  • Documentation — tests/scripts/perf-sanity/README.md
    Clarifies GPU-target suffix naming conventions and multi-node filename adjustments; documents server_config rules requiring exactly one client_config per server_config; expands use cases to include multi-node scenarios.
  • Aggregated Performance Configuration — tests/scripts/perf-sanity/aggregated/llama_v3_3_70b_instruct_fp4_blackwell.yaml
    Defines a Llama 3.3 70B FP4 aggregated test configuration with two server configs specifying tensor parallelism, batch sizes, token limits, CUDA graph, and KV cache settings.
  • Disaggregated Performance Configurations — tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml, ...eplb0_mtp3_ccb-UCX.yaml, ...eplb288_mtp3_ccb-UCX.yaml
    Adds three new disaggregated DeepSeek R1 FP4 benchmark configurations with distinct load-balancer and cache-transceiver backend parameters; specifies worker configurations for the context and generation stages, including tensor/expert parallelism, cache settings, and speculative decoding parameters.
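The README's documented rule that every server_config must carry exactly one client_config can be pictured with a minimal fragment. This is a hypothetical illustration only; the field names and values below are assumed and are not taken from the actual perf-sanity schema.

```yaml
# Hypothetical sketch of the one-client_config-per-server_config rule.
# All field names here are assumed for illustration.
server_configs:
  - name: tp4_bs256            # assumed server variant name
    tensor_parallel_size: 4
    max_batch_size: 256
    client_config:             # exactly one per server_config
      concurrency: 2048
      isl: 1024
      osl: 1024
```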

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check ⚠️ Warning — The PR description is incomplete and provides only the template without substantive content in key sections like Description and Test Coverage. Resolution: add a detailed Description section explaining what models were added and why, and include a Test Coverage section listing the specific tests that validate these changes.

✅ Passed checks (2 passed)

  • Title check ✅ Passed — The title '[None][fix] Add more models to increase perf test coverage' clearly describes the main change: adding models to expand performance test coverage.
  • Docstring Coverage ✅ Passed — No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-UCX.yaml (1)

14-14: Use a variant-specific Slurm job name.

All three new configs keep job_name: unified-benchmark, which makes queued jobs and archived logs harder to tie back to a specific model/backend when perf-sanity runs fan out. A short model/backend suffix would make triage much easier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-UCX.yaml`
at line 14, The job_name field currently uses the generic value
"unified-benchmark"; update the job_name YAML key in each new config to include
a short variant-specific suffix (e.g., model and backend or a concise
model-backend tag) so queued jobs and archived logs can be tied to the exact
variant—locate the job_name entry in the new config (the "job_name" YAML key)
and replace the generic value with a descriptive variant-specific name for that
file's model/backend.
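As a sketch of the suggested fix, the generic job name could gain a short model/backend tag. The renamed value below is a hypothetical example, not the actual config contents.

```yaml
# Before: generic Slurm job name shared by all three new variants
job_name: unified-benchmark

# After (hypothetical): a variant-specific name that ties queued jobs
# and archived logs back to the exact model/backend combination
job_name: unified-benchmark-dsr1-fp4-eplb0-mtp3-ucx
```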
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml (1)

68-70: Generate backend permutations from one source.

Lines 68-70 and 95-97 are the only substantive delta from the UCX sibling. Keeping a full copy for each cache backend makes future tuning drift very likely across these perf-sanity variants.

Also applies to: 95-97

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml`
around lines 68 - 70, The two YAML variants differ only by the
cache_transceiver_config.backend value (NIXL vs UCX); replace the duplicated
file copies by generating backend permutations from a single source—e.g., in
this file change the single backend scalar under cache_transceiver_config to
either a list (backends: [NIXL, UCX]) or use a YAML anchor/alias/template
placeholder and update the test generator to iterate over backends, ensuring the
generator or test harness populates cache_transceiver_config.backend for each
permutation rather than maintaining separate files.
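One way to express the single-source approach the comment suggests is with YAML anchors and merge keys, so the shared worker settings live in one place and only the cache-transceiver backend varies per permutation. This is a sketch with assumed field names and values; the real config layout may differ.

```yaml
# Shared worker settings defined once via an anchor
# (values here are assumed for illustration).
base_worker: &base_worker
  tensor_parallel_size: 4
  expert_parallel_size: 4
  speculative_decoding:
    mtp_layers: 3

# Each backend permutation reuses the anchor via a merge key,
# overriding only cache_transceiver_config.backend.
variants:
  - <<: *base_worker
    cache_transceiver_config:
      backend: NIXL
  - <<: *base_worker
    cache_transceiver_config:
      backend: UCX
```

A test generator that iterates over `variants` would then replace the two near-identical files, removing the drift risk the review flags.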

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ae971ddb-3d1b-4a40-b04a-95f3002650c8

📥 Commits

Reviewing files that changed from the base of the PR and between 0fc0cbd and 7423814.

📒 Files selected for processing (9)
  • jenkins/L0_Test.groovy
  • tests/integration/defs/perf/test_perf_sanity.py
  • tests/integration/test_lists/test-db/l0_b200_multi_gpus_perf_sanity.yml
  • tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx2_node1_gpu4_gen1_node4_gpu16.yml
  • tests/scripts/perf-sanity/README.md
  • tests/scripts/perf-sanity/aggregated/llama_v3_3_70b_instruct_fp4_blackwell.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-UCX.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb288_mtp3_ccb-UCX.yaml

@tensorrt-cicd
Collaborator

PR_Github #38829 [ run ] triggered by Bot. Commit: 7423814 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38829 [ run ] completed with state SUCCESS. Commit: 7423814
/LLM/main/L0_MergeRequest_PR pipeline #30140 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2"

@tensorrt-cicd
Collaborator

PR_Github #38982 [ run ] triggered by Bot. Commit: 7423814 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38982 [ run ] completed with state SUCCESS. Commit: 7423814
/LLM/main/L0_MergeRequest_PR pipeline #30264 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Chenfei Zhang added 2 commits March 15, 2026 23:18
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-4,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-5"

@tensorrt-cicd
Collaborator

PR_Github #39042 [ run ] triggered by Bot. Commit: cf5af63 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39042 [ run ] completed with state SUCCESS. Commit: cf5af63
/LLM/main/L0_MergeRequest_PR pipeline #30313 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-4"


@tensorrt-cicd
Collaborator

PR_Github #39090 [ run ] triggered by Bot. Commit: b0f74a9 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39090 [ run ] completed with state SUCCESS. Commit: b0f74a9
/LLM/main/L0_MergeRequest_PR pipeline #30352 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@chenfeiz0326
Collaborator Author

chenfeiz0326 commented Mar 17, 2026

/bot skip --comment "Only add new perf tests, no need to run the whole CI pipeline"

@tensorrt-cicd
Collaborator

PR_Github #39165 [ skip ] triggered by Bot. Commit: b0f74a9 Link to invocation

@chenfeiz0326 chenfeiz0326 enabled auto-merge (squash) March 17, 2026 03:11
@tensorrt-cicd
Collaborator

PR_Github #39165 [ skip ] completed with state SUCCESS. Commit: b0f74a9
Skipping testing for commit b0f74a9

Link to invocation

@chenfeiz0326 chenfeiz0326 merged commit 7dae8ff into NVIDIA:main Mar 17, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>

Labels

None yet

Projects

None yet


4 participants
