[None][fix] Add more models to increase perf test coverage #12184
chenfeiz0326 merged 4 commits into NVIDIA/TensorRT-LLM:main from chenfeiz0326:chenfeiz/increase-perf-test-coverage
Conversation
/bot run --disable-fail-fast --stage-list "GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-1,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-2,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-3,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-4" |
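These stage names pack the test topology into dash-separated tokens. A minimal sketch of pulling one apart in Python; the field meanings noted in the comments are assumptions inferred from the naming pattern, not documented semantics:

```python
# One of the post-merge perf-sanity stage names from the command above.
STAGE = ("GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-"
         "CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1")

def stage_tokens(name):
    """Split a Jenkins stage name into its dash-separated tokens."""
    return name.split("-")

tokens = stage_tokens(STAGE)
# tokens[0] is the platform ("GB200"); "CTX2" / "GEN1" read as context and
# generation server counts, and "NODE4-GPU16" as 4 nodes with 16 GPUs on the
# generation side (assumptions); the trailing number looks like a shard index.
```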
📝 Walkthrough

The changes add new performance sanity test configurations for multi-node GB200 systems and Llama 3.3 FP4 models. Updates include Jenkins test job definitions, model path mappings, test lists, and new YAML configuration files for both aggregated and disaggregated benchmark scenarios.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
🧹 Nitpick comments (2)
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-UCX.yaml (1)

14-14: Use a variant-specific Slurm job name.

All three new configs keep `job_name: unified-benchmark`, which makes queued jobs and archived logs harder to tie back to a specific model/backend when perf-sanity runs fan out. A short model/backend suffix would make triage much easier.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-UCX.yaml` at line 14: the job_name field currently uses the generic value "unified-benchmark". Update the job_name YAML key in each new config to include a short variant-specific suffix (e.g., model and backend, or a concise model-backend tag) so queued jobs and archived logs can be tied to the exact variant. Locate the job_name entry in each new config and replace the generic value with a descriptive variant-specific name for that file's model/backend.

tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml (1)
68-70: Generate backend permutations from one source.

Lines 68-70 and 95-97 are the only substantive delta from the UCX sibling. Keeping a full copy for each cache backend makes future tuning drift very likely across these perf-sanity variants.
Also applies to: 95-97
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml` around lines 68-70: the two YAML variants differ only by the cache_transceiver_config.backend value (NIXL vs UCX). Replace the duplicated file copies by generating backend permutations from a single source: for example, change the single backend scalar under cache_transceiver_config to a list (backends: [NIXL, UCX]), or use a YAML anchor/alias/template placeholder, and update the test generator to iterate over backends, so the generator or test harness populates cache_transceiver_config.backend for each permutation rather than maintaining separate files.
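The single-source suggestion above can be sketched in Python. The config dict and generator here are hypothetical stand-ins for the real YAML schema and test generator, kept only to the keys the review comment names:

```python
import copy

# Minimal stand-in for the shared disaggregated config; the real files
# carry many more fields.
BASE_CONFIG = {
    "job_name": "unified-benchmark",
    "cache_transceiver_config": {"backend": None},
}

def expand_backends(base, backends):
    """Yield one config per cache-transceiver backend from a single source."""
    for backend in backends:
        cfg = copy.deepcopy(base)  # deep copy so variants don't share state
        cfg["cache_transceiver_config"]["backend"] = backend
        yield cfg

configs = list(expand_backends(BASE_CONFIG, ["UCX", "NIXL"]))
```

With this, the UCX and NIXL files collapse into one source plus a backend list, so tuning changes apply to every permutation at once.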
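A minimal sketch of the variant-specific naming the job_name comment asks for; the helper and suffix scheme are hypothetical, not part of the existing configs:

```python
def variant_job_name(base, model, backend):
    """Derive a Slurm job name that identifies the model/backend variant."""
    return f"{base}-{model}-{backend}".lower()

# Would replace the generic `job_name: unified-benchmark` in each new config.
name = variant_job_name("unified-benchmark", "deepseek-r1-fp4", "UCX")
```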
ℹ️ Review info

⚙️ Run configuration
- Configuration file: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: ae971ddb-3d1b-4a40-b04a-95f3002650c8
📒 Files selected for processing (9)

- jenkins/L0_Test.groovy
- tests/integration/defs/perf/test_perf_sanity.py
- tests/integration/test_lists/test-db/l0_b200_multi_gpus_perf_sanity.yml
- tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx2_node1_gpu4_gen1_node4_gpu16.yml
- tests/scripts/perf-sanity/README.md
- tests/scripts/perf-sanity/aggregated/llama_v3_3_70b_instruct_fp4_blackwell.yaml
- tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb0_mtp3_ccb-UCX.yaml
- tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con2048_ctx2_dep4_gen1_dep16_eplb288_mtp3_ccb-UCX.yaml
PR_Github #38829 [ run ] triggered by Bot. Commit:
PR_Github #38829 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-20_GPUs-5_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2" |
PR_Github #38982 [ run ] triggered by Bot. Commit:
PR_Github #38982 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-4,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-5" |
PR_Github #39042 [ run ] triggered by Bot. Commit:
PR_Github #39042 [ run ] completed with state
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
/bot run --disable-fail-fast --stage-list "GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-2,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-3,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-4" |
1 similar comment
PR_Github #39090 [ run ] triggered by Bot. Commit:
PR_Github #39090 [ run ] completed with state
/bot skip --comment "Only add new perf tests, no need to run the whole CI pipeline" |
PR_Github #39165 [ skip ] triggered by Bot. Commit:
PR_Github #39165 [ skip ] completed with state
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-login-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Summary by CodeRabbit
New Features
Tests
Documentation
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.