[None][feat] Add support for expert_number<=2048 and K<=32 #11510
byshiue merged 1 commit into NVIDIA/TensorRT-LLM:main from ChristinaZ:add_2048_support
Conversation
📝 Walkthrough

The PR refactors MOE routing kernels from monolithic designs to a split-compile launcher pattern, introduces per-thread multi-expert handling for larger expert counts, increases MaxSupportedTopExperts from 22 to 32 and total expert support to 2048, and updates validator constraints accordingly.
Sequence Diagram(s)

sequenceDiagram
actor Host
participant Launcher as Launch Wrapper
participant MainKernel as Main Kernel
participant GridSync as PDL Grid Sync
participant HistoKernel as Histogram Kernel
participant OffsetKernel as Offset Kernel
Host->>Launcher: launchMainKernel()
Launcher->>MainKernel: Execute (compute top-k scores)
MainKernel-->>Host: (if no PDL: complete)
MainKernel->>GridSync: Trigger PDL
Host->>Launcher: launchHistogramKernel()
Launcher->>HistoKernel: Execute (count experts)
HistoKernel-->>Host: (if no PDL: complete)
HistoKernel->>GridSync: Trigger PDL
Host->>Launcher: launchOffsetsKernel()
Launcher->>OffsetKernel: Execute (compute offsets)
OffsetKernel-->>Host: Complete
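For orientation, the launcher pattern the walkthrough refers to can be sketched as below; every name here (KernelParamsSketch, mainKernelSketch, launchMainKernel) is illustrative, not the repository's actual API.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical stand-in for the routing kernels' parameter struct.
struct KernelParamsSketch
{
    int32_t mNumExperts;
    int32_t mTopK;
};

// Each kernel lives in its own translation unit...
__global__ void mainKernelSketch(KernelParamsSketch params)
{
    // ...compute per-token top-k expert scores here...
}

// ...behind a thin host-side launch wrapper, so variants are compiled and
// instantiated independently instead of inside one monolithic .cu file.
void launchMainKernel(KernelParamsSketch const& params, int numBlocks, int numThreads, cudaStream_t stream)
{
    mainKernelSketch<<<numBlocks, numThreads, /*smemSize=*/0, stream>>>(params);
}
```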
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 3 failed (2 warnings, 1 inconclusive)
Actionable comments posted: 7
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
cpp/tensorrt_llm/thop/mxFp4BlockScaleMoe.cpp (1)
2-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to include 2026.

The file is being modified in 2026 but the copyright header still reads 2022-2025. As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification."

Proposed fix:

- * Copyright (c) 2022-2025, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION. All rights reserved.

cpp/tensorrt_llm/thop/fp8PerTensorScaleMoe.cpp (1)
2-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to include 2026.

The header currently reads 2022-2024, but this file is being modified in 2026. As per coding guidelines, "update year on modified files."

- * Copyright (c) 2022-2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION. All rights reserved.

tests/unittest/_torch/thop/serial/test_moe.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to 2026.

The copyright header says 2022-2024, but this file is being modified in 2026. As per coding guidelines, all source files must update the year on modified files.

-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

cpp/tensorrt_llm/thop/fp4BlockScaleMoe.cpp (1)
2-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to include 2026.

The copyright header reads 2022-2024, but this file has meaningful modifications in 2026. As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification."

Proposed fix:

- * Copyright (c) 2022-2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION. All rights reserved.

cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/DevKernel.h (1)
2-2: ⚠️ Potential issue | 🟡 Minor

Copyright year needs updating.

The copyright header says 2022-2024 but this file has been modified in 2025. Update to 2022-2025. As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification."

Proposed fix:

- * Copyright (c) 2022-2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2022-2025, NVIDIA CORPORATION. All rights reserved.

cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingRenormalize.cu (1)
42-44: ⚠️ Potential issue | 🟡 Minor

Copy-paste error in error message: says "Llama4" but this is the Renormalize path.

Line 44 reads "Llama4 routing kernel expects permuted idx..." but this is routingRenormalize::run. It should say "Renormalize routing kernel".

Proposed fix:

  TLLM_CHECK_WITH_INFO(data.mPtrPermutedIdxSize != nullptr && data.mPtrCtaIdxXyToBatchIdx != nullptr
          && data.mPtrCtaIdxXyToMnLimit != nullptr && data.mPtrNumNonExitingCtas != nullptr,
-     "Llama4 routing kernel expects permuted idx and grouped Gemm launch config buffers");
+     "Renormalize routing kernel expects permuted idx and grouped Gemm launch config buffers");
🤖 Fix all issues with AI agents
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchCoopKernel.cu`:
- Around line 78-81: The stride-alignment check in launchCoopKernel.cu is using
a bitwise test `(localExpertIdx & params.mLocalExpertsStrideLog2) == 0` which is
incorrect; update the check used when computing isLocalExpert (the expression
that now references params.mLocalExpertsStrideLog2) to use a proper mask:
`(localExpertIdx & ((1u << params.mLocalExpertsStrideLog2) - 1)) == 0`, then
keep the rest of the logic that computes expertOffsets (the ternary
atomicAdd(smemExpertCount + expertIdx, 1) : 0) unchanged; ensure you apply the
same corrected mask pattern to the other identical checks in this file (and
mirror the fix in the other files cited) so that localExpertIdx,
params.mLocalExpertsStrideLog2, expertOffsets and the atomicAdd usage behave
correctly for stride log2 >= 1.
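A minimal sketch of the corrected predicate, assuming the names used in the review (the surrounding kernel context is not shown):

```cuda
#include <cstdint>

// Sketch only: fields and buffers mirror the names cited in the review.
struct ParamsSketch
{
    int32_t mLocalExpertsStartIdx;
    int32_t mLocalExpertsStrideLog2;
};

__device__ int32_t expertOffsetSketch(ParamsSketch const& params, int32_t expertIdx, int32_t* smemExpertCount)
{
    int32_t const localExpertIdx = expertIdx - params.mLocalExpertsStartIdx;

    // Wrong: treats the log2 value itself as a bitmask, so the test is only
    // meaningful when mLocalExpertsStrideLog2 happens to be 0.
    // bool isLocalExpert = (localExpertIdx & params.mLocalExpertsStrideLog2) == 0;

    // Right: build the mask from the log2 of the stride, then test the low bits.
    uint32_t const strideMask = (1u << params.mLocalExpertsStrideLog2) - 1;
    bool const isLocalExpert = (localExpertIdx & strideMask) == 0;

    // The downstream offset logic stays unchanged.
    return isLocalExpert ? atomicAdd(smemExpertCount + expertIdx, 1) : 0;
}
```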
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchHistogramKernel.cu`:
- Line 2: The copyright header in launchHistogramKernel.cu (and other new
launcher .cu files listed: launchOffsetsKernel.cu, launchInitExpertCounts.cu,
launchCoopKernel.cu, launchClusterKernel.cu, launchMainKernel.cu) stops at
"2022-2025" but these are new 2026 files—update the header to include 2026
(e.g., "2022-2026" or the appropriate range) so the file-level copyright
reflects the latest modification year; modify the top-of-file comment in each
mentioned file (launchHistogramKernel.cu and the other launcher .cu filenames)
to replace "2022-2025" with the correct year range including 2026.
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchMainKernel.cu`:
- Around line 204-205: Rename the misspelled local arrays intermidiateScore and
intermidiateExpert to intermediateScore and intermediateExpert, and update
every usage/reference to these symbols (including the nearby block where they
are read and written) to match the corrected names so compilation and
semantics remain consistent.
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchBlockKernel.cu`:
- Around line 167-187: The two consecutive CUB BlockScan calls using the same
TempStorage (Scan(tempStorage).ExclusiveSum(numCtaPerExpert, ctaOffsetPerExpert,
numNonExitingCtas) and Scan(tempStorage).ExclusiveSum(tmpCountPerExpert,
expertScanCountsPerExpert)) must be separated by a barrier: add a
__syncthreads() immediately after the first ExclusiveSum (after
ctaOffsetPerExpert/numNonExitingCtas are computed) and before computing
tmpCountPerExpert to prevent TempStorage reuse races; keep the existing
__syncthreads() after the second scan as well and ensure the barrier location
references the existing tempStorage, ctaOffsetPerExpert, accExpertCount,
tmpCountPerExpert, and expertScanCountsPerExpert variables.
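A self-contained sketch of the barrier placement being requested; the thread count, types, and names are illustrative, assuming the CUB BlockScan setup the review describes:

```cuda
#include <cstdint>
#include <cub/cub.cuh>

template <int NumThreads>
__device__ void scanTwiceSketch(int32_t numCtaPerExpert, int32_t accExpertCount)
{
    using Scan = cub::BlockScan<int32_t, NumThreads>;
    __shared__ typename Scan::TempStorage tempStorage;

    int32_t ctaOffsetPerExpert, numNonExitingCtas;
    Scan(tempStorage).ExclusiveSum(numCtaPerExpert, ctaOffsetPerExpert, numNonExitingCtas);

    // Required: both scans share one TempStorage, so every thread must be done
    // with the first scan before the second one reuses the storage.
    __syncthreads();

    int32_t tmpCountPerExpert = accExpertCount;
    int32_t expertScanCountsPerExpert;
    Scan(tempStorage).ExclusiveSum(tmpCountPerExpert, expertScanCountsPerExpert);
    __syncthreads(); // the existing barrier after the second scan stays
}
```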
- Around line 128-131: The check treating mLocalExpertsStrideLog2 as a bitmask
is wrong: wherever you use (localExpIdx & params.mLocalExpertsStrideLog2) == 0
(e.g., in the localExpIdx/isLocal computation and the later conditional in
launchBlockKernel.cu), replace it with a proper mask built from the log2 value,
i.e. compute mask = (1u << params.mLocalExpertsStrideLog2) - 1 and test
(localExpIdx & mask) == 0; update uses of localExpIdx,
params.mLocalExpertsStartIdx, params.mNumLocalExperts, and
params.mLocalExpertsStrideLog2 in this file (and mirror the same change in other
affected functions like routingDeepSeek, RoutingLlama4.cu, RoutingKernel.cuh) so
the stride-log2 is interpreted correctly.
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchInitExpertCounts.cu`:
- Line 2: Replace the outdated copyright header string "2022-2025" with the
correct latest-year range that includes 2026 (e.g. "2022-2026" or a header that
ends with 2026) in the new launcher files (e.g. launchInitExpertCounts.cu and
the other five launchers under routingRenormalize) so the top-of-file NVIDIA
copyright reflects the 2026 modification; search for the exact literal
"2022-2025" in each new file and update it consistently.
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/RoutingRenormalizeCommon.cuh`:
- Line 136: The macro name is misspelled: replace all occurrences of
LAUNCH_ROUTING_RENORNALIZE with the correctly spelled
LAUNCH_ROUTING_RENORMALIZE; update the macro definition in
RoutingRenormalizeCommon.cuh and rename every invocation where the old macro is
used so the compile-time symbol matches (ensure you update the macro token in
the six call sites that reference it), then rebuild to verify no remaining
references to LAUNCH_ROUTING_RENORNALIZE remain.
🧹 Nitpick comments (11)
cpp/tensorrt_llm/thop/fp8BlockScaleMoe.cpp (1)
2-2: Update the copyright year to include 2026.

The header currently says 2022-2024, but this file is being modified now. As per coding guidelines: "update year on modified files."

- * Copyright (c) 2022-2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2022-2026, NVIDIA CORPORATION. All rights reserved.

tests/unittest/_torch/thop/serial/test_moe.py (2)
1095-1105: Large memory footprint — consider CI impact.

With num_experts=2048, hidden_size=1024, and intermediate_size=1024, gemm1_weights alone is (2048, 2048, 1024) in bf16 (~8 GB), plus gemm2_weights (~4 GB), plus all quantized copies and scales. Combined with the parametrize cross-product (num_tokens × intermediate_size × act_type), this generates multiple heavy test cases.

Consider either:

- Reducing hidden_size/intermediate_size for this specific large-expert case, or
- Limiting the cross-product (e.g., fixing intermediate_size=768 and a single act_type for this param).
1189-1199: The same memory concern as test_autotune applies here.

This test_no_autotune variant additionally cross-products with use_topk_as_input=[False, True], further doubling the large-expert test matrix. The use_topk_as_input=True path hits the DeepSeekV3-only skip at line 1357, but the False path still runs all combinations.

cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingLlama4.cu (1)
29-30: Commented-out code should use #if/#endif instead of comments.

Line 29 uses a comment to disable code (// static constexpr int MaxNumExperts = 128;). If this is dead code, remove it entirely. If it's intentionally kept for reference, use #if 0/#endif per the coding guidelines.

As per coding guidelines: "Use #if/#endif to disable code, preferably with a mnemonic condition... Do not use comments to disable code."

cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/DevKernel.h (1)
37-70: Good refactor to do { ... } while (0) for the LAUNCH_PDL macro.

This is a well-known best practice for multi-statement macros. One minor coding guideline nit: the if on lines 56 and 65 should be followed by brace-delimited statements.

As per coding guidelines: "If and else should always be followed by brace-delimited statements, even if empty or a single statement."

Proposed fix (lines 56-58, 65-67):

      if (smemSize > 48 * 1024)                                                                        \
-         TLLM_CUDA_CHECK(                                                                             \
-             cudaFuncSetAttribute(kernelTyped, cudaFuncAttributeMaxDynamicSharedMemorySize, smemSize)); \
+     {                                                                                                \
+         TLLM_CUDA_CHECK(                                                                             \
+             cudaFuncSetAttribute(kernelTyped, cudaFuncAttributeMaxDynamicSharedMemorySize, smemSize)); \
+     }                                                                                                \

Apply similarly to both the true and false UsePdl branches.
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/RoutingDeepSeekCommon.cuh (1)

26-29: std::max with initializer list may need an explicit <algorithm> include.

Line 29 uses std::max({...}) with an initializer list, which requires <algorithm> (and implicitly <initializer_list>). These may be provided transitively via RoutingKernel.cuh → DevKernel.h, but relying on transitive includes is fragile. Consider a direct #include <algorithm>, or alternatively, since all three values are compile-time constants, a simpler expression like nested std::max calls or just hardcoding 512 (the actual max) would avoid the dependency entirely.
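For reference, a tiny sketch of the overload in question (the constant values are illustrative, not the file's actual ones):

```cuda
#include <algorithm> // provides the std::max({...}) initializer-list overload

static constexpr int MaxA = 128, MaxB = 256, MaxC = 512;

// Initializer-list form: largest of all listed compile-time constants.
static constexpr int MaxOfAll = std::max({MaxA, MaxB, MaxC}); // == 512

// Equivalent using only the two-argument overload, avoiding <initializer_list>:
static constexpr int MaxOfAllNested = std::max(MaxA, std::max(MaxB, MaxC));
```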
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchInitExpertCounts.cu (1)

27-30: Missing /*coopLaunch=*/ inline comment for consistency.

All other launcher wrappers annotate the coopLaunch parameter with an inline comment (/*coopLaunch=*/false). This one passes bare false.

Suggested fix:

- LAUNCH_ROUTING_DEEPSEEK(data, false, routingInitExpertCounts, (2 * data.mNumExperts - 1) / numThreadsHist + 1,
+ LAUNCH_ROUTING_DEEPSEEK(data, /*coopLaunch=*/false, routingInitExpertCounts, (2 * data.mNumExperts - 1) / numThreadsHist + 1,

As per coding guidelines: "In function calls where parameters are not obvious, use inline C comments to document the parameter."
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchCoopKernel.cu (1)
216-222: TODO note: PDL visibility concern is documented but unresolved.

The comment on line 218 explicitly states: "this is not sufficient to ensure visibility in the next kernel!" This suggests the secondary kernel may observe stale data for mPtrCtaIdxXyToBatchIdx, mPtrCtaIdxXyToMnLimit, mPtrNumNonExitingCtas, and mPtrPermutedIdxSize.

Is there a tracking issue for this? If the dependent FC1 kernel relies on these outputs being visible, a missing memory fence before cudaTriggerProgrammaticLaunchCompletion could cause data races on SM90+.

Would you like me to open an issue to track this PDL visibility concern?
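To illustrate the concern, a hypothetical producer/consumer pair (not the file's actual code; the fence-before-trigger requirement is the review's claim, and this pattern assumes SM90+ with programmatic dependent launch configured on the consumer's launch):

```cuda
#include <cstdint>
#include <cuda_runtime.h>

__global__ void producerSketch(int32_t* outBuf)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
    {
        outBuf[0] = 42;  // result the dependent kernel reads
        __threadfence(); // make the write visible at device scope first
    }
    // Signals that the dependent grid may launch; without the fence above,
    // the consumer could observe stale data in outBuf.
    cudaTriggerProgrammaticLaunchCompletion();
}

__global__ void consumerSketch(int32_t const* outBuf, int32_t* result)
{
    cudaGridDependencySynchronize(); // wait on the producer grid
    if (blockIdx.x == 0 && threadIdx.x == 0)
    {
        result[0] = outBuf[0];
    }
}
```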
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchClusterKernel.cu (1)
44-49: Non-SM90 fallback is missing __launch_bounds__, unlike the SM90+ path.

The SM90+ variant (line 27) uses __launch_bounds__(KernelParams::MaxNumExperts) while the fallback (line 45) has no launch bounds annotation. For consistency with the Renormalize variant (which uses __launch_bounds__(NumThreads) on its fallback), consider adding it here. This is minor since the fallback only asserts false.
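As a reminder of what the annotation does (kernel name and thread count below are illustrative):

```cuda
#include <cassert>

static constexpr int NumThreads = 256; // illustrative block size

// __launch_bounds__ tells the compiler the kernel is never launched with more
// than NumThreads threads per block, which helps register allocation and makes
// an oversized launch fail at launch time instead of silently misbehaving.
__global__ void __launch_bounds__(NumThreads) fallbackSketch()
{
    // A non-SM90 fallback typically just asserts; the annotation still keeps
    // its launch configuration consistent with the SM90+ variant.
    assert(false && "unsupported architecture");
}
```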
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchHistogramScoresKernel.cu (1)

44-44: Unused variable minScore.

minScore is initialized to -INFINITY but never read or passed to any function in this kernel. Likely a copy-paste artifact from the block kernel. Remove it to avoid compiler warnings and dead code.

Suggested fix:

- BaseType minScore = BaseType{-INFINITY};

cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchBlockKernel.cu (1)
100-100: Unused variable minScore.

Same as in launchHistogramScoresKernel.cu — minScore is initialized but never referenced. Remove to avoid dead code.

Suggested fix:

- BaseType minScore = BaseType{-INFINITY};
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Force-pushed from 72c23a1 to ce58d29
/bot run

PR_Github #37524 [ run ] triggered by Bot. Commit:

PR_Github #37524 [ run ] completed with state

/bot run

PR_Github #37621 [ run ] triggered by Bot. Commit:

PR_Github #37621 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #37662 [ run ] triggered by Bot. Commit:

/bot run

PR_Github #37668 [ run ] triggered by Bot. Commit:

PR_Github #37668 [ run ] completed with state
## 📌 Description

- Integrate NVIDIA/TensorRT-LLM#11510 to support 2048 num of experts and 32 TopK in renormalize
- Refactor MOE cu files

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

## Summary by CodeRabbit

* **New Features**
  * Expanded MoE routing/renormalize to support up to 2,048 experts and top-k up to 32; backend reorganized to enable larger configurations.
* **Bug Fixes**
  * Clamped token counts in kernel launches to prevent oversized grid launches.
* **Performance**
  * Reworked routing/launch paths for improved scalability and throughput with large expert/top-k settings.
* **Tests**
  * Added test scenarios covering large-expert (2,048) + top-k (32) configurations.

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Summary by CodeRabbit
New Features
Performance Improvements
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
Details
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

Kill all running builds associated with pull request.

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.