[None][perf] add Dynamic SMEM block routing in MOE #12456
Merged
kaiyux merged 2 commits into NVIDIA/TensorRT-LLM:main on Mar 24, 2026
Conversation
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Collaborator (Author)

/bot run

Collaborator

PR_Github #39932 [ run ] triggered by Bot. Commit:

Collaborator

PR_Github #39932 [ run ] completed with state
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request on Mar 31, 2026

Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
jiahanc pushed a commit to flashinfer-ai/flashinfer that referenced this pull request on Apr 13, 2026
…PdlOverlapWithNext; Remove DeepSeekV3 float32 logits constraint from kernel launchers

## 📌 Description

1. Add the dynamic block kernel (`routingIndicesDynBlockKernel`) from TensorRT-LLM (NVIDIA/TensorRT-LLM#12456). Made the related modifications by refactoring `LAUNCH_ROUTING_CUSTOM` with `dispatchRoutingPolicy` and `queryDispatchedMaxExperts`.
2. Simplify PDL (Programmatic Dependent Launch) for routing kernels, as the bug related to PDL is solved.
3. Added a default fallback tier (`Tier<1024, 32>`) to support future models with >512 experts using the DeepSeek nGroup≤1 / MiniMax2 routing policy.
4. Remove the DeepSeekV3 float32 logits constraint.
5. Improve policy tier dispatch error messages.

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

```
python -m pytest tests/moe/test_trtllm_gen_fused_moe.py -k "test_dyn_block_kernel_routing or test_tier_1024_experts_routing" -xvs
python3 -m pytest tests/moe/test_trtllm_gen_fused_moe.py -k "test_routing_dtype_flexibility" -xvs
```

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

## Summary by CodeRabbit

* **New Features**
  * Dynamic single-block routing for small token/expert workloads to improve performance.
* **Improvements**
  * Added a 1024-expert policy tier and clearer tier-dispatch diagnostics.
  * More flexible routing/logits dtype handling and simplified kernel overlap/launch synchronization.
  * Automatic histogram-thread sizing and removal of legacy cluster-size/overlap public flags.
  * Autotuner now initializes packed TopK with per-token unique expert IDs.
* **Tests**
  * Added tests for dynamic-block routing, 1024-expert tier, MiniMax2 routing, and routing dtype combinations.

---

Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
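The tier dispatch described in items 3 and 5 above follows a common pattern: map the runtime expert count onto the smallest compile-time tier that can hold it, keep the 1024-expert tier as the last fallback, and fail with a clear message otherwise. The sketch below is a hypothetical illustration of that pattern only; the function name `dispatchRoutingPolicySketch` and the smaller tier sizes are invented and are not FlashInfer's actual `dispatchRoutingPolicy` implementation.

```cpp
// Hypothetical illustration of compile-time policy-tier dispatch. All names
// and the smaller tier sizes are invented; only the Tier<1024, 32> fallback
// is taken from the commit message above.
#include <cstdio>
#include <stdexcept>
#include <string>

template <int MaxExperts_, int MaxTopK_>
struct Tier
{
    static constexpr int MaxExperts = MaxExperts_;
    static constexpr int MaxTopK = MaxTopK_;
};

template <typename Fn>
void dispatchRoutingPolicySketch(int numExperts, Fn&& launch)
{
    // Tiers ordered from smallest to largest; Tier<1024, 32> is the fallback
    // for future models with more than 512 experts.
    if (numExperts <= 128)
        launch(Tier<128, 8>{});
    else if (numExperts <= 512)
        launch(Tier<512, 16>{});
    else if (numExperts <= 1024)
        launch(Tier<1024, 32>{});
    else
        throw std::invalid_argument(
            "No routing tier supports " + std::to_string(numExperts) + " experts (max supported: 1024)");
}

int main()
{
    dispatchRoutingPolicySketch(640, [](auto tier) {
        // A real launcher would instantiate the routing kernel for this tier;
        // here we only report which tier was selected.
        int const maxExperts = decltype(tier)::MaxExperts;
        int const maxTopK = decltype(tier)::MaxTopK;
        std::printf("dispatched Tier<%d, %d>\n", maxExperts, maxTopK);
    });
    return 0;
}
```

Dispatching on a small set of compile-time tiers keeps the number of kernel instantiations bounded while still letting each tier size its registers and shared memory for the largest expert count it must support.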
@coderabbitai summary
Description
Add a dynamic SMEM block routing kernel in TRTLLM MOE to optimize the routing kernel for the 4-16 token range.
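As a rough illustration only (this is not the PR's actual kernel; the kernel name, signature, and launch configuration below are assumed for the sketch), a single-block routing pass that sizes its shared-memory histogram dynamically from the expert count might look like this:

```cpp
// Illustrative sketch: one thread block builds the per-expert histogram in
// dynamically sized shared memory, computes offsets, and scatters assignment
// indices into expert-sorted order. Cheap for small batches (e.g. 4-16 tokens)
// because no multi-block grid or inter-block synchronization is needed.
#include <cuda_runtime.h>
#include <cstdint>

__global__ void routingDynSmemSketch(int32_t const* expertIds, // [numTokens * topK]
                                     int32_t* expertOffsets,   // [numExperts]
                                     int32_t* permutedIdx,     // [numTokens * topK]
                                     int numTokens, int topK, int numExperts)
{
    // Dynamic SMEM: one counter per expert, sized by the launch below.
    extern __shared__ int32_t smemCounts[];
    for (int e = threadIdx.x; e < numExperts; e += blockDim.x)
        smemCounts[e] = 0;
    __syncthreads();

    // Histogram the expert assignments entirely within one block.
    int const total = numTokens * topK;
    for (int i = threadIdx.x; i < total; i += blockDim.x)
        atomicAdd(&smemCounts[expertIds[i]], 1);
    __syncthreads();

    // Exclusive prefix sum over experts (kept serial for clarity).
    if (threadIdx.x == 0)
    {
        int32_t running = 0;
        for (int e = 0; e < numExperts; ++e)
        {
            expertOffsets[e] = running;
            running += smemCounts[e];
            smemCounts[e] = expertOffsets[e]; // reuse as a scatter cursor
        }
    }
    __syncthreads();

    // Scatter assignment indices into expert-sorted order.
    for (int i = threadIdx.x; i < total; i += blockDim.x)
    {
        int const pos = atomicAdd(&smemCounts[expertIds[i]], 1);
        permutedIdx[pos] = i;
    }
}

// Launch sketch: the dynamic shared-memory size follows the expert count.
// routingDynSmemSketch<<<1, 256, numExperts * sizeof(int32_t)>>>(
//     dExpertIds, dExpertOffsets, dPermutedIdx, numTokens, topK, numExperts);
```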
Test Coverage
Added a C++ unit test.
Run with
./tests/unit_tests/kernels/routingKernelsTest --gtest_filter="*DynBlock*"
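For context, a test matched by the `*DynBlock*` filter might be structured roughly as follows; this is a hypothetical sketch (test name, sizes, and the CPU reference are invented), not the actual routingKernelsTest code, which would also launch the CUDA kernel and compare its outputs against a reference.

```cpp
// Hypothetical sketch of a dynamic-block routing unit test.
#include <gtest/gtest.h>
#include <numeric>
#include <vector>

// CPU reference: per-expert histogram of (token, expert) assignments.
static std::vector<int> referenceHistogram(std::vector<int> const& expertIds, int numExperts)
{
    std::vector<int> counts(numExperts, 0);
    for (int e : expertIds)
        counts.at(e)++;
    return counts;
}

TEST(RoutingKernelsSketch, DynBlockSmallTokenCounts)
{
    // The dynamic-block path targets small batches, so sweep 4..16 tokens.
    for (int numTokens = 4; numTokens <= 16; ++numTokens)
    {
        int const topK = 8;
        int const numExperts = 256;
        std::vector<int> expertIds(static_cast<size_t>(numTokens) * topK);
        // Deterministic assignment pattern standing in for real router output.
        for (size_t i = 0; i < expertIds.size(); ++i)
            expertIds[i] = static_cast<int>((i * 37) % numExperts);

        auto const counts = referenceHistogram(expertIds, numExperts);
        // A real test would compare the kernel's expert offsets / permutation
        // against this reference; here we only sanity-check the histogram.
        EXPECT_EQ(std::accumulate(counts.begin(), counts.end(), 0), numTokens * topK);
    }
}
```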
PR Checklist

Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update the tava architecture diagram if there is a significant design change in the PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.