[None][perf] add Dynamic SMEM block routing in MOE #12456

Merged
kaiyux merged 2 commits into NVIDIA:main from jiahanc:OptTRTLLMRouting
Mar 24, 2026
Conversation

@jiahanc jiahanc commented Mar 23, 2026

@coderabbitai summary

Description

Add a dynamic SMEM block routing kernel to TRTLLM MOE to optimize the routing kernel for the 4–16 token range.

Test Coverage

Added a C++ unit test. Run with:
`./tests/unit_tests/kernels/routingKernelsTest --gtest_filter="*DynBlock*"`

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

jiahanc added 2 commits March 23, 2026 04:31
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
@jiahanc jiahanc requested a review from ChristinaZ March 23, 2026 12:14
@jiahanc jiahanc changed the title [None][Perf] add Dynamic SMEM block routing in MOE [None][perf] add Dynamic SMEM block routing in MOE Mar 23, 2026
@ChristinaZ ChristinaZ (Collaborator) left a comment
LGTM, thanks!


jiahanc commented Mar 23, 2026

/bot run

@tensorrt-cicd

PR_Github #39932 [ run ] triggered by Bot. Commit: 4dd96d8

@tensorrt-cicd

PR_Github #39932 [ run ] completed with state SUCCESS. Commit: 4dd96d8
/LLM/main/L0_MergeRequest_PR pipeline #31098 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.


@kaiyux kaiyux merged commit c42e86e into NVIDIA:main Mar 24, 2026
8 of 12 checks passed
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
jiahanc pushed a commit to flashinfer-ai/flashinfer that referenced this pull request Apr 13, 2026
…PdlOverlapWithNext;Remove DeepSeekV3 float32 logits constraint from
kernel launchers

<!-- .github/pull_request_template.md -->

## 📌 Description

1. Add the dynamic block kernel (`routingIndicesDynBlockKernel`) ported from
TensorRT-LLM (NVIDIA/TensorRT-LLM#12456). Made the related modifications by
refactoring `LAUNCH_ROUTING_CUSTOM` with `dispatchRoutingPolicy` and
`queryDispatchedMaxExperts`.
2. Simplify PDL (Programmatic Dependent Launch) handling for the routing
kernels, now that the PDL-related bug has been fixed.
3. Added a default fallback tier (`Tier<1024, 32>`) to support future
models with >512 experts using the DeepSeek nGroup≤1 / MiniMax2 routing
policy.
4. Remove the DeepSeekV3 float32 logits constraint.
5. Improve the policy-tier dispatch error messages.

## 🔍 Related Issues

<!-- Link any related issues here -->

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests
```
python -m pytest tests/moe/test_trtllm_gen_fused_moe.py -k "test_dyn_block_kernel_routing or test_tier_1024_experts_routing" -xvs
python3 -m pytest tests/moe/test_trtllm_gen_fused_moe.py -k "test_routing_dtype_flexibility" -xvs
```


- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Dynamic single-block routing for small token/expert workloads to
improve performance.

* **Improvements**
* Added a 1024-expert policy tier and clearer tier-dispatch diagnostics.
* More flexible routing/logits dtype handling and simplified kernel
overlap/launch synchronization.
* Automatic histogram-thread sizing and removal of legacy
cluster-size/overlap public flags.
* Autotuner now initializes packed TopK with per-token unique expert
IDs.

* **Tests**
* Added tests for dynamic-block routing, 1024-expert tier, MiniMax2
routing, and routing dtype combinations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>