Pull requests: NVIDIA/TransformerEngine

Disable the RHT fusion for non-SM100 family devices
#2968 opened May 8, 2026 by ptrendx Member
1 of 13 tasks
[torch.compile][PyTorch] Prepare linear for torch compile
#2967 opened May 7, 2026 by pggPL Collaborator
7 of 8 tasks
[PyTorch] Batch CP attention tests in single torchrun to amortize NCC…
#2965 opened May 6, 2026 by sudhakarsingh27 Collaborator
7 of 8 tasks
[All] Refactor nvte_get_fused_attn_backend with cudnn-frontend calls
#2964 opened May 6, 2026 by cyanguwa Collaborator
10 of 13 tasks
Refactor tensor class in C++ unit tests (label: refactor)
#2962 opened May 6, 2026 by timmoon10 Collaborator
8 of 13 tasks
Draft: Extended Tensor Parallelism
#2960 opened May 5, 2026 by jiemingz Draft
13 tasks
[Common] Use specialized unfused MXFP8 cast kernels by default
#2958 opened May 5, 2026 by Oleg-Goncharov Collaborator
5 of 13 tasks
CPU overhead optimizations for te autocast
#2957 opened May 4, 2026 by vthumbe1503 Collaborator
13 tasks
[Common, PyTorch] Improve mHC to match DeepSeek's implementation
#2953 opened May 1, 2026 by kainzhong Collaborator Draft
9 of 13 tasks
[Common, PyTorch] Add Triton MLA attention kernels for SM80 (label: community-contribution)
#2950 opened Apr 30, 2026 by bzantium
Add NVFP4 1x64 Local Encode Recipe
#2941 opened Apr 29, 2026 by cael-ling Contributor
1 of 13 tasks
[Common/PyTorch/JAX] make offset of ClampedSwiGLU configurable
#2938 opened Apr 28, 2026 by hxbai Contributor
13 tasks
Fix CUDA graph parameter grad lifetime
#2937 opened Apr 28, 2026 by buptzyb Contributor
[PyTorch] Enable head dim 256 for FA4
#2932 opened Apr 27, 2026 by yaox12 Member
1 of 13 tasks
Fix WHEEL Tag mismatch in transformer-engine-cu12 wheels
#2928 opened Apr 25, 2026 by eyupcanakman
7 of 13 tasks
[PyTorch] Fix stale columnwise data usage
#2925 opened Apr 25, 2026 by ksivaman Member
7 of 13 tasks
[PyTorch] Add distributed Muon optimizer (label: 2.16.0)
#2920 opened Apr 23, 2026 by vcherepanov-nv Collaborator
5 of 13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916 opened Apr 22, 2026 by sudhakarsingh27 Collaborator Draft
1 of 3 tasks
NVFP4 per-token recipe
#2913 opened Apr 21, 2026 by YigongQin Contributor Draft
1 of 13 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing (label: community-contribution)
#2911 opened Apr 21, 2026 by NoonePauseferg
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute (label: community-contribution)
#2907 opened Apr 21, 2026 by jing-4369
3 of 4 tasks