[TRTLLM-11342][fix] Fix FLUX.1 TeaCache polynomial coefficients and default threshold #12007

zhenhuaw-me merged 1 commit into NVIDIA:main from karljang:fix/flux1-teacache-coefficients
Conversation
The FLUX.1 TeaCache coefficients were incorrectly copied from the WAN I2V 480P config instead of the official TeaCache FLUX values. This caused suboptimal cache hit rates (~28%) compared to the expected ~64% at the default threshold.

Changes:
- Fix FLUX.1 polynomial coefficients to match the official TeaCache repo
- Update the default threshold from 0.2 to 0.6 (official FLUX default, ~2x speedup)
- Support per-model default_thresh in the coefficient config
- Update flux1.yml and example script defaults

Validated on H200 with 10 prompts (50 steps, seed 42):

| thresh | hit rate | PSNR | speedup |
| --- | --- | --- | --- |
| 0.25 | 42% | 24.7 dB | ~1.5x |
| 0.40 | 55% | 20.6 dB | ~1.8x |
| 0.60 | 64% | 19.1 dB | ~2.0x |
| 0.80 | 69% | 18.1 dB | ~2.25x |

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
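The hit-rate effect described above can be illustrated with a small, self-contained sketch (not the TensorRT-LLM code; function names and the simplified gating loop are assumptions). TeaCache rescales the relative-L1 change of the model input with a fitted polynomial and accumulates the result, reusing the cached residual while the accumulated value stays under the threshold — so an oversized leading coefficient inflates the distance and suppresses cache hits:

```python
# Illustrative sketch of TeaCache-style gating (hypothetical, not pipeline code).
# Coefficients are highest-degree-first; the PR replaces WAN I2V 480P values
# (leading term ~2.57e+05) with FLUX values (leading term ~4.99e+02).

def polyval(coeffs, x):
    """Horner evaluation of a polynomial given highest-degree-first coeffs."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def step(coeffs, rel_l1_diff, accumulated, thresh):
    """Return (cache_hit, new_accumulated) for one denoising step."""
    accumulated += abs(polyval(coeffs, rel_l1_diff))
    if accumulated < thresh:
        return True, accumulated  # cache hit: reuse the cached residual
    return False, 0.0             # recompute and reset the accumulator

# Rounded coefficient sets, differing only in the leading term, to show its effect:
flux_like = [4.99e+02, -2.84e+02, 5.59e+01, -3.82e+00, 2.64e-01]
wan_like  = [2.57e+05, -2.84e+02, 5.59e+01, -3.82e+00, 2.64e-01]

hits = {"flux": 0, "wan": 0}
for name, coeffs in (("flux", flux_like), ("wan", wan_like)):
    acc = 0.0
    for _ in range(50):  # 50 denoising steps with a constant synthetic diff
        hit, acc = step(coeffs, 0.05, acc, thresh=0.6)
        hits[name] += hit
```

With the WAN-sized leading term, a single step already exceeds the 0.6 threshold, so the cache never hits; the FLUX-sized term allows several consecutive hits per recompute.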
/bot run
📝 Walkthrough

Updates TeaCache threshold default values from 0.2 to 0.6 across configuration files, example scripts, and model coefficients. Adds conditional logic to automatically apply model-specific default thresholds when not explicitly set by users.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/visual_gen/visual_gen_flux.py (1)
218-226: Consider simplifying the conditional dict construction. The inline conditional unpacking works but is dense. Building the dict incrementally could improve readability.
♻️ Optional: Alternative approach
```diff
+    teacache_config = {
+        "enable_teacache": args.enable_teacache,
+        "use_ret_steps": args.use_ret_steps,
+    }
+    if args.teacache_thresh is not None:
+        teacache_config["teacache_thresh"] = args.teacache_thresh
+
     diffusion_config = {
         "revision": args.revision,
         "attention": {
             "backend": args.attention_backend,
         },
-        "teacache": {
-            "enable_teacache": args.enable_teacache,
-            **(
-                {"teacache_thresh": args.teacache_thresh}
-                if args.teacache_thresh is not None
-                else {}
-            ),
-            "use_ret_steps": args.use_ret_steps,
-        },
+        "teacache": teacache_config,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/visual_gen/visual_gen_flux.py` around lines 218 - 226, The "teacache" dict construction is dense due to inline conditional unpacking; instead, create the dict incrementally by first building a base dict with "enable_teacache" and "use_ret_steps" and then, if args.teacache_thresh is not None, add "teacache_thresh" to that dict (referencing the same args.enable_teacache, args.teacache_thresh, args.use_ret_steps and the "teacache" dict key in the current code) so the intent is clearer and easier to read.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/visual_gen/pipeline.py`:
- Around line 175-184: Add the required NVIDIA copyright header and Apache 2.0
license block at the top of this file
(tensorrt_llm/_torch/visual_gen/pipeline.py) before any imports or code; keep
the existing logic (including the Pydantic v2 usage of
teacache_cfg.model_fields_set in the block around default_thresh) untouched, and
ensure the header text matches the project's standard NVIDIA + Apache-2.0
boilerplate used in other TensorRT-LLM source files.
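The `model_fields_set` pattern the comment refers to can be sketched roughly as follows (class and field names are assumptions based on this PR, not the actual TensorRT-LLM definitions). In Pydantic v2, `model_fields_set` contains only the fields a caller set explicitly, which lets the pipeline distinguish an untouched class default from a user-supplied value:

```python
from pydantic import BaseModel

# Hypothetical per-model defaults (the FLUX value comes from this PR;
# the WAN entry is illustrative)
MODEL_DEFAULT_THRESH = {"flux": 0.6, "wan_i2v_480p": 0.2}

class TeaCacheConfig(BaseModel):
    enable_teacache: bool = False
    teacache_thresh: float = 0.2   # legacy global default
    use_ret_steps: bool = False

def resolve_thresh(cfg: TeaCacheConfig, model_name: str) -> float:
    # model_fields_set lists only explicitly provided fields, so we can
    # tell "user passed 0.2" apart from "0.2 came from the class default".
    if "teacache_thresh" in cfg.model_fields_set:
        return cfg.teacache_thresh
    return MODEL_DEFAULT_THRESH.get(model_name, cfg.teacache_thresh)
```

For example, `resolve_thresh(TeaCacheConfig(), "flux")` picks up the model-specific 0.6, while an explicit `TeaCacheConfig(teacache_thresh=0.2)` keeps the user's 0.2.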
---
Nitpick comments:
In `@examples/visual_gen/visual_gen_flux.py`:
- Around line 218-226: The "teacache" dict construction is dense due to inline
conditional unpacking; instead, create the dict incrementally by first building
a base dict with "enable_teacache" and "use_ret_steps" and then, if
args.teacache_thresh is not None, add "teacache_thresh" to that dict
(referencing the same args.enable_teacache, args.teacache_thresh,
args.use_ret_steps and the "teacache" dict key in the current code) so the
intent is clearer and easier to read.
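The incremental construction suggested above can be made concrete with a runnable snippet (the flag names follow the example script and are treated as assumptions here). `default=None` on the threshold flag is what lets the pipeline fall back to the per-model default when the user omits it:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--enable_teacache", action="store_true")
# default=None lets the pipeline substitute the per-model default
# (0.6 for FLUX.1 after this PR) when the flag is omitted.
parser.add_argument("--teacache_thresh", type=float, default=None)
parser.add_argument("--use_ret_steps", action="store_true")
args = parser.parse_args(["--enable_teacache"])

# Build the dict incrementally instead of inline conditional unpacking.
teacache_config = {
    "enable_teacache": args.enable_teacache,
    "use_ret_steps": args.use_ret_steps,
}
if args.teacache_thresh is not None:
    teacache_config["teacache_thresh"] = args.teacache_thresh
```

When `--teacache_thresh` is omitted, the key is simply absent from the dict, so downstream config merging can apply the model default.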
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0a2e10b1-6487-4007-bfb4-5d20bd9073c1
📒 Files selected for processing (4)
- examples/visual_gen/serve/configs/flux1.yml
- examples/visual_gen/visual_gen_flux.py
- tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py
- tensorrt_llm/_torch/visual_gen/pipeline.py
zhenhuaw-me
left a comment
Thanks for the fix! I was also thinking about changing similar configuration defaults to None and taking the per-model default in the pipeline.
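That None-sentinel idea generalizes beyond the threshold; a minimal sketch, assuming hypothetical field names and default tables (none of this is the actual pipeline code):

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical per-model default tables; None in the config means
# "use the per-model default".
MODEL_DEFAULTS = {
    "flux": {"teacache_thresh": 0.6, "num_steps": 50},
    "wan_i2v_480p": {"teacache_thresh": 0.2, "num_steps": 40},
}

@dataclass
class PipelineConfig:
    teacache_thresh: Optional[float] = None
    num_steps: Optional[int] = None

def apply_model_defaults(cfg: PipelineConfig, model_name: str) -> PipelineConfig:
    """Fill every still-None field from the model's default table."""
    defaults = MODEL_DEFAULTS.get(model_name, {})
    for f in fields(cfg):
        if getattr(cfg, f.name) is None and f.name in defaults:
            setattr(cfg, f.name, defaults[f.name])
    return cfg
```

An explicitly set field survives the merge, so user overrides keep working exactly as with the threshold in this PR.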
PR_Github #38133 [ run ] triggered by Bot. Commit:

PR_Github #38133 [ run ] completed with state
…efault t… (NVIDIA#12007) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
Description
Fix FLUX.1 TeaCache polynomial coefficients that were incorrectly copied from WAN I2V 480P config, and update the default threshold to match the official TeaCache repository.
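The threshold portion of the change can be pictured as a flux1.yml-style fragment (the key names are assumptions based on the files this PR touches, not a quote of the actual config):

```yaml
# examples/visual_gen/serve/configs/flux1.yml (sketch; keys are assumptions)
teacache:
  enable_teacache: true
  teacache_thresh: 0.6   # was 0.2; official FLUX default (~2x speedup)
  use_ret_steps: false
```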
- Old coefficients [2.57e+05, ...] were the WAN I2V 480P values instead of the official FLUX values [4.99e+02, ...] (leading term 515x too large)

Test Coverage
Validated on H200 — FLUX.1-dev, 10 prompts, 50 steps, seed 42
Visual comparison (Baseline → t=0.25 → t=0.4 → t=0.6 → t=0.8)
--teacache_thresh still overrides the default

PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.