[TRTLLM-11342][fix] Fix FLUX.1 TeaCache polynomial coefficients and default threshold #12007

Merged
zhenhuaw-me merged 1 commit into NVIDIA/TensorRT-LLM:main from karljang:fix/flux1-teacache-coefficients
Mar 9, 2026

Conversation

@karljang (Collaborator) commented Mar 8, 2026

Summary by CodeRabbit

  • Improvements
    • Updated default caching threshold values for improved performance
    • Model-specific thresholds now automatically applied based on variant selection
    • Refined configuration handling for visual generation models

Description

Fix FLUX.1 TeaCache polynomial coefficients that were incorrectly copied from WAN I2V 480P config, and update the default threshold to match the official TeaCache repository.

  • Root cause: FLUX.1 coefficients [2.57e+05, ...] were WAN I2V 480P values instead of the official FLUX values [4.99e+02, ...] (leading term 515x too large)
  • Impact: TeaCache hit rate was ~28% instead of ~64%, largely negating the caching speedup
  • Fix: Replace with official coefficients from TeaCache4FLUX, update default threshold 0.2 → 0.6
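For context on why the leading term matters: TeaCache rescales the relative-L1 change of the transformer's modulated input with a model-specific polynomial fit, accumulates the result, and reuses the cached residual while the accumulator stays below the threshold. A minimal sketch of that decision, with hypothetical function names; only the two leading terms (4.99e+02 and 2.57e+05) come from this PR, the remaining coefficients are placeholders:

```python
def rescale(coefficients, x):
    # Horner evaluation of the model-specific polynomial fit
    # (highest-degree coefficient first, as with numpy.polyval).
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result

def should_skip_step(coefficients, rel_l1_distance, accumulated, threshold):
    # Accumulate the rescaled distance; while it stays below the threshold,
    # the transformer step is skipped and the cached residual reused
    # (a "hit"). Crossing the threshold forces a recompute and resets
    # the accumulator.
    accumulated += abs(rescale(coefficients, rel_l1_distance))
    if accumulated < threshold:
        return True, accumulated   # cache hit: reuse previous residual
    return False, 0.0              # cache miss: recompute, reset

# Illustrative 4th-degree fits: only the leading terms are from the PR
# description; the trailing zeros are placeholders.
FLUX_COEFFS = [4.99e2, 0.0, 0.0, 0.0, 0.0]     # official FLUX leading term
WAN_I2V_COEFFS = [2.57e5, 0.0, 0.0, 0.0, 0.0]  # wrongly copied leading term
```

With a raw distance of 0.1 and the 0.6 threshold, the correct fit rescales to roughly 0.05 (hit) while the copied WAN fit rescales to roughly 25.7 (immediate miss): a leading term ~515x too large inflates every step's contribution, which is consistent with the hit rate collapsing from ~64% to ~28%.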

Test Coverage

Validated on H200 — FLUX.1-dev, 10 prompts, 50 steps, seed 42

| Threshold | Hit Rate | PSNR vs Baseline (dB) | Official Expected Speedup |
|---|---|---|---|
| 0.25 | 42.4% | 24.7 (min 14.1) | ~1.5x |
| 0.40 | 55.1% | 20.6 (min 13.7) | ~1.8x |
| 0.60 (default) | 63.6% | 19.1 (min 13.1) | ~2.0x |
| 0.80 | 68.9% | 18.1 (min 12.9) | ~2.25x |
  • All 10 prompts produce visually coherent images across all thresholds

Visual comparison (Baseline → t=0.25 → t=0.4 → t=0.6 → t=0.8)

  • Verify FLUX.1 TeaCache hit rate matches expected ~64% at default threshold
  • Verify FLUX.2 is unaffected (no coefficient or threshold changes)
  • Verify WAN pipelines are unaffected
  • Verify user-specified --teacache_thresh still overrides the default
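The PSNR figures above are the standard peak-signal-to-noise ratio between cached and baseline outputs. A minimal sketch of the metric (the actual measurement script is not part of this PR, and the flat-pixel-list representation here is a simplification):

```python
import math

def psnr(img_a, img_b, peak=255.0):
    # Peak signal-to-noise ratio between two equally sized images,
    # given as flat sequences of pixel values in [0, peak].
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Higher thresholds skip more transformer steps, so outputs drift further from baseline and PSNR falls, matching the 24.7 → 18.1 dB trend in the table.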

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

[TRTLLM-11342][fix] Fix FLUX.1 TeaCache polynomial coefficients and default threshold

The FLUX.1 TeaCache coefficients were incorrectly copied from the WAN I2V 480P
config instead of the official TeaCache FLUX values. This caused suboptimal
cache hit rates (~28%) compared to the expected ~64% at the default threshold.

Changes:
- Fix FLUX.1 polynomial coefficients to match official TeaCache repo
- Update default threshold from 0.2 to 0.6 (official FLUX default, ~2x speedup)
- Support per-model default_thresh in coefficient config
- Update flux1.yml and example script defaults

Validated on H200 with 10 prompts (50 steps, seed 42):
  thresh=0.25: 42% hit rate, 24.7 dB PSNR (~1.5x speedup)
  thresh=0.40: 55% hit rate, 20.6 dB PSNR (~1.8x speedup)
  thresh=0.60: 64% hit rate, 19.1 dB PSNR (~2.0x speedup)
  thresh=0.80: 69% hit rate, 18.1 dB PSNR (~2.25x speedup)

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
@karljang karljang requested review from a team as code owners March 8, 2026 09:02
@karljang karljang requested a review from nv-guomingz March 8, 2026 09:02
@karljang (Collaborator, Author) commented Mar 8, 2026

/bot run

@coderabbitai (Bot, Contributor) commented Mar 8, 2026

📝 Walkthrough

Updates TeaCache threshold default values from 0.2 to 0.6 across configuration files, example scripts, and model coefficients. Adds conditional logic to automatically apply model-specific default thresholds when not explicitly set by users.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Configuration and Example**<br>`examples/visual_gen/serve/configs/flux1.yml`, `examples/visual_gen/visual_gen_flux.py` | Updated teacache threshold default from 0.2 to 0.6; changed parser default to `None` with updated help text reflecting per-FLUX-version defaults; made `teacache_thresh` conditional in `diffusion_config` construction. |
| **Model Coefficients**<br>`tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py` | Updated `FLUX_TEACACHE_COEFFICIENTS` for the "dev" variant with new `ret_steps` and standard coefficient arrays; added a `default_thresh: 0.6` entry for automatic threshold application. |
| **Pipeline Setup Logic**<br>`tensorrt_llm/_torch/visual_gen/pipeline.py` | Added conditional logic in `_setup_teacache` to apply the model-specific default threshold from the coefficients when the user has not explicitly set `teacache_thresh`; logs the applied default. |
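The None-sentinel pattern described for the example script can be sketched as follows (the table contents and helper name are illustrative, not the exact TRT-LLM code; only the 0.6 FLUX.1 default comes from this PR):

```python
import argparse

# Illustrative per-model defaults; 0.6 for FLUX.1-dev is from this PR.
PER_MODEL_DEFAULT_THRESH = {"flux1-dev": 0.6}

def resolve_teacache_thresh(user_thresh, model_variant):
    # Because the parser default is None, an explicit --teacache_thresh
    # (even 0.2, the old default) is distinguishable from "unset" and
    # always wins; otherwise fall back to the model-specific default.
    if user_thresh is not None:
        return user_thresh
    return PER_MODEL_DEFAULT_THRESH.get(model_variant, 0.2)

parser = argparse.ArgumentParser()
parser.add_argument(
    "--teacache_thresh", type=float, default=None,
    help="TeaCache threshold; if omitted, a per-FLUX-version default applies.",
)
args = parser.parse_args([])  # no CLI flags: user left the threshold unset
thresh = resolve_teacache_thresh(args.teacache_thresh, "flux1-dev")
```

This keeps a single source of truth for per-model defaults next to the coefficients, while user-supplied values still override them.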

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, below the required 80.00% threshold. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: fixing FLUX.1 TeaCache polynomial coefficients and updating the default threshold. |
| Description check | ✅ Passed | The description explains the root cause, impact, and solution, and includes detailed validation results with test coverage data. |


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai (Bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
examples/visual_gen/visual_gen_flux.py (1)

218-226: Consider simplifying the conditional dict construction.

The inline conditional unpacking works but is dense. Building the dict incrementally could improve readability.

♻️ Optional: Alternative approach

```diff
+    teacache_config = {
+        "enable_teacache": args.enable_teacache,
+        "use_ret_steps": args.use_ret_steps,
+    }
+    if args.teacache_thresh is not None:
+        teacache_config["teacache_thresh"] = args.teacache_thresh
+
     diffusion_config = {
         "revision": args.revision,
         "attention": {
             "backend": args.attention_backend,
         },
-        "teacache": {
-            "enable_teacache": args.enable_teacache,
-            **(
-                {"teacache_thresh": args.teacache_thresh}
-                if args.teacache_thresh is not None
-                else {}
-            ),
-            "use_ret_steps": args.use_ret_steps,
-        },
+        "teacache": teacache_config,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/visual_gen/visual_gen_flux.py` around lines 218 - 226, The
"teacache" dict construction is dense due to inline conditional unpacking;
instead, create the dict incrementally by first building a base dict with
"enable_teacache" and "use_ret_steps" and then, if args.teacache_thresh is not
None, add "teacache_thresh" to that dict (referencing the same
args.enable_teacache, args.teacache_thresh, args.use_ret_steps and the
"teacache" dict key in the current code) so the intent is clearer and easier to
read.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/visual_gen/pipeline.py`:
- Around line 175-184: Add the required NVIDIA copyright header and Apache 2.0
license block at the top of this file
(tensorrt_llm/_torch/visual_gen/pipeline.py) before any imports or code; keep
the existing logic (including the Pydantic v2 usage of
teacache_cfg.model_fields_set in the block around default_thresh) untouched, and
ensure the header text matches the project's standard NVIDIA + Apache-2.0
boilerplate used in other TensorRT-LLM source files.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0a2e10b1-6487-4007-bfb4-5d20bd9073c1

📥 Commits

Reviewing files that changed from the base of the PR and between 5eb8eab and d872120.

📒 Files selected for processing (4)
  • examples/visual_gen/serve/configs/flux1.yml
  • examples/visual_gen/visual_gen_flux.py
  • tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py
  • tensorrt_llm/_torch/visual_gen/pipeline.py

Comment thread: tensorrt_llm/_torch/visual_gen/pipeline.py

@zhenhuaw-me (Member) left a comment

Thanks for the fix! I was also thinking about changing similar configuration defaults to None and take per-model default in the pipeline.

@tensorrt-cicd (Collaborator) commented:

PR_Github #38133 [ run ] triggered by Bot. Commit: d872120 Link to invocation

@tensorrt-cicd (Collaborator) commented:

PR_Github #38133 [ run ] completed with state SUCCESS. Commit: d872120
/LLM/main/L0_MergeRequest_PR pipeline #29541 completed with status: 'SUCCESS'

Link to invocation

Comment thread: tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py
@zhenhuaw-me zhenhuaw-me requested a review from QiJune March 9, 2026 02:13
@zhenhuaw-me zhenhuaw-me merged commit 7b4da2b into NVIDIA:main Mar 9, 2026
8 checks passed
@karljang karljang deleted the fix/flux1-teacache-coefficients branch March 9, 2026 05:51
tianyuz-nv pushed a commit to wanqian-nv/TensorRT-LLM that referenced this pull request Mar 19, 2026
…efault t… (NVIDIA#12007)

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
…efault t… (NVIDIA#12007)

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026
…efault t… (NVIDIA#12007)

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>