[TRTLLM-11342][fix] Fix FLUX.1 TeaCache polynomial coefficients and default threshold #12007

zhenhuaw-me merged 1 commit into NVIDIA:main from karljang:fix/flux1-teacache-coefficients
Conversation
The FLUX.1 TeaCache coefficients were incorrectly copied from the WAN I2V 480P config instead of the official TeaCache FLUX values. This caused suboptimal cache hit rates (~28%) compared to the expected ~64% at the default threshold.

Changes:
- Fix FLUX.1 polynomial coefficients to match the official TeaCache repo
- Update the default threshold from 0.2 to 0.6 (official FLUX default, ~2x speedup)
- Support per-model default_thresh in the coefficient config
- Update flux1.yml and example script defaults

Validated on H200 with 10 prompts (50 steps, seed 42):

| thresh | hit rate | PSNR | speedup |
| --- | --- | --- | --- |
| 0.25 | 42% | 24.7 dB | ~1.5x |
| 0.40 | 55% | 20.6 dB | ~1.8x |
| 0.60 | 64% | 19.1 dB | ~2.0x |
| 0.80 | 69% | 18.1 dB | ~2.25x |

Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
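The hit-rate effect described above can be illustrated with a small, self-contained sketch (not the TensorRT-LLM code; function names and the simplified gating loop are assumptions). TeaCache rescales the relative-L1 change of the model input with a fitted polynomial and accumulates the result, reusing the cached residual while the accumulated value stays under the threshold — so an oversized leading coefficient inflates the distance and suppresses cache hits:

```python
# Illustrative sketch of TeaCache-style gating (hypothetical, not pipeline code).
# Coefficients are highest-degree-first; the PR replaces WAN I2V 480P values
# (leading term ~2.57e+05) with FLUX values (leading term ~4.99e+02).

def polyval(coeffs, x):
    """Horner evaluation of a polynomial given highest-degree-first coeffs."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def step(coeffs, rel_l1_diff, accumulated, thresh):
    """Return (cache_hit, new_accumulated) for one denoising step."""
    accumulated += abs(polyval(coeffs, rel_l1_diff))
    if accumulated < thresh:
        return True, accumulated  # cache hit: reuse the cached residual
    return False, 0.0             # recompute and reset the accumulator

# Rounded coefficient sets, differing only in the leading term, to show its effect:
flux_like = [4.99e+02, -2.84e+02, 5.59e+01, -3.82e+00, 2.64e-01]
wan_like  = [2.57e+05, -2.84e+02, 5.59e+01, -3.82e+00, 2.64e-01]

hits = {"flux": 0, "wan": 0}
for name, coeffs in (("flux", flux_like), ("wan", wan_like)):
    acc = 0.0
    for _ in range(50):  # 50 denoising steps with a constant synthetic diff
        hit, acc = step(coeffs, 0.05, acc, thresh=0.6)
        hits[name] += hit
```

With the WAN-sized leading term, a single step already exceeds the 0.6 threshold, so the cache never hits; the FLUX-sized term allows several consecutive hits per recompute.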
/bot run
📝 Walkthrough

Updates TeaCache threshold default values from 0.2 to 0.6 across configuration files, example scripts, and model coefficients. Adds conditional logic to automatically apply model-specific default thresholds when not explicitly set by users.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/visual_gen/visual_gen_flux.py (1)
218-226: Consider simplifying the conditional dict construction. The inline conditional unpacking works but is dense. Building the dict incrementally could improve readability.
♻️ Optional: Alternative approach
```diff
+    teacache_config = {
+        "enable_teacache": args.enable_teacache,
+        "use_ret_steps": args.use_ret_steps,
+    }
+    if args.teacache_thresh is not None:
+        teacache_config["teacache_thresh"] = args.teacache_thresh
+
     diffusion_config = {
         "revision": args.revision,
         "attention": {
             "backend": args.attention_backend,
         },
-        "teacache": {
-            "enable_teacache": args.enable_teacache,
-            **(
-                {"teacache_thresh": args.teacache_thresh}
-                if args.teacache_thresh is not None
-                else {}
-            ),
-            "use_ret_steps": args.use_ret_steps,
-        },
+        "teacache": teacache_config,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/visual_gen/visual_gen_flux.py` around lines 218 - 226, The "teacache" dict construction is dense due to inline conditional unpacking; instead, create the dict incrementally by first building a base dict with "enable_teacache" and "use_ret_steps" and then, if args.teacache_thresh is not None, add "teacache_thresh" to that dict (referencing the same args.enable_teacache, args.teacache_thresh, args.use_ret_steps and the "teacache" dict key in the current code) so the intent is clearer and easier to read.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/visual_gen/pipeline.py`:
- Around line 175-184: Add the required NVIDIA copyright header and Apache 2.0
license block at the top of this file
(tensorrt_llm/_torch/visual_gen/pipeline.py) before any imports or code; keep
the existing logic (including the Pydantic v2 usage of
teacache_cfg.model_fields_set in the block around default_thresh) untouched, and
ensure the header text matches the project's standard NVIDIA + Apache-2.0
boilerplate used in other TensorRT-LLM source files.
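The `model_fields_set` pattern the comment refers to can be sketched roughly as follows (class and field names are assumptions based on this PR, not the actual TensorRT-LLM definitions). In Pydantic v2, `model_fields_set` contains only the fields a caller set explicitly, which lets the pipeline distinguish an untouched class default from a user-supplied value:

```python
from pydantic import BaseModel

# Hypothetical per-model defaults (the FLUX value comes from this PR;
# the WAN entry is illustrative)
MODEL_DEFAULT_THRESH = {"flux": 0.6, "wan_i2v_480p": 0.2}

class TeaCacheConfig(BaseModel):
    enable_teacache: bool = False
    teacache_thresh: float = 0.2   # legacy global default
    use_ret_steps: bool = False

def resolve_thresh(cfg: TeaCacheConfig, model_name: str) -> float:
    # model_fields_set lists only explicitly provided fields, so we can
    # tell "user passed 0.2" apart from "0.2 came from the class default".
    if "teacache_thresh" in cfg.model_fields_set:
        return cfg.teacache_thresh
    return MODEL_DEFAULT_THRESH.get(model_name, cfg.teacache_thresh)
```

For example, `resolve_thresh(TeaCacheConfig(), "flux")` picks up the model-specific 0.6, while an explicit `TeaCacheConfig(teacache_thresh=0.2)` keeps the user's 0.2.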
---
Nitpick comments:
In `@examples/visual_gen/visual_gen_flux.py`:
- Around line 218-226: The "teacache" dict construction is dense due to inline
conditional unpacking; instead, create the dict incrementally by first building
a base dict with "enable_teacache" and "use_ret_steps" and then, if
args.teacache_thresh is not None, add "teacache_thresh" to that dict
(referencing the same args.enable_teacache, args.teacache_thresh,
args.use_ret_steps and the "teacache" dict key in the current code) so the
intent is clearer and easier to read.
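The incremental construction suggested above can be made concrete with a runnable snippet (the flag names follow the example script and are treated as assumptions here). `default=None` on the threshold flag is what lets the pipeline fall back to the per-model default when the user omits it:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--enable_teacache", action="store_true")
# default=None lets the pipeline substitute the per-model default
# (0.6 for FLUX.1 after this PR) when the flag is omitted.
parser.add_argument("--teacache_thresh", type=float, default=None)
parser.add_argument("--use_ret_steps", action="store_true")
args = parser.parse_args(["--enable_teacache"])

# Build the dict incrementally instead of inline conditional unpacking.
teacache_config = {
    "enable_teacache": args.enable_teacache,
    "use_ret_steps": args.use_ret_steps,
}
if args.teacache_thresh is not None:
    teacache_config["teacache_thresh"] = args.teacache_thresh
```

When `--teacache_thresh` is omitted, the key is simply absent from the dict, so downstream config merging can apply the model default.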
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0a2e10b1-6487-4007-bfb4-5d20bd9073c1
📒 Files selected for processing (4)
- examples/visual_gen/serve/configs/flux1.yml
- examples/visual_gen/visual_gen_flux.py
- tensorrt_llm/_torch/visual_gen/models/flux/pipeline_flux.py
- tensorrt_llm/_torch/visual_gen/pipeline.py
zhenhuaw-me
left a comment
Thanks for the fix! I was also thinking about changing similar configuration defaults to None and taking the per-model default in the pipeline.
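That None-sentinel idea generalizes beyond the threshold; a minimal sketch, assuming hypothetical field names and default tables (none of this is the actual pipeline code):

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical per-model default tables; None in the config means
# "use the per-model default".
MODEL_DEFAULTS = {
    "flux": {"teacache_thresh": 0.6, "num_steps": 50},
    "wan_i2v_480p": {"teacache_thresh": 0.2, "num_steps": 40},
}

@dataclass
class PipelineConfig:
    teacache_thresh: Optional[float] = None
    num_steps: Optional[int] = None

def apply_model_defaults(cfg: PipelineConfig, model_name: str) -> PipelineConfig:
    """Fill every still-None field from the model's default table."""
    defaults = MODEL_DEFAULTS.get(model_name, {})
    for f in fields(cfg):
        if getattr(cfg, f.name) is None and f.name in defaults:
            setattr(cfg, f.name, defaults[f.name])
    return cfg
```

An explicitly set field survives the merge, so user overrides keep working exactly as with the threshold in this PR.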
PR_Github #38133 [ run ] triggered by Bot. Commit:

PR_Github #38133 [ run ] completed with state
…efault t… (NVIDIA#12007) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
Description
Fix FLUX.1 TeaCache polynomial coefficients that were incorrectly copied from WAN I2V 480P config, and update the default threshold to match the official TeaCache repository.
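The threshold portion of the change can be pictured as a flux1.yml-style fragment (the key names are assumptions based on the files this PR touches, not a quote of the actual config):

```yaml
# examples/visual_gen/serve/configs/flux1.yml (sketch; keys are assumptions)
teacache:
  enable_teacache: true
  teacache_thresh: 0.6   # was 0.2; official FLUX default (~2x speedup)
  use_ret_steps: false
```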
- Old coefficients [2.57e+05, ...] were the WAN I2V 480P values instead of the official FLUX values [4.99e+02, ...] (leading term 515x too large)

Test Coverage
Validated on H200 — FLUX.1-dev, 10 prompts, 50 steps, seed 42
Visual comparison (Baseline → t=0.25 → t=0.4 → t=0.6 → t=0.8)
--teacache_thresh still overrides the default

PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.