[https://nvbugs/5987470][fix] BREAKING: Do not normalize log probs by default#12366
achartier merged 3 commits into NVIDIA:main
Conversation
Expose the option in ModelRunnerCpp Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
@CodeRabbit review
✅ Actions performed: Review triggered.
📝 Walkthrough: The default value for the `normalize_log_probs` option is changed so that log probs are no longer normalized by default.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
⚠️ Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1)
Lines 1-3: ⚠️ Potential issue (🟠 Major): Update the file copyright year to include 2026.
This file is modified in this PR, but the SPDX copyright header still ends at 2025.
Suggested fix:

```diff
- * SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp`, around lines 1-3: update the SPDX copyright header at the top of the modified file by extending the year range to include 2026. Locate the existing header lines starting with "SPDX-FileCopyrightText:" and "SPDX-License-Identifier:" in executorConfig.cpp and change the year range "2022-2025" to "2022-2026" so the file reflects the updated modification year.
🧹 Nitpick comments (1)
tensorrt_llm/runtime/model_runner_cpp.py (1)
Lines 129-205: Document the new `normalize_log_probs` argument in the `from_dir` docstring.

The new public parameter is added to the signature but not described in Args, which makes the runtime behavior change easy to miss.

Suggested doc update:

```diff
     fail_fast_on_attention_window_too_large (bool): Whether to fail fast if the attention window(s) are too large to fit even a single sequence in the KVCache.
+    normalize_log_probs (bool):
+        Whether returned log probabilities should be normalized. Defaults to False.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/runtime/model_runner_cpp.py`, around lines 129-205: the docstring for ModelRunnerCpp.from_dir is missing documentation for the new normalize_log_probs parameter. Add an Args entry describing normalize_log_probs (type bool), its default value (False), and what turning it on/off does to output (e.g., whether logits are converted to normalized log probabilities before downstream processing/generation). Place it with the other parameter descriptions in the from_dir docstring and reference the normalize_log_probs parameter name exactly so users see its runtime effect.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d3850d7d-6212-4cf9-a5d4-a72ab26951b9
📒 Files selected for processing (3)
- cpp/include/tensorrt_llm/executor/executor.h
- cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
- tensorrt_llm/runtime/model_runner_cpp.py
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
/bot run --disable-fail-fast
PR_Github #39646 [ run ] triggered by Bot. Commit:
PR_Github #39646 [ run ] completed with state
Can you rename PR to
@achartier so just to be clear, this shouldn't impact LLM API behavior, correct?
/bot run --disable-fail-fast
PR_Github #39741 [ run ] triggered by Bot. Commit:
Yes, see TensorRT-LLM/tensorrt_llm/llmapi/llm_args.py, lines 2892 to 2893 at commit 9db8487.
@Funatiq @schetlur-nv @MartinMarciniszyn are you aware of customers using the C++ runtime that could be impacted by this change? |
PR_Github #39741 [ run ] completed with state
A few thoughts:

- Already answered here. `TrtLlmArgs.normalize_log_probs` already defaults to false. How does this relate to the current PR?
- Normalization appears to only be respected by top-K (and top-P-after-top-K) sampling, but not by top-P sampling. Is that indeed the case? If yes, then not normalizing is indeed the preferable default behavior.
- I suppose that the logprobs returned by the TRT backend account for the temperature? If yes, then this is different from the default behavior of the PyTorch backend. Perhaps this should be documented somewhere.
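For readers unfamiliar with the option under discussion: "normalizing log probs" here means applying a log-softmax so the returned values form a proper log distribution. A minimal plain-Python sketch of that operation (illustrative only, not TensorRT-LLM code):

```python
import math

def normalize_log_probs(logits):
    """Turn raw logits into proper log probabilities via log-softmax.

    Plain-Python sketch of the normalization the option controls, not the
    actual kernel; subtracting the max keeps exp() numerically stable.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

logits = [2.0, 1.0, 0.5]
normalized = normalize_log_probs(logits)

# After normalization the values are valid log probabilities:
# exp() of them sums to 1, and every value is <= 0.
total = sum(math.exp(x) for x in normalized)
print(round(total, 6))  # 1.0
```

Without the option, the reported values are unnormalized, i.e. they are not guaranteed to exponentiate to a distribution summing to 1.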
Yes, that's correct. Only the top-K code path has the normalization. The TRT backend logprobs do account for temperature indeed. Would the
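The temperature point above can be illustrated with a small sketch, assuming temperature is applied as the conventional logit divide before the softmax (an assumption made for illustration, not a statement about either backend's kernels):

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

logits = [2.0, 1.0, 0.5]
temperature = 0.5

# Log probs that account for temperature (logits scaled before softmax) ...
with_temp = log_softmax([x / temperature for x in logits])
# ... versus log probs of the raw, unscaled distribution.
without_temp = log_softmax(logits)

# The two conventions disagree, so which one a backend reports is
# user-visible and worth documenting.
print(with_temp[0] != without_temp[0])  # True
```

Either convention yields a valid distribution; the difference only shows up in the numeric values a user reads back.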
Expose the option in ModelRunnerCpp
Summary by CodeRabbit
New Features
Changes
Description
Log probs are only useful when not normalized for greedy sampling, which is the most common use case. Also, the PyTorch backend does not normalize log probs, so this change aligns behavior between the two backends.
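Why normalization is irrelevant for greedy sampling: log-softmax subtracts the same constant (the log partition function) from every logit, so the argmax token is unchanged. A minimal sketch checking this claim:

```python
import math

def log_softmax(logits):
    # Normalization subtracts one shared constant (the log partition
    # function) from every entry, so relative order is preserved.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

raw = [1.2, 3.4, 0.7, 2.9]
normalized = log_softmax(raw)

# Greedy decoding picks the same token either way.
print(argmax(raw) == argmax(normalized))  # True
```

So for greedy decoding, skipping the normalization saves work without changing which token is selected.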
Test Coverage
Existing log probs tests.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.