[https://nvbugs/5987470][fix] BREAKING: Do not normalize log probs by default by achartier · Pull Request #12366 · NVIDIA/TensorRT-LLM

achartier · Mar 19, 2026

Expose the option in ModelRunnerCpp

Summary by CodeRabbit

New Features
- Added parameter to enable customization of log probability normalization behavior, allowing users to control how probability values are formatted during inference.
Changes
- Updated default configuration to disable log probability normalization, changing baseline behavior for probability output formatting in the executor without requiring explicit parameter specification.

Description

Log probs are only useful when not normalized for greedy sampling which is the most common use case. Also the pytorch backend does not normalize log probs, so this is aligning behaviors between the two backends.

Test Coverage

Existing log probs tests.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Expose the option in ModelRunnerCpp Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

achartier · Mar 19, 2026

@CodeRabbit review

coderabbitai · Mar 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · Mar 19, 2026

📝 Walkthrough

Walkthrough

The default value for the normalizeLogProbs parameter is changed from true to false across the C++ header, Python binding, and Python API. This modifies the default log probability normalization behavior when callers omit the argument.

Changes

Cohort / File(s)	Summary
Default Parameter Value Updates `cpp/include/tensorrt_llm/executor/executor.h`, `cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp`	Changed default value of `normalizeLogProbs`/`normalize_log_probs` parameter from `true` to `false` in `ExecutorConfig` constructor across C++ header and Python binding.
Python API Enhancement `tensorrt_llm/runtime/model_runner_cpp.py`	Added new `normalize_log_probs: bool = False` parameter to `ModelRunnerCpp.from_dir()` and forwards it to `ExecutorConfig`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The PR description provides the rationale for changes and mentions test coverage, but lacks a clear summary of what was actually changed across the three modified files.	Add a more detailed description summarizing the changes made: updating default values in ExecutorConfig (C++ header and Python binding) and adding a new parameter to ModelRunnerCpp.from_dir(). Clarify which changes address which aspect of the PR objective.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the main change: setting the default value of normalizeLogProbs to false across C++, nanobind, and Python layers. It clearly indicates this is a breaking change.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

CodeRabbit can use your project's `pylint` configuration to improve the quality of Python code reviews.

Add a pylint configuration file to your project to customize how CodeRabbit runs pylint.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1)
1-3: ⚠️ Potential issue | 🟠 Major

Update the file copyright year to include 2026.

This file is modified in this PR, but the SPDX copyright header still ends at 2025.
Suggested fix
- * SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp` around lines 1 - 3,
Update the SPDX copyright header at the top of the modified file by extending
the year range to include 2026: locate the existing header lines starting with
"SPDX-FileCopyrightText:" and "SPDX-License-Identifier:" in executorConfig.cpp
and change the year range "2022-2025" to "2022-2026" so the file reflects the
updated modification year.

🧹 Nitpick comments (1)

tensorrt_llm/runtime/model_runner_cpp.py (1)

129-205: Document the new normalize_log_probs argument in from_dir docstring.

The new public parameter is added to the signature but not described in Args, which makes runtime behavior changes easier to miss.

Suggested doc update

         fail_fast_on_attention_window_too_large (bool):
             Whether to fail fast if the attention window(s) are too large to fit even a single sequence in the KVCache.
+        normalize_log_probs (bool):
+            Whether returned log probabilities should be normalized. Defaults to False.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/runtime/model_runner_cpp.py` around lines 129 - 205, The
docstring for ModelRunnerCpp.from_dir is missing documentation for the new
normalize_log_probs parameter; add an Args entry describing normalize_log_probs
(type bool), its default value (False), and what turning it on/off does to
output (e.g., whether logits are converted to normalized log probabilities
before downstream processing/generation), placing it with the other parameter
descriptions in the from_dir docstring and referencing the normalize_log_probs
parameter name exactly so users see its runtime effect.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp`:
- Around line 1-3: Update the SPDX copyright header at the top of the modified
file by extending the year range to include 2026: locate the existing header
lines starting with "SPDX-FileCopyrightText:" and "SPDX-License-Identifier:" in
executorConfig.cpp and change the year range "2022-2025" to "2022-2026" so the
file reflects the updated modification year.

---

Nitpick comments:
In `@tensorrt_llm/runtime/model_runner_cpp.py`:
- Around line 129-205: The docstring for ModelRunnerCpp.from_dir is missing
documentation for the new normalize_log_probs parameter; add an Args entry
describing normalize_log_probs (type bool), its default value (False), and what
turning it on/off does to output (e.g., whether logits are converted to
normalized log probabilities before downstream processing/generation), placing
it with the other parameter descriptions in the from_dir docstring and
referencing the normalize_log_probs parameter name exactly so users see its
runtime effect.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d3850d7d-6212-4cf9-a5d4-a72ab26951b9

📥 Commits

Reviewing files that changed from the base of the PR and between e37493a and 5a32ec9.

📒 Files selected for processing (3)

cpp/include/tensorrt_llm/executor/executor.h
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
tensorrt_llm/runtime/model_runner_cpp.py

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

achartier · Mar 19, 2026

/bot run --disable-fail-fast

tensorrt-cicd · Mar 19, 2026

PR_Github #39646 [ run ] triggered by Bot. Commit: ac66157 Link to invocation

tensorrt-cicd · Mar 20, 2026

PR_Github #39646 [ run ] completed with state SUCCESS. Commit: ac66157
/LLM/main/L0_MergeRequest_PR pipeline #30851 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

pcastonguay · Mar 20, 2026

Can you rename PR to [https://nvbugs/5987470][fix] BREAKING: Do not normalize log probs by default since this is a breaking change.

pcastonguay · Mar 20, 2026

@achartier so just to be clear, this shouldn't impact LLM API behavior correct?

pcastonguay · Mar 20, 2026

/bot run --disable-fail-fast

tensorrt-cicd · Mar 20, 2026

PR_Github #39741 [ run ] triggered by Bot. Commit: ac66157 Link to invocation

achartier · Mar 20, 2026

@achartier so just to be clear, this shouldn't impact LLM API behavior correct?

Yes, normalize_log_probs is already false by default when using LLM API:

TensorRT-LLM/tensorrt_llm/llmapi/llm_args.py

Lines 2892 to 2893 in 9db8487

    
           normalize_log_probs: bool = Field( 
        
               default=False, description="Normalize log probabilities.")

pcastonguay · Mar 20, 2026

@Funatiq @schetlur-nv @MartinMarciniszyn are you aware of customers using the C++ runtime that could be impacted by this change?

tensorrt-cicd · Mar 20, 2026

PR_Github #39741 [ run ] completed with state SUCCESS. Commit: ac66157
/LLM/main/L0_MergeRequest_PR pipeline #30936 completed with status: 'SUCCESS'

CI Report

Link to invocation

ixlmar

A few thoughts:

~~TrtLlmArgs.normalize_log_probs already defaults to false. How does this relate to the current PR?~~ Already answered here.
Normalization appears to only be respected by top-K (and top-P-after-top-K) sampling. but not by the top-P sampling. Is that indeed the case? If yes, then not normalizing is indeed the preferable default behavior.
I suppose that the logprobs returned by the TRT backend account for the temperature? If yes, then this is different from the default behavior of the PyTorch backend. Perhaps this should be documented somewhere.

achartier · Mar 23, 2026

Yes, that's correct. Only the top-K code path has normalizeLogProbs

The TRT backend logprobs do account for temperature indeed. Would the decodingParams.h header description of outputLogProbs be a good spot to document it (in a separate PR)?

Do not normalize log probs by default

5a32ec9

Expose the option in ModelRunnerCpp Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

achartier requested review from Funatiq and pcastonguay March 19, 2026 21:56

github-actions Bot assigned achartier Mar 19, 2026

coderabbitai Bot reviewed Mar 19, 2026

View reviewed changes

update unit tests

24376c0

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

achartier requested a review from a team as a code owner March 19, 2026 22:06

achartier requested a review from SimengLiu-nv March 19, 2026 22:06

update c++ triton backend

ac66157

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

achartier changed the title ~~[https://nvbugs/5987470][fix] Do not normalize log probs by default~~ [https://nvbugs/5987470][fix] BREAKING: Do not normalize log probs by default Mar 20, 2026

ixlmar reviewed Mar 23, 2026

View reviewed changes

pcastonguay approved these changes Mar 23, 2026

View reviewed changes

achartier merged commit 73a02ee into NVIDIA:main Mar 24, 2026
7 checks passed

achartier deleted the norm branch March 24, 2026 17:24

Search code, repositories, users, issues, pull requests...

Conversation

achartier commented Mar 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

achartier commented Mar 19, 2026

Uh oh!

coderabbitai Bot commented Mar 19, 2026

Uh oh!

coderabbitai Bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

achartier commented Mar 19, 2026

Uh oh!

tensorrt-cicd commented Mar 19, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

pcastonguay commented Mar 20, 2026

Uh oh!

pcastonguay commented Mar 20, 2026

Uh oh!

pcastonguay commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

achartier commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pcastonguay commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

ixlmar left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

achartier commented Mar 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

achartier commented Mar 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 19, 2026 •

edited

Loading

achartier commented Mar 20, 2026 •

edited

Loading

ixlmar left a comment •

edited

Loading