Conversation

@MatthewBonanni (Contributor) commented Oct 14, 2025

Purpose

Add tools for benchmarking attention backends. They can be used both for parameter tuning and for selecting the optimal backend for a particular configuration. These tools were built with heavy use of Claude Code.
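A benchmarking tool like this typically wraps each backend call in a warmup-then-measure harness. The sketch below is a minimal, generic illustration of that pattern; `benchmark` is a hypothetical helper, not the actual API added by this PR:

```python
import time
from statistics import median

def benchmark(fn, *, warmup: int = 10, iters: int = 100) -> float:
    """Return the median wall-clock time (seconds) of fn() over `iters` runs."""
    # Warm up first so caches, allocators, and any JIT compilation
    # do not pollute the measured iterations.
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    # Median is more robust to scheduler noise than the mean.
    return median(times)
```

Running `benchmark` once per (backend, batch shape) pair and tabulating the results is enough to rank backends for a given configuration.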

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

MatthewBonanni and others added 18 commits October 10, 2025 20:01
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
…into benchmark_attention

This commit fixes the attention benchmark to properly support both decode
and prefill pipelines for MLA backends after the recent refactor.

Key changes:
- Added MockKVBProj class to mock KV projection layer for prefill mode
- Created _create_input_tensors() to generate both decode and prefill inputs
  - Decode: uses kv_lora_rank (512) dimension
  - Prefill: uses qk_nope_head_dim (128) to stay under FlashAttention's 256 limit
- Added automatic mode selection: calls _forward_decode() or _forward_prefill()
  based on metadata.decode/metadata.prefill
- Fixed threshold setting: changed from a class attribute to an instance attribute
- Added traceback printing for better error debugging

The benchmark now successfully compares decode vs prefill pipelines:
  qlen=2: decode=0.000033s, prefill=0.000303s -> decode is 9.09x faster
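The automatic mode selection described above can be sketched as follows. This is a minimal illustration only; `AttentionMetadata`, `_forward_decode`, and `_forward_prefill` here are stand-ins for the real vLLM classes, not the actual implementation:

```python
from dataclasses import dataclass

@dataclass
class AttentionMetadata:
    # Stand-in for the real metadata object: flags whether the batch
    # currently holds decode tokens or prefill tokens.
    decode: bool = False
    prefill: bool = False

def forward(metadata: AttentionMetadata) -> str:
    # Dispatch to the decode or prefill pipeline based on the metadata,
    # mirroring the mode selection described in the commit message.
    if metadata.decode:
        return "_forward_decode"
    if metadata.prefill:
        return "_forward_prefill"
    raise ValueError("metadata must mark at least one of decode/prefill")
```

Keeping the dispatch in one place means the benchmark can exercise either pipeline from the same entry point.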

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@mergify mergify bot added the performance Performance-related issues label Oct 14, 2025
@gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive attention benchmarking suite, which is a valuable addition for performance tuning and backend selection. The implementation is well-structured, with clear separation of concerns for parsing batch specifications, running benchmarks, and formatting results.

However, I've identified several critical issues related to the batch specification parser and its tests:

  • The parser implementation is inconsistent with the test suite and some default configurations, which will lead to runtime errors and test failures. In particular, the tests use an outdated grammar (with spec and chunk prefixes) that the parser doesn't support.
  • Some default arguments are invalid.
  • Helper functions for analyzing batch statistics will crash because they reference non-existent attributes.

Addressing these inconsistencies is crucial to making the new benchmarking tools functional and reliable.

benchmarks/attention_benchmarks/batch_spec.py (resolved)
benchmarks/attention_benchmarks/benchmark.py (resolved)
benchmarks/attention_benchmarks/test_batch_spec.py (resolved, outdated)
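To make the grammar-mismatch issue concrete, a batch-spec parser of the kind the review describes might look like the sketch below. The grammar shown here ("prefill:NxL" and "decode:N" entries) is invented purely for illustration; the actual grammar in batch_spec.py is not reproduced in this thread and may differ:

```python
import re

# Hypothetical grammar: a spec is a comma-separated list of entries such as
# "prefill:4x1024" (4 sequences of 1024 tokens) or "decode:8" (8 decode
# sequences). Anything else, e.g. an old "chunk:" prefix, is rejected.
_ENTRY = re.compile(r"^(prefill|decode):(\d+)(?:x(\d+))?$")

def parse_batch_spec(spec: str) -> list[tuple[str, int, int]]:
    entries = []
    for part in spec.split(","):
        m = _ENTRY.match(part.strip())
        if m is None:
            raise ValueError(f"invalid batch spec entry: {part!r}")
        mode, num_seqs, seq_len = m.group(1), int(m.group(2)), m.group(3)
        # Decode entries process one token per sequence by definition.
        entries.append((mode, num_seqs, int(seq_len) if seq_len else 1))
    return entries
```

Keeping the tests and the parser pointed at the same single grammar definition is what resolves the class of failure the review flags.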
@chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

benchmarks/attention_benchmarks/mla_runner.py (resolved, outdated)