Conversation

@hannesrudolph (Collaborator) commented Jun 25, 2025

Description

Fixes #5104

This PR resolves the extreme token counting inefficiency in the Claude Code Provider that was causing simple messages to jump from ~40k to over 60k tokens, leading to API hangs when approaching the artificial 120k limit.
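
To make the scale of the problem concrete, here is a minimal illustration of what a 1.5x fudge factor does to a count (names hypothetical; the actual provider logic lives in src/api/providers/claude-code.ts):

```typescript
// Illustration only: a 1.5x safety margin on top of an already-accurate
// count turns a genuine 40k-token conversation into a reported 60k.
const FUDGE_FACTOR = 1.5

function inflatedCount(actualTokens: number): number {
	return Math.ceil(actualTokens * FUDGE_FACTOR)
}

console.log(inflatedCount(40_000)) // 60000, which trips a 120k limit far too early
```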

Changes Made

Testing

  • All existing tests pass
  • Added tests for accurate token counting without the fudge factor
  • Added tests for cache token collection and reporting (sketched after this list)
  • Verified token accumulation across multiple messages
  • Tested handling of missing cache fields
  • Verified subscription usage reports zero cost
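
The cache-token behavior these tests exercise can be sketched as follows. This is a minimal sketch with hypothetical names (ApiUsage, toUsageChunk); the snake_case field names follow Anthropic's Messages API usage object:

```typescript
// Minimal sketch: collect cache tokens from an Anthropic-style usage object,
// tolerating missing fields, and report zero cost for subscription usage.
interface ApiUsage {
	input_tokens: number
	output_tokens: number
	cache_creation_input_tokens?: number
	cache_read_input_tokens?: number
}

function toUsageChunk(usage: ApiUsage, isSubscription: boolean) {
	return {
		type: "usage" as const,
		inputTokens: usage.input_tokens,
		outputTokens: usage.output_tokens,
		// Missing cache fields are treated as zero rather than undefined/NaN.
		cacheWriteTokens: usage.cache_creation_input_tokens ?? 0,
		cacheReadTokens: usage.cache_read_input_tokens ?? 0,
		// Subscription (Claude Code) usage is billed through the plan,
		// so the reported cost is zero.
		totalCost: isSubscription ? 0 : undefined,
	}
}
```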

Verification of Acceptance Criteria

  • Token counts are now accurate and match what the Claude Code API actually uses
  • Claude Code can utilize its full 200k context window without artificial limitations
  • Prompt caching features are now available for Claude Code models
  • No more extreme token inflation (40k stays 40k, not 60k)
  • No more API hangs due to artificial token limits

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Comments added for complex logic
  • Documentation updated (code comments)
  • No breaking changes
  • All tests pass
  • Type checks pass
  • Linting passes

Important

Fixes token counting inefficiency and enables caching for Claude Code models, ensuring accurate token counts and prompt caching support.

  • Behavior:
    • Removed 1.5x fudge factor from token counting in claude-code.ts, resolving token inflation issues.
    • Enabled prompt caching for all Claude Code models in claude-code.ts by setting supportsPromptCache: true (sketched after this list).
  • Testing:
    • Added claude-code-token-counting.spec.ts and claude-code-caching.spec.ts for testing token counting and caching.
    • Updated tests in claude-code.spec.ts and useSelectedModel.spec.ts to reflect accurate token counting and caching.
    • Verified handling of cache tokens, subscription usage, and API errors.
  • Verification:
    • Ensured token counts match actual usage, allowing full context window utilization without artificial limits.
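
Concretely, the supportsPromptCache change is a one-flag flip on each model definition. A hedged sketch under assumed field names and illustrative values (the real shape lives in packages/types/src/providers/claude-code.ts):

```typescript
// Sketch only: ModelInfo fields assumed from the discussion, values illustrative.
interface ModelInfo {
	maxTokens: number
	contextWindow: number
	supportsPromptCache: boolean
}

const claudeCodeModels: Record<string, ModelInfo> = {
	"claude-sonnet-4": {
		maxTokens: 8192,
		contextWindow: 200_000, // the full window the PR description references
		supportsPromptCache: true, // previously false, which disabled caching
	},
}
```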

This description was created by Ellipsis for 6d19fac.

@Copilot Copilot AI review requested due to automatic review settings June 25, 2025 06:14
@hannesrudolph hannesrudolph requested review from cte, jr and mrubens as code owners June 25, 2025 06:14
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working documentation Improvements or additions to documentation labels Jun 25, 2025

delve-auditor bot commented Jun 25, 2025

No security or compliance issues detected. Reviewed everything up to 98f3093.

Security Overview
  • 🔎 Scanned files: 4 changed file(s)
Detected Code Changes (change type: Bug Fix)
  • claude-code.ts: update supportsPromptCache flag to true for Claude Code models
  • claude-code.spec.ts: update test expectations for prompt cache support
  • useSelectedModel.spec.ts: update test for Claude Code prompt cache support
  • claude-code-caching.spec.ts: add new tests for Claude Code caching functionality

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jun 25, 2025

@Copilot Copilot AI left a comment


Pull Request Overview

This PR improves token counting accuracy and adds prompt caching support for Claude Code models, along with comprehensive tests.

  • Removes the 1.5× fudge factor and implements precise token counting using the Anthropic tokenizer.
  • Enables supportsPromptCache in both the handler and type definitions for all Claude Code models.
  • Introduces new tests covering token counting accuracy and caching behavior.
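
A hedged sketch of what such a countTokens override with lazy tokenizer setup could look like (js-tiktoken and the o200k_base ranks shown for concreteness; the PR's actual tokenizer wiring may differ, and a later revision of this PR removed the override entirely):

```typescript
import { Tiktoken } from "js-tiktoken/lite"
import o200k_base from "js-tiktoken/ranks/o200k_base"

type ContentBlock = { type: "text"; text: string } | { type: "image" }

let encoder: Tiktoken | undefined

// Lazy setup: construct the encoder on first use, so the ranks table is
// never loaded when token counting never happens.
function getEncoder(): Tiktoken {
	encoder ??= new Tiktoken(o200k_base)
	return encoder
}

// Flat per-image estimate; the named constant is discussed later in the thread.
const IMAGE_TOKEN_ESTIMATE = 300

function countTokens(content: ContentBlock[]): number {
	let total = 0
	for (const block of content) {
		total += block.type === "text" ? getEncoder().encode(block.text).length : IMAGE_TOKEN_ESTIMATE
	}
	return total
}
```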

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:
  • src/api/providers/claude-code.ts: added precise countTokens override with lazy Tiktoken setup
  • src/api/providers/__tests__/claude-code.spec.ts: updated supportsPromptCache expectation to true
  • src/api/providers/__tests__/claude-code-token-counting.spec.ts: new tests for accurate token counting
  • src/api/providers/__tests__/claude-code-caching.spec.ts: new tests for cache token collection and reporting
  • packages/types/src/providers/claude-code.ts: enabled supportsPromptCache for all Claude Code models
Comments suppressed due to low confidence (1)

src/api/providers/claude-code.ts:157

  • The Anthropic type isn’t imported in this module, which will cause a TypeScript error. Please add import type { Anthropic } from '@anthropic-ai/sdk' (or the correct path) at the top.
	override async countTokens(content: Anthropic.Messages.ContentBlockParam[]): Promise<number> {
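
Spelled out, the suggested fix is a single type-only import at the top of claude-code.ts, using the package path the comment suggests:

```typescript
// Type-only import: erased at compile time, so it adds no runtime dependency.
import type { Anthropic } from "@anthropic-ai/sdk"
```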

@hannesrudolph (Collaborator, Author)

I've addressed the review comments from @copilot:

  1. ✅ Extracted the magic number 300 into a named constant IMAGE_TOKEN_ESTIMATE with a comment explaining its purpose and consistency with the base tokenizer implementation.

  2. ✅ Updated the token counting tests to use exact counts instead of ranges, making them fully deterministic (sketched after this comment).

All tests are passing and the changes maintain backward compatibility while fixing the token counting inefficiency issue.
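
A minimal sketch of the "exact counts, not ranges" test style described in item 2, assuming vitest as the runner (which the *.spec.ts naming suggests) and a hypothetical stand-in for the handler's counting function:

```typescript
import { describe, expect, it } from "vitest"

// Hypothetical stand-in for the provider's countTokens (~4 chars per token).
const countTokens = async (text: string) => Math.ceil(text.length / 4)

describe("token counting", () => {
	it("asserts an exact count rather than a range", async () => {
		// Before: expect(count).toBeGreaterThan(2) and toBeLessThan(10)
		await expect(countTokens("Hello, world!")).resolves.toBe(4)
	})
})
```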

@samhvw8 commented Jun 25, 2025

@hannesrudolph do we have countTokens in the base provider? We should use it: it runs in a worker, otherwise token counting will block the main thread and the extension host can go black (dead extension host).

Previous PRs that addressed this issue:

#3037
#2848

@SannidhyaSah
Copy link
Collaborator

> @hannesrudolph do we have countTokens in the base provider? We should use it: it runs in a worker, otherwise token counting will block the main thread and the extension host can go black (dead extension host).
>
> Previous PRs that addressed this issue:
>
> #3037
> #2848

I agree. Tiktoken is extremely inefficient; it has caused a lot of bugs before.
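
To illustrate the concern, here is a minimal sketch of the worker-offloading idea using Node's worker_threads with an inline worker source and placeholder counting logic; the base provider's real implementation from #3037/#2848 will differ:

```typescript
import { Worker } from "node:worker_threads"

// The worker owns the expensive tokenizer work. In a real provider this is
// where tiktoken would be loaded, so its WASM init and the counting loop
// never run on the extension-host thread.
const workerSource = `
const { parentPort } = require("node:worker_threads")
parentPort.on("message", (text) => {
  // Placeholder heuristic (~4 chars/token); swap in a real tokenizer here.
  parentPort.postMessage(Math.ceil(text.length / 4))
})
`

export function countTokensOffThread(text: string): Promise<number> {
	return new Promise((resolve, reject) => {
		const worker = new Worker(workerSource, { eval: true })
		worker.once("message", (count: number) => {
			void worker.terminate()
			resolve(count)
		})
		worker.once("error", reject)
		worker.postMessage(text)
	})
}
```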

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Jun 25, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Jun 25, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jun 25, 2025
@daniel-lxs daniel-lxs force-pushed the fix/issue-5104-claude-code-token-efficiency branch 2 times, most recently from 8c59c65 to bae3ac1 Compare June 25, 2025 22:54
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jun 25, 2025
@daniel-lxs daniel-lxs force-pushed the fix/issue-5104-claude-code-token-efficiency branch from bae3ac1 to 6d19fac Compare June 25, 2025 22:58
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Needs Review] in Roo Code Roadmap Jun 25, 2025
@daniel-lxs (Collaborator) left a comment


This PR is now just marking the Claude Code models as "supports caching" since the calculation of tokens seems to be fine. The tokens shown are coming directly from Claude Code.

LGTM

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jun 25, 2025
hannesrudolph and others added 4 commits June 25, 2025 19:31
fix: resolve Claude Code token counting inefficiency and enable caching (#5104)

- Remove 1.5x fudge factor from Claude Code token counting
- Enable prompt caching support for all Claude Code models
- Add comprehensive tests for token counting and caching
- Update existing tests to reflect accurate token counting

This fixes the extreme token inefficiency where simple messages would
jump from ~40k to over 60k tokens, causing API hangs when approaching
the artificial 120k limit. Claude Code now properly utilizes its full
200k context window with accurate token counting.

fix: address PR review comments

- Extract IMAGE_TOKEN_ESTIMATE as a named constant for clarity
- Update token counting tests to use exact counts instead of ranges for deterministic testing
- Fix test expectations to match actual tokenizer output

Remove token counting changes, keep only cache support

- Removed custom countTokens override from claude-code.ts
- Deleted claude-code-token-counting.spec.ts test file
- Kept cache token collection and reporting functionality
- Kept supportsPromptCache: true for all Claude Code models
- Kept claude-code-caching.spec.ts tests

This focuses the PR on enabling cache support without modifying token counting behavior.

fix: update webview test to expect supportsPromptCache=true for Claude Code models
@daniel-lxs daniel-lxs force-pushed the fix/issue-5104-claude-code-token-efficiency branch from 6d19fac to 98f3093 Compare June 26, 2025 00:32
@mrubens mrubens merged commit f9f01b0 into main Jun 26, 2025
9 of 10 checks passed
@mrubens mrubens deleted the fix/issue-5104-claude-code-token-efficiency branch June 26, 2025 00:36
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jun 26, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Jun 26, 2025
Alorse pushed a commit to Alorse/Roo-Code that referenced this pull request Jun 27, 2025
fix: resolve Claude Code token counting inefficiency and enable caching (RooCodeInc#5104) (RooCodeInc#5108)
hannesrudolph added a commit that referenced this pull request Jul 3, 2025
fix: resolve Claude Code token counting inefficiency and enable caching (#5104) (#5108)

Labels

  • bug: Something isn't working
  • documentation: Improvements or additions to documentation
  • lgtm: This PR has been approved by a maintainer
  • PR - Needs Review
  • size:L: This PR changes 100-499 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Bug: Claude Code Provider shows extreme token inefficiency and hangs

5 participants
