fix: resolve Claude Code token counting inefficiency and enable caching (#5104) #5108
Conversation
✅ No security or compliance issues detected. Reviewed everything up to 98f3093.
Pull Request Overview
This PR improves token counting accuracy and adds prompt caching support for Claude Code models, along with comprehensive tests.
- Removes the 1.5× fudge factor and implements precise token counting using the Anthropic tokenizer.
- Enables `supportsPromptCache` in both the handler and type definitions for all Claude Code models.
- Introduces new tests covering token counting accuracy and caching behavior.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/api/providers/claude-code.ts` | Added precise `countTokens` override with lazy Tiktoken setup |
| `src/api/providers/__tests__/claude-code.spec.ts` | Updated `supportsPromptCache` expectation to `true` |
| `src/api/providers/__tests__/claude-code-token-counting.spec.ts` | New tests for accurate token counting |
| `src/api/providers/__tests__/claude-code-caching.spec.ts` | New tests for cache token collection and reporting |
| `packages/types/src/providers/claude-code.ts` | Enabled `supportsPromptCache` for all Claude Code models |
Comments suppressed due to low confidence (1)
src/api/providers/claude-code.ts:157
The `Anthropic` type isn't imported in this module, which will cause a TypeScript error. Please add `import type { Anthropic } from '@anthropic-ai/sdk'` (or the correct path) at the top.

`override async countTokens(content: Anthropic.Messages.ContentBlockParam[]): Promise<number> {`
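For illustration, here is a self-contained sketch of the lazy-initialization pattern such an override typically uses. The `Encoder` stub and the `IMAGE_TOKEN_ESTIMATE` value stand in for the real tokenizer dependency and the PR's actual constant, and are assumptions rather than the repository's code:

```typescript
// Minimal Tiktoken-like interface; the real code would wrap an actual tokenizer.
type Encoder = { encode: (text: string) => number[] }

let encoder: Encoder | undefined

function getEncoder(): Encoder {
  // Lazily create the encoder on first use: real tokenizers are expensive
  // to initialize, so construction is deferred until countTokens needs it.
  if (!encoder) {
    encoder = { encode: (text) => text.split(/\s+/).filter(Boolean).map((_, i) => i) }
  }
  return encoder
}

// Hypothetical flat estimate for image blocks (a named constant, per the review).
const IMAGE_TOKEN_ESTIMATE = 1500

type ContentBlock = { type: "text"; text: string } | { type: "image" }

async function countTokens(content: ContentBlock[]): Promise<number> {
  let total = 0
  for (const block of content) {
    if (block.type === "text") {
      total += getEncoder().encode(block.text).length
    } else {
      total += IMAGE_TOKEN_ESTIMATE
    }
  }
  return total
}
```

The module-level `encoder` variable means the cost of building the tokenizer is paid once, on the first `countTokens` call, rather than at handler construction.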
I've addressed the review comments from @copilot:
All tests are passing and the changes maintain backward compatibility while fixing the token counting inefficiency issue.
@hannesrudolph don't we have `countTokens` in the base provider? We should use it: it runs in a worker, otherwise tokenizing will block the main thread and the window goes black (dead extension host). A previous PR addressed this issue.
I agree. Tiktoken is extremely inefficient; it has caused a lot of bugs before.
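The concern in the comments above (don't instantiate a second tokenizer in the subclass; reuse the base provider's `countTokens`, which can run off the main thread) can be sketched like this. The class names and the word-count stand-in are assumptions, not the repository's actual code:

```typescript
class BaseProvider {
  // Stand-in for the real base implementation, which may delegate tokenizing
  // to a worker so the extension host's main thread is never blocked.
  async countTokens(content: string[]): Promise<number> {
    return content.join(" ").split(/\s+/).filter(Boolean).length
  }
}

class ClaudeCodeHandler extends BaseProvider {
  // No countTokens override: reusing the inherited implementation avoids a
  // second, synchronous tokenizer that could freeze the UI.
}

async function demo(): Promise<number> {
  return new ClaudeCodeHandler().countTokens(["hello world", "foo"])
}
```

Dropping the override (as the final revision of this PR does) is the simplest way to get this behavior.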
This PR is now just marking the Claude Code models as "supports caching" since the calculation of tokens seems to be fine. The tokens shown are coming directly from Claude Code.
LGTM
…ng (#5104)

- Remove 1.5x fudge factor from Claude Code token counting
- Enable prompt caching support for all Claude Code models
- Add comprehensive tests for token counting and caching
- Update existing tests to reflect accurate token counting

This fixes the extreme token inefficiency where simple messages would jump from ~40k to over 60k tokens, causing API hangs when approaching the artificial 120k limit. Claude Code now properly utilizes its full 200k context window with accurate token counting.

- Extract IMAGE_TOKEN_ESTIMATE as a named constant for clarity
- Update token counting tests to use exact counts instead of ranges for deterministic testing
- Fix test expectations to match actual tokenizer output

- Removed custom countTokens override from claude-code.ts
- Deleted claude-code-token-counting.spec.ts test file
- Kept cache token collection and reporting functionality
- Kept supportsPromptCache: true for all Claude Code models
- Kept claude-code-caching.spec.ts tests

This focuses the PR on enabling cache support without modifying token counting behavior.
…ng (RooCodeInc#5104) (RooCodeInc#5108)

* fix: resolve Claude Code token counting inefficiency and enable caching (RooCodeInc#5104)
  - Remove 1.5x fudge factor from Claude Code token counting
  - Enable prompt caching support for all Claude Code models
  - Add comprehensive tests for token counting and caching
  - Update existing tests to reflect accurate token counting

  This fixes the extreme token inefficiency where simple messages would jump from ~40k to over 60k tokens, causing API hangs when approaching the artificial 120k limit. Claude Code now properly utilizes its full 200k context window with accurate token counting.

* fix: address PR review comments
  - Extract IMAGE_TOKEN_ESTIMATE as a named constant for clarity
  - Update token counting tests to use exact counts instead of ranges for deterministic testing
  - Fix test expectations to match actual tokenizer output

* Remove token counting changes, keep only cache support
  - Removed custom countTokens override from claude-code.ts
  - Deleted claude-code-token-counting.spec.ts test file
  - Kept cache token collection and reporting functionality
  - Kept supportsPromptCache: true for all Claude Code models
  - Kept claude-code-caching.spec.ts tests

  This focuses the PR on enabling cache support without modifying token counting behavior.

* fix: update webview test to expect supportsPromptCache=true for Claude Code models

Co-authored-by: Daniel Riccio <ricciodaniel98@gmail.com>
Description
Fixes #5104

This PR resolves the extreme token counting inefficiency in the Claude Code Provider that was causing simple messages to jump from ~40k to over 60k tokens, leading to API hangs when approaching the artificial 120k limit.
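The inflation described above behaves like a fixed multiplier applied to every estimate. A hypothetical before/after sketch (not the repository's actual code):

```typescript
const TOKEN_FUDGE_FACTOR = 1.5 // the padding this PR removes

// Before: every estimate was padded by 50%, so ~40k real tokens were
// reported as ~60k and hit the artificial 120k ceiling far too early.
function paddedCount(actualTokens: number): number {
  return Math.ceil(actualTokens * TOKEN_FUDGE_FACTOR)
}

// After: report the tokenizer's count directly.
function exactCount(actualTokens: number): number {
  return actualTokens
}
```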
Changes Made
- Removed the 1.5x fudge factor from Claude Code's token counting in `src/api/providers/claude-code.ts`
- Enabled prompt caching support for all Claude Code models in `packages/types/src/providers/claude-code.ts` by setting `supportsPromptCache: true`
- Cache tokens (`cache_read_input_tokens`, `cache_creation_input_tokens`) are now collected and reported
- Added comprehensive tests for token counting and caching functionality:
  - `src/api/providers/__tests__/claude-code-token-counting.spec.ts`
  - `src/api/providers/__tests__/claude-code-caching.spec.ts`
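The cache-token collection listed above can be sketched as folding per-chunk counters into a running total. The snake_case field names come from the Anthropic API; the surrounding shapes and function are assumptions for illustration:

```typescript
interface Usage {
  input_tokens: number
  cache_read_input_tokens?: number
  cache_creation_input_tokens?: number
}

interface UsageTotals {
  inputTokens: number
  cacheReadTokens: number
  cacheWriteTokens: number
}

function accumulateUsage(chunks: Usage[]): UsageTotals {
  const totals: UsageTotals = { inputTokens: 0, cacheReadTokens: 0, cacheWriteTokens: 0 }
  for (const u of chunks) {
    totals.inputTokens += u.input_tokens
    // Cache counters are optional per chunk; treat a missing value as 0.
    totals.cacheReadTokens += u.cache_read_input_tokens ?? 0
    totals.cacheWriteTokens += u.cache_creation_input_tokens ?? 0
  }
  return totals
}
```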
Testing
Verification of Acceptance Criteria
Checklist
Important
Fixes token counting inefficiency and enables caching for Claude Code models, ensuring accurate token counts and prompt caching support.

- Removes 1.5x fudge factor from token counting in `claude-code.ts`, resolving token inflation issues.
- Enables caching in `claude-code.ts` by setting `supportsPromptCache: true`.
- Adds `claude-code-token-counting.spec.ts` and `claude-code-caching.spec.ts` for testing token counting and caching.
- Updates `claude-code.spec.ts` and `useSelectedModel.spec.ts` to reflect accurate token counting and caching.