Describe the bug
When using LiteLlm with streaming enabled (stream=True), usage metadata (token counts) is not returned because the required stream_options parameter is missing from the LiteLLM completion call. According to https://docs.litellm.ai/docs/completion/usage#streaming-usage, streaming requests must include stream_options={"include_usage": True} to receive usage metadata chunks.
The code in lite_llm.py:828 sets completion_args["stream"] = True and expects to receive a UsageMetadataChunk on line 865, but LiteLLM never sends this chunk without the stream_options parameter.
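The documented LiteLLM behavior can be confirmed outside of ADK. Below is a minimal sketch using litellm directly (the model name is illustrative; any litellm-supported model works): without stream_options no chunk carries a usage field, while with stream_options={"include_usage": True} the final chunk does.

import asyncio
import litellm

async def main():
    response = await litellm.acompletion(
        model="gpt-4o-mini",  # illustrative; any litellm-supported model
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
        stream_options={"include_usage": True},  # omit this to see usage stay None
    )
    usage = None
    async for chunk in response:
        # With include_usage set, the final chunk carries aggregated token counts.
        if getattr(chunk, "usage", None):
            usage = chunk.usage
    print(usage)  # prompt_tokens / completion_tokens / total_tokens

asyncio.run(main())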
To Reproduce
Minimal code to reproduce:
import asyncio
import os
from typing import Optional

from google.adk import Agent, Runner
from google.adk.agents import RunConfig
from google.adk.agents.callback_context import CallbackContext
from google.adk.agents.run_config import StreamingMode
from google.adk.models import LlmRequest, LlmResponse
from google.adk.models.lite_llm import LiteLlm
from google.adk.sessions import InMemorySessionService
from google.genai import types

os.environ['VERTEXAI_PROJECT'] = '<your project id>'
os.environ['VERTEXAI_LOCATION'] = 'europe-west1'

import vertexai

vertexai.init(
    project=os.getenv("VERTEXAI_PROJECT"),
    location=os.getenv("VERTEXAI_LOCATION"),
)


def enable_streaming(callback_context: CallbackContext, llm_request: LlmRequest) -> Optional[LlmResponse]:
    # Force SSE streaming for this invocation.
    callback_context._invocation_context.run_config = RunConfig(
        streaming_mode=StreamingMode.SSE
    )
    return None


root_agent = Agent(
    model=LiteLlm(model="vertex_ai/claude-3-5-haiku@20241022"),
    name="test_agent",
    before_model_callback=enable_streaming,
)


async def streaming(query: str):
    session_service = InMemorySessionService()
    user_id = 'user_id'
    app_name = 'app_name'
    session = await session_service.create_session(app_name=app_name, user_id=user_id)
    runner = Runner(
        agent=root_agent,
        app_name=app_name,
        session_service=session_service,
    )
    content = types.Content(role='user', parts=[types.Part(text=query)])
    response = None
    async for chunk in runner.run_async(user_id=user_id, session_id=session.id, new_message=content):
        response = chunk
    # Check usage metadata - will be None without the fix
    if response and response.usage_metadata:
        print(f"Tokens used: {response.usage_metadata.total_token_count}")
    else:
        print("No usage metadata available")  # This is what happens


if __name__ == "__main__":
    asyncio.run(streaming("Hello, how are you?"))
Steps to reproduce:
- Install ADK: pip install google-adk
- Run the code above with streaming enabled
- Observe that usage_metadata is None in the final response
- Add debug logging (see the sketch after this list) to confirm that UsageMetadataChunk is never received from LiteLLM
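One way to wire up that debug logging (a sketch; logging.basicConfig targets ADK's standard-library loggers, and litellm.set_verbose is litellm's own debug flag):

import logging
import litellm

# Surface ADK's debug logs, including the chunk handling in lite_llm.py.
logging.basicConfig(level=logging.DEBUG)

# Have litellm echo the raw chunks it receives; without stream_options,
# no chunk containing a usage payload ever appears.
litellm.set_verbose = True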
Expected behavior
The streaming response should include usage metadata with token counts (prompt_tokens, completion_tokens, total_tokens) in the final aggregated response. The UsageMetadataChunk should be received and processed as the code expects on lines 865-870 of lite_llm.py.
Screenshots
N/A
Desktop (please complete the following information):
- OS: macOS
- Python version: Python 3.12
- ADK version: 1.14.1
Model Information:
- Are you using LiteLLM: Yes
- Which model is being used: Verified across multiple models (GPT-4, claude-sonnet-4, claude-sonnet-4.5, etc.)
Additional context
Root cause location: lite_llm.py:828
Current code:
if stream:
    completion_args["stream"] = True  # Missing stream_options
Proposed fix:
if stream:
    completion_args["stream"] = True
    completion_args["stream_options"] = {"include_usage": True}
This issue has been verified in debug mode where adding stream_options={"include_usage": True} successfully enables the UsageMetadataChunk to be received and processed by the existing handler logic (lines 865-870).
The fix is a one-line addition that enables proper usage tracking for all streaming requests through LiteLLM, which is essential for token counting, billing monitoring, and usage analytics.
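Until the fix lands, a possible interim workaround is to pass stream_options through LiteLlm's extra constructor kwargs. This is a sketch that assumes (unverified) those kwargs are forwarded to the underlying litellm completion call; check your ADK version before relying on it.

from google.adk.models.lite_llm import LiteLlm

# Assumption: extra kwargs given to LiteLlm are forwarded to
# litellm.acompletion, so stream_options reaches the streaming call.
model = LiteLlm(
    model="vertex_ai/claude-3-5-haiku@20241022",
    stream_options={"include_usage": True},
)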