Missing stream_options for streaming usage metadata in LiteLLM #3181

@TaurusLAK

Description


Describe the bug

When using LiteLlm with streaming enabled (stream=True), usage metadata (token counts) is not returned because the required stream_options parameter is missing from the LiteLLM completion call. According to https://docs.litellm.ai/docs/completion/usage#streaming-usage, streaming requests must include stream_options={"include_usage": True} to receive usage metadata chunks.

The code in lite_llm.py:828 sets completion_args["stream"] = True and expects to receive a UsageMetadataChunk on line 865, but LiteLLM never sends this chunk without the stream_options parameter.
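The requirement can also be checked against LiteLLM directly, outside the ADK. The sketch below is a minimal standalone check (the prompt is a placeholder; the model name is the one used in the repro further down): with stream_options set, the usage payload is documented to arrive on the final streamed chunk, and removing the argument reproduces the missing metadata.

  import litellm

  # Standalone check (placeholder prompt): per the LiteLLM docs, only a request
  # with stream_options={"include_usage": True} yields a chunk carrying usage;
  # drop the argument and no usage ever appears in the stream.
  response = litellm.completion(
      model="vertex_ai/claude-3-5-haiku@20241022",
      messages=[{"role": "user", "content": "Hello"}],
      stream=True,
      stream_options={"include_usage": True},
  )
  for chunk in response:
      usage = getattr(chunk, "usage", None)
      if usage:
          print(usage)  # prompt_tokens / completion_tokens / total_tokens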

To Reproduce

Minimal code to reproduce:

import os
from typing import Optional

from google.adk import Agent, Runner
from google.adk.agents import RunConfig
from google.adk.agents.callback_context import CallbackContext
from google.adk.agents.run_config import StreamingMode
from google.adk.models import LlmRequest, LlmResponse
from google.adk.models.lite_llm import LiteLlm
from google.adk.sessions import InMemorySessionService
from google.genai import types

os.environ['VERTEXAI_PROJECT'] = '<your project id>'
os.environ['VERTEXAI_LOCATION'] = 'europe-west1'
import vertexai
vertexai.init(
    project=os.getenv("VERTEXAI_PROJECT"),
    location=os.getenv("VERTEXAI_LOCATION"),
)

def enable_streaming(callback_context: CallbackContext, llm_request: LlmRequest) -> Optional[LlmResponse]:
    # Force SSE streaming for every model call so LiteLlm takes the stream=True path.
    callback_context._invocation_context.run_config = RunConfig(
        streaming_mode=StreamingMode.SSE
    )

root_agent = Agent(
    model=LiteLlm(model="vertex_ai/claude-3-5-haiku@20241022"),
    name="test_agent",
    before_model_callback=enable_streaming
)

async def streaming(query: str):
    session_service = InMemorySessionService()
    user_id = 'user_id'
    app_name = 'app_name'
    session = await session_service.create_session(app_name=app_name, user_id=user_id)
    runner = Runner(agent=root_agent,
                    app_name=app_name,
                    session_service=session_service
    )
    content = types.Content(role='user', parts=[types.Part(text=query)])
    async for chunk in runner.run_async(user_id=user_id, session_id=session.id, new_message=content):
        response = chunk

        # Check usage metadata - will be None without the fix
        if response and response.usage_metadata:
            print(f"Tokens used: {response.usage_metadata.total_token_count}")
        else:
            print("No usage metadata available")  # This is what happens

import asyncio
if __name__ == "__main__":
    asyncio.run(streaming("Hello, how are you?"))

Steps to reproduce:

  1. Install ADK: pip install google-adk
  2. Run the code above with streaming enabled
  3. Observe that usage_metadata is None in the final response
  4. Add debug logging to see that UsageMetadataChunk is never received from LiteLLM

Expected behavior

The streaming response should include usage metadata with token counts (prompt_tokens, completion_tokens, total_tokens) in the final aggregated response. The UsageMetadataChunk should be received and processed as the code expects on lines 865-870 of lite_llm.py.
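Concretely, once the chunk is received and aggregated, the final event from the repro above should pass a check like the sketch below. The field names follow google.genai's usage metadata type; how LiteLLM's prompt_tokens/completion_tokens map onto them is assumed from the existing handler, and the exact values are model-dependent.

  # Expected shape of the aggregated usage metadata on the final response
  # (field names from google.genai; mapping from LiteLLM fields is assumed):
  assert response.usage_metadata is not None
  print(response.usage_metadata.prompt_token_count)      # from prompt_tokens
  print(response.usage_metadata.candidates_token_count)  # from completion_tokens
  print(response.usage_metadata.total_token_count)       # from total_tokens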

Screenshots

N/A

Desktop (please complete the following information):

  • OS: macOS
  • Python version: Python 3.12
  • ADK version: 1.14.1

Model Information:

  • Are you using LiteLLM: Yes
  • Which model is being used: Verified across multiple models (GPT-4, Claude models such as claude-sonnet-4 and claude-sonnet-4.5, etc.)

Additional context

Root cause location: lite_llm.py:828

Current code:

  if stream:
      completion_args["stream"] = True  # Missing stream_options

Proposed fix:

  if stream:
      completion_args["stream"] = True
      completion_args["stream_options"] = {"include_usage": True}

This has been verified in a debug session: adding stream_options={"include_usage": True} causes the UsageMetadataChunk to be received and processed by the existing handler logic (lines 865-870).

The fix is a simple one-line addition that enables proper usage tracking for all streaming requests through LiteLLM, which is essential for token counting, billing monitoring, and usage analytics.
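If useful, a slightly more defensive variant of the same fix (a suggestion only, not the current ADK code) would merge include_usage into any stream_options the caller already passed through the completion args instead of overwriting them:

  if stream:
      completion_args["stream"] = True
      # Merge-safe variant (suggestion): preserve caller-supplied stream_options
      # and only force include_usage on top of them.
      stream_options = dict(completion_args.get("stream_options") or {})
      stream_options.setdefault("include_usage", True)
      completion_args["stream_options"] = stream_options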
