Describe the bug
When using LiteLlm with streaming enabled (stream=True), usage metadata (token counts) is not returned because the required stream_options parameter is missing from the LiteLLM completion call. According to https://docs.litellm.ai/docs/completion/usage#streaming-usage, streaming requests must include stream_options={"include_usage": True} to receive usage metadata chunks.
The code in lite_llm.py:828 sets completion_args["stream"] = True and expects to receive a UsageMetadataChunk on line 865, but LiteLLM never sends this chunk without the stream_options parameter.
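The documented LiteLLM behavior can be confirmed outside of ADK. Below is a minimal sketch using litellm directly (the model name is illustrative; any litellm-supported model works): without stream_options no chunk carries a usage field, while with stream_options={"include_usage": True} the final chunk does.

import asyncio
import litellm

async def main():
    response = await litellm.acompletion(
        model="gpt-4o-mini",  # illustrative; any litellm-supported model
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
        stream_options={"include_usage": True},  # omit this to see usage stay None
    )
    usage = None
    async for chunk in response:
        # With include_usage set, the final chunk carries aggregated token counts.
        if getattr(chunk, "usage", None):
            usage = chunk.usage
    print(usage)  # prompt_tokens / completion_tokens / total_tokens

asyncio.run(main())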
To Reproduce
Minimal code to reproduce:
import asyncio
import os
from typing import Optional

from google.adk import Agent, Runner
from google.adk.agents import RunConfig
from google.adk.agents.callback_context import CallbackContext
from google.adk.agents.run_config import StreamingMode
from google.adk.models import LlmRequest, LlmResponse
from google.adk.models.lite_llm import LiteLlm
from google.adk.sessions import InMemorySessionService
from google.genai import types

os.environ['VERTEXAI_PROJECT'] = '<your project id>'
os.environ['VERTEXAI_LOCATION'] = 'europe-west1'

import vertexai

vertexai.init(
    project=os.getenv("VERTEXAI_PROJECT"),
    location=os.getenv("VERTEXAI_LOCATION"),
)


def enable_streaming(callback_context: CallbackContext, llm_request: LlmRequest) -> Optional[LlmResponse]:
    # Force SSE streaming for this invocation.
    callback_context._invocation_context.run_config = RunConfig(
        streaming_mode=StreamingMode.SSE
    )
    return None


root_agent = Agent(
    model=LiteLlm(model="vertex_ai/claude-3-5-haiku@20241022"),
    name="test_agent",
    before_model_callback=enable_streaming,
)


async def streaming(query: str):
    session_service = InMemorySessionService()
    user_id = 'user_id'
    app_name = 'app_name'
    session = await session_service.create_session(app_name=app_name, user_id=user_id)
    runner = Runner(
        agent=root_agent,
        app_name=app_name,
        session_service=session_service,
    )
    content = types.Content(role='user', parts=[types.Part(text=query)])
    response = None
    async for chunk in runner.run_async(user_id=user_id, session_id=session.id, new_message=content):
        response = chunk
    # Check usage metadata - will be None without the fix
    if response and response.usage_metadata:
        print(f"Tokens used: {response.usage_metadata.total_token_count}")
    else:
        print("No usage metadata available")  # This is what happens


if __name__ == "__main__":
    asyncio.run(streaming("Hello, how are you?"))
Steps to reproduce:
- Install ADK: pip install google-adk
- Run the code above with streaming enabled
- Observe that usage_metadata is None in the final response
- Add debug logging (see the sketch after this list) to confirm that UsageMetadataChunk is never received from LiteLLM
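One way to wire up that debug logging (a sketch; logging.basicConfig targets ADK's standard-library loggers, and litellm.set_verbose is litellm's own debug flag):

import logging
import litellm

# Surface ADK's debug logs, including the chunk handling in lite_llm.py.
logging.basicConfig(level=logging.DEBUG)

# Have litellm echo the raw chunks it receives; without stream_options,
# no chunk containing a usage payload ever appears.
litellm.set_verbose = True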
Expected behavior
The streaming response should include usage metadata with token counts (prompt_tokens, completion_tokens, total_tokens) in the final aggregated response. The UsageMetadataChunk should be received and processed as the code expects on lines 865-870 of lite_llm.py.
Screenshots
N/A
Desktop (please complete the following information):
- OS: macOS
- Python version: Python 3.12
- ADK version: 1.14.1
Model Information:
- Are you using LiteLLM: Yes
- Which model is being used: Verified across multiple models (GPT-4, claude-sonnet-4, claude-sonnet-4.5, etc.)
Additional context
Root cause location: lite_llm.py:828
Current code:
if stream:
    completion_args["stream"] = True  # Missing stream_options
Proposed fix:
if stream:
    completion_args["stream"] = True
    completion_args["stream_options"] = {"include_usage": True}
This issue has been verified in debug mode where adding stream_options={"include_usage": True} successfully enables the UsageMetadataChunk to be received and processed by the existing handler logic (lines 865-870).
The fix is a one-line addition that enables proper usage tracking for all streaming requests through LiteLLM, which is essential for token counting, billing monitoring, and usage analytics.
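Until the fix lands, a possible interim workaround is to pass stream_options through LiteLlm's extra constructor kwargs. This is a sketch that assumes (unverified) those kwargs are forwarded to the underlying litellm completion call; check your ADK version before relying on it.

from google.adk.models.lite_llm import LiteLlm

# Assumption: extra kwargs given to LiteLlm are forwarded to
# litellm.acompletion, so stream_options reaches the streaming call.
model = LiteLlm(
    model="vertex_ai/claude-3-5-haiku@20241022",
    stream_options={"include_usage": True},
)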