Tracking interruption point during large language model output streaming using FastAPI StreamingResponse #13707

Answered by YuriiMotov
kingmming asked this question in Questions

First Check

  • I added a very descriptive title here.
  • I used the GitHub search to find a similar question and didn't find it.
  • I searched the FastAPI documentation, with the integrated search.
  • I already searched in Google "How to X in FastAPI" and didn't find any information.
  • I already read and followed all the tutorial in the docs and didn't find an answer.
  • I already checked if it is not related to FastAPI but to Pydantic.
  • I already checked if it is not related to FastAPI but to Swagger UI.
  • I already checked if it is not related to FastAPI but to ReDoc.

Commit to Help

  • I commit to help with one of those options 👆

Example Code

from fastapi import FastAPI, Request
from starlette.responses import StreamingResponse
import asyncio

app = FastAPI()

@app.get("/stream")
async def stream(request: Request):
    async def event_generator():
        try:
            for i in range(100):  # Simulating a long output from a large model
                await asyncio.sleep(0.1)  # Simulate generation delay between chunks
                yield f"data: Line {i}\n\n"
        except Exception as e:
            print(f"Error occurred: {e}")

    return StreamingResponse(event_generator(), media_type="text/event-stream")

Description

I'm using FastAPI's StreamingResponse to stream the output of a large language model to the client. I need to track the exact point at which the user interrupts the streaming process. This is crucial for logging and monitoring purposes, as it helps me understand where users typically stop the output and potentially optimize the model's behavior or user experience.

Scenario

  • I have a FastAPI application that streams the output of a large language model to the client in real-time.
  • The output can be quite long, and users may want to interrupt the streaming at any point.
  • I want to record the exact position (e.g., the line number or token index) where the user interrupted the streaming.

Operating System

Linux

Operating System Details

No response

FastAPI Version

0.115.7

Pydantic Version

2.10.6

Python Version

Python 3.10.14

Additional Context

No response

Replies: 1 comment · 6 replies

Not sure it's the easiest way, but the following code works:

import asyncio

from fastapi import FastAPI, Request
from starlette.responses import StreamingResponse

app = FastAPI()

@app.get("/stream")
async def stream(request: Request):
    async def event_generator(state: dict[str, int | bool]):
        try:
            for i in range(20):  # Simulating a long output from a large model
                state["step"] = i
                await asyncio.sleep(0.5)
                yield f"data: Line {i}\n\n"
        except Exception as e:
            print(f"Error occurred: {e}")
        state["finished"] = True

    state = {"step": 0}  # Shared state object

    async def watch_disconnect(request: Request):
        while True:
            if await request.is_disconnected():
                # Grace period: if the stream finished just as the client
                # disconnected, let the generator set "finished" first.
                await asyncio.sleep(0.1)
                if not state.get("finished", False):
                    print(f"Client disconnected at step #{state['step']}")
                break
            # Polling interval so the loop doesn't busy-spin between checks.
            # It can be increased (e.g. to 1 s) if an immediate reaction
            # isn't needed.
            await asyncio.sleep(0.1)

    asyncio.create_task(watch_disconnect(request))

    return StreamingResponse(event_generator(state), media_type="text/event-stream")

Here, before returning the StreamingResponse, we run a watcher task in the background and pass the shared state object to it. We also pass the same state object to the generator function.

When the client disconnects, StreamingResponse stops iterating through the generator function, but its last state remains stored in the shared state object. The watch_disconnect function handles the client disconnect and reports whether the client disconnected before the response finished sending.
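
To try this locally, you can run the app with uvicorn, request the endpoint with something like curl -N http://localhost:8000/stream, and press Ctrl-C partway through; the server should then log the step at which the client disconnected (the command and port are just an example setup).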

6 replies
@YuriiMotov

I don't see any significant potential problems with this approach. It would be nice to hear the opinions of others.

You can adjust the sleep time in watch_disconnect. Since you don't need an immediate reaction, you can increase it to 1 second or even higher.

@kingmming

Alright, I will deploy it in the production environment and observe whether there are any issues.

@kingmming

Hi @YuriiMotov! I still need to capture disconnections caused by network errors while the server is streaming output to the client. How can I distinguish those in your code?

@YuriiMotov

As far as I understand, there is no way for an ASGI app to distinguish an intentional client disconnection from a disconnection caused by a network error.
https://asgi.readthedocs.io/en/latest/specs/www.html#disconnect-receive-event

Possible solutions are:

  • Use a WebSocket connection instead of an HTTP StreamingResponse. This way, the client can send a message before closing the connection, and you can handle it (see the sketch below).
  • Handle the client's intentional disconnect on the frontend side and then send this info to the backend (a separate request to another endpoint); a sketch of this also follows.
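
A minimal sketch of the WebSocket option, assuming the client sends a literal "stop" text message before closing; the /stream-ws path, the message format, and the timings are invented for illustration:

import asyncio

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/stream-ws")
async def stream_ws(websocket: WebSocket):
    await websocket.accept()
    step = 0
    try:
        for i in range(20):  # Simulating a long output from a large model
            step = i
            await websocket.send_text(f"Line {i}")
            try:
                # Briefly check for a "stop" message without blocking the stream.
                msg = await asyncio.wait_for(websocket.receive_text(), timeout=0.01)
                if msg == "stop":
                    print(f"Client intentionally stopped at step #{step}")
                    return
            except asyncio.TimeoutError:
                pass
            await asyncio.sleep(0.5)
    except WebSocketDisconnect:
        # The connection closed without a prior "stop" message, which
        # suggests a network error or an abrupt close.
        print(f"Connection dropped at step #{step}")

And a sketch of the second option, where the frontend reports the interruption itself; the endpoint path and payload shape are likewise hypothetical. Any disconnect that is never followed by such a report can then be treated as a network issue:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InterruptReport(BaseModel):
    stream_id: str  # Hypothetical ID correlating the report with a stream
    step: int       # Last step the user saw before pressing "stop"

@app.post("/report-interrupt")
async def report_interrupt(report: InterruptReport):
    # Called by the frontend when the user intentionally stops the stream,
    # so the backend can tell it apart from a network error.
    print(f"Stream {report.stream_id} intentionally stopped at step #{report.step}")
    return {"status": "recorded"}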
@kingmming

Hi! @YuriiMotov Thank you very much for your detailed explanation and suggestions! I really appreciate your insights and the possible solutions you provided. They are very helpful.

Answer selected by kingmming