Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings
This repository was archived by the owner on Mar 6, 2026. It is now read-only.
This repository was archived by the owner on Mar 6, 2026. It is now read-only.

Avoid running out of memory during slow iteration: set worker_queue maxsize to be 1, and/or make it configurable #561

Copy link
Copy link

Description

@michalc
Issue body actions

Is your feature request related to a problem? Please describe.

When using to_dataframe_iterable for a large result set (with nested/repeated records) on a system with a fast upstream connection to BigQuery, but slow downstream processing, Python can use all the memory on the system, and get killed.

I think I have tracked it down to the worker_queue that is used to pass data from the worker threads back to the main thread: it does not have a maxsize.

worker_queue = queue.Queue()

This means that if items are not pulled from the queue fast enough, then all the memory on the system can be used

Describe the solution you'd like

I think a reasonable solution would be to set the maxsize to be 1:

worker_queue = queue.Queue(maxsize=1)

This would still effectively be a buffer size of "number of threads + 1" pages since each thread would fetch into a variable, and then wait if the queue is full.

However, it being configurable would also be reasonable I think.

Describe alternatives you've considered

I've monkey-patched for now, and it seems to work. But ideally, it wouldn't be necessary

# ...

def _monkey_patch_queue_maxsize_1():
    OriginalQueue = queue.Queue

    class QueueWithMaxsize1(OriginalQueue):
        def __init__(self):
            super().__init__(maxsize=1)

    def _restore():
        queue.Queue = OriginalQueue

    queue.Queue = QueueWithMaxsize1

    return _restore

# ....

query = bqClient.query(sql)
result_rows = query.result()

ensure_original_queue = _monkey_patch_queue_maxsize_1()

pages = result_rows.to_dataframe_iterable(bqStorageClient)

for page in pages:
   ensure_original_queue()
   # ... something slow, even time.sleep can do it
Reactions are currently unavailable

Metadata

Metadata

Assignees

Labels

api: bigqueryIssues related to the googleapis/python-bigquery API.Issues related to the googleapis/python-bigquery API.type: feature request‘Nice-to-have’ improvement, new feature or different behavior or design.‘Nice-to-have’ improvement, new feature or different behavior or design.

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions

    Morty Proxy This is a proxified and sanitized view of the page, visit original site.