quote_from_bytes uses a lot of memory for larger bytestrings #95865

Closed
@iforapsy

Description


Bug report

When passed a bytestring that is over a hundred mebibytes (MiB), the urllib.parse.quote_from_bytes function uses much more memory and CPU than one would expect.

repro.py:

#!/usr/bin/env python3

import base64
from time import perf_counter
from urllib.parse import quote_from_bytes

MIB = 1024 ** 2


def main():
    bytes_ = base64.b64encode(100 * MIB * b'\x00')  # note 1
    start = perf_counter()
    quoted = quote_from_bytes(bytes_)
    stop = perf_counter()

    print(f"Quoting {len(bytes_)/1024**2:.3f} MiB took {stop-start} seconds")


if __name__ == '__main__':
    main()

I use /usr/bin/time to track how much CPU and memory is used.

$ /usr/bin/time -v ./repro.py
Quoting 133.333 MiB took 7.290915511985077 seconds
        Command being timed: "./repro.py"
        User time (seconds): 7.12
        System time (seconds): 0.68
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.82
        ...
        Maximum resident set size (kbytes): 1374872
        ...

At its peak, the function needs roughly ten times the size of the bytestring to quote it: a maximum resident set size of 1.31 GiB for a 133 MiB input. It also takes over seven seconds to return, where I expected under a second. Fortunately, there is no memory leak: the interpreter does return the memory after the function returns.
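To check the ratio on another interpreter without allocating a 100 MiB input, the standard-library tracemalloc module can report the peak of Python-level allocations made during the call. This is a minimal sketch, not part of the original report: `peak_ratio` is a hypothetical helper name, the 1 MiB input size is arbitrary, and tracemalloc counts only allocations made through Python's allocator, so the number will not match the resident set size reported by /usr/bin/time.

```python
import tracemalloc
from urllib.parse import quote_from_bytes


def peak_ratio(data: bytes) -> float:
    """Return peak traced allocation during quoting, divided by len(data)."""
    tracemalloc.start()
    try:
        quote_from_bytes(data)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / len(data)


if __name__ == "__main__":
    # The trailing '=' is not a safe character, so the input must be
    # quoted byte by byte rather than returned unchanged.
    data = b"A" * (1024 * 1024) + b"="
    print(f"peak traced allocation: {peak_ratio(data):.1f}x the input size")
```

The input is created before tracing starts, so the ratio covers only what quote_from_bytes itself allocates.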

Interestingly, if I reduce 100 to 90 in the line marked "note 1", the function returns in half a second and uses only 250 MiB, which is much closer to what I expected.
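One plausible explanation for the difference between 100 and 90, offered as an assumption rather than something the report confirms: 100 MiB is not a multiple of 3 bytes, so its Base64 encoding ends in `=` padding, while 90 MiB is a multiple of 3 and encodes to `A` characters only. `=` is not in the default safe set, so a single padded character is enough to send the whole input through per-byte quoting, whereas an all-safe input can be returned essentially unchanged. The sketch below shows the same effect at tiny sizes; `has_padding` is a hypothetical helper.

```python
import base64
from urllib.parse import quote_from_bytes

MIB = 1024 ** 2


def has_padding(n_input_bytes: int) -> bool:
    # Base64 maps every 3 input bytes to 4 output characters and pads the
    # output with '=' when the input length is not a multiple of 3.
    return n_input_bytes % 3 != 0


if __name__ == "__main__":
    for n in (100 * MIB, 90 * MIB):
        print(n, "padded" if has_padding(n) else "unpadded")

    padded = base64.b64encode(b"\x00" * 4)    # b'AAAAAA=='
    unpadded = base64.b64encode(b"\x00" * 3)  # b'AAAA'
    print(quote_from_bytes(padded))    # 'AAAAAA%3D%3D'
    print(quote_from_bytes(unpadded))  # 'AAAA'
```

If this reading is right, the 90 MiB run is not a fair baseline for the quoting cost, because nothing in that input actually needs quoting.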

This function's memory consumption affects boto3, the AWS SDK for Python, because many AWS APIs are called with URL-encoded parameters. boto3/botocore calls urllib.parse.urlencode to do that encoding, which ends up calling the problematic quote_from_bytes. Sample stack trace:

  File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 898, in _make_api_call
    http, parsed_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 921, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 198, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 139, in create_request
    prepared_request = self.prepare_request(request)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 150, in prepare_request
    return request.prepare()
  File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 473, in prepare
    return self._request_preparer.prepare(self)
  File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 360, in prepare
    body = self._prepare_body(original)
  File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 416, in _prepare_body
    body = urlencode(params, doseq=True)
  File "/usr/lib/python3.8/urllib/parse.py", line 962, in urlencode
    v = quote_via(v, safe)
  File "/usr/lib/python3.8/urllib/parse.py", line 870, in quote_plus
    return quote(string, safe, encoding, errors)
  File "/usr/lib/python3.8/urllib/parse.py", line 859, in quote
    return quote_from_bytes(string, safe)
  File "/usr/lib/python3.8/urllib/parse.py", line 898, in quote_from_bytes
    return ''.join([quoter(char) for char in bs])
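The last frame builds a list with one element per input byte before joining it. On a 64-bit build each list slot is an 8-byte reference, so the list alone would be about eight times the input size, which is consistent with the roughly tenfold peak reported above (this is an inference, not a measurement from the report). Because percent-quoting treats each byte independently, a caller can bound that intermediate list by quoting fixed-size slices and joining the results. The following is a minimal sketch of that idea; `quote_from_bytes_chunked` and its `chunk_size` default are hypothetical and are not part of urllib or botocore.

```python
from urllib.parse import quote_from_bytes


def quote_from_bytes_chunked(bs: bytes, safe: str = "/",
                             chunk_size: int = 64 * 1024) -> str:
    """Quote `bs` slice by slice so no per-byte list spans the whole input."""
    parts = []
    for start in range(0, len(bs), chunk_size):
        # Each call only ever sees `chunk_size` bytes, so its temporary
        # list is bounded by the chunk size, not by len(bs).
        parts.append(quote_from_bytes(bs[start:start + chunk_size], safe))
    return "".join(parts)


if __name__ == "__main__":
    data = bytes(range(256)) * 4
    # A small chunk_size forces many slices; the result must still match.
    assert quote_from_bytes_chunked(data, chunk_size=7) == quote_from_bytes(data)
    print("chunked and unchunked results match")
```

This only helps code that calls the quoting function directly. Callers that go through urlencode, as in the stack trace, would need the change inside the standard library itself.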

Your environment

Python 3.8.10 on Ubuntu 20.04 running on a t3.large EC2 instance. I have also been able to reproduce it with Python 3.10.6 and 3.11.0rc1+. I also reproduced it on Windows 10 running Python 3.9.13.

Labels: 3.12 (only security fixes), performance (Performance or resource usage)
