Description
Bug report
When passed a bytestring that is over a hundred mebibytes (MiB), the urllib.parse.quote_from_bytes
function uses much more memory and CPU than one would expect.
repro.py:
#!/usr/bin/env python3
import base64
from time import perf_counter
from urllib.parse import quote_from_bytes

MIB = 1024 ** 2


def main():
    bytes_ = base64.b64encode(100 * MIB * b'\x00')  # note 1
    start = perf_counter()
    quoted = quote_from_bytes(bytes_)
    stop = perf_counter()
    print(f"Quoting {len(bytes_)/1024**2:.3f} MiB took {stop-start} seconds")


if __name__ == '__main__':
    main()
I use /usr/bin/time to track how much CPU time and memory are used.
$ /usr/bin/time -v ./repro.py
Quoting 133.333 MiB took 7.290915511985077 seconds
Command being timed: "./repro.py"
User time (seconds): 7.12
System time (seconds): 0.68
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.82
...
Maximum resident set size (kbytes): 1374872
...
At its peak, the function needs roughly ten times the size of the input bytestring to quote it (about 1.31 GiB of resident memory for a 133 MiB input). It also takes several seconds to return, whereas I expect it to return in under a second. Fortunately, there is no memory leak: the interpreter releases the memory after the function returns.
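For platforms without /usr/bin/time (the Windows reproduction, for instance), the peak allocation can also be estimated from inside Python. The following is a minimal sketch using the standard tracemalloc module on a much smaller 1 MiB input; the helper name peak_quote_memory is hypothetical, and tracemalloc counts Python-level allocations rather than resident set size, so the numbers are not directly comparable with the output above.

```python
import base64
import tracemalloc
from urllib.parse import quote_from_bytes


def peak_quote_memory(n_bytes):
    """Quote a base64 payload built from n_bytes zero bytes.

    Returns (payload length, quoted length, peak traced bytes).
    Hypothetical helper for illustration only.
    """
    payload = base64.b64encode(b'\x00' * n_bytes)
    tracemalloc.start()
    try:
        quoted = quote_from_bytes(payload)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(payload), len(quoted), peak


payload_len, quoted_len, peak = peak_quote_memory(1024 ** 2)
print(f"input {payload_len} B, output {quoted_len} B, "
      f"peak {peak} B ({peak / payload_len:.1f}x input)")
```

The ratio printed at the end will vary between Python versions and builds, so it is best read as a rough indicator rather than an exact figure.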
Interestingly, if I reduce 100 to 90 in the line marked "note 1", the function returns in half a second and uses only 250 MiB, which is much more in line with my pre-bug expectations.
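One possible explanation for the difference, which I have not confirmed against the implementation: 90 MiB is a multiple of 3 bytes, so its base64 encoding has no "=" padding and consists only of characters that need no quoting, whereas 100 MiB leaves a remainder of 1 and ends in "==". If quote_from_bytes has a shortcut for inputs made up entirely of safe characters (an assumption), the 90 MiB case would never reach the slow per-byte path. The sketch below only checks the padding arithmetic and the observable quoting behavior; padding_for is a hypothetical helper.

```python
import base64
from urllib.parse import quote_from_bytes

MIB = 1024 ** 2


def padding_for(n_bytes):
    """Number of '=' characters base64 appends for an n_bytes input."""
    return (3 - n_bytes % 3) % 3


for mib in (90, 100):
    n = mib * MIB
    print(f"{mib} MiB: length % 3 = {n % 3}, padding = {padding_for(n)}")

# Small stand-ins with the same length remainders as the two inputs
no_padding = base64.b64encode(b'\x00' * 3)    # remainder 0, like 90 MiB
with_padding = base64.b64encode(b'\x00' * 4)  # remainder 1, like 100 MiB

# Output is unchanged when every byte is safe; '=' must be percent-encoded
print(quote_from_bytes(no_padding))
print(quote_from_bytes(with_padding))
```

If this is the cause, a 90 MiB input containing even one unsafe byte should show the same slowdown as the 100 MiB case.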
This memory consumption affects boto3, the AWS SDK for Python, because many AWS APIs are called with URL-encoded parameters. boto3/botocore calls urllib.parse.urlencode to do that encoding, which ends up calling the problematic quote_from_bytes. Sample stack trace:
  File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 898, in _make_api_call
    http, parsed_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 921, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 198, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 139, in create_request
    prepared_request = self.prepare_request(request)
  File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 150, in prepare_request
    return request.prepare()
  File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 473, in prepare
    return self._request_preparer.prepare(self)
  File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 360, in prepare
    body = self._prepare_body(original)
  File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 416, in _prepare_body
    body = urlencode(params, doseq=True)
  File "/usr/lib/python3.8/urllib/parse.py", line 962, in urlencode
    v = quote_via(v, safe)
  File "/usr/lib/python3.8/urllib/parse.py", line 870, in quote_plus
    return quote(string, safe, encoding, errors)
  File "/usr/lib/python3.8/urllib/parse.py", line 859, in quote
    return quote_from_bytes(string, safe)
  File "/usr/lib/python3.8/urllib/parse.py", line 898, in quote_from_bytes
    return ''.join([quoter(char) for char in bs])
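The last frame shows a list with one string per input byte being built before the join, which is presumably where the extra memory goes. Since percent-encoding works byte by byte, a caller could in principle limit the size of that temporary list by quoting the input in chunks. The following is a rough sketch of such a workaround, not a proposed patch: quote_from_bytes_chunked and its chunk_size default are hypothetical and are not part of urllib or botocore.

```python
from urllib.parse import quote_from_bytes


def quote_from_bytes_chunked(bs, safe='/', chunk_size=64 * 1024):
    """Quote bs piece by piece so each temporary list stays small.

    Hypothetical helper for illustration; chunk_size is an arbitrary guess.
    """
    view = memoryview(bs)  # slicing a memoryview does not copy the data
    parts = [
        # quote_from_bytes needs bytes or bytearray, so copy one chunk at a time
        quote_from_bytes(bytes(view[i:i + chunk_size]), safe)
        for i in range(0, len(view), chunk_size)
    ]
    return ''.join(parts)


data = bytes(range(256)) * 100
print(quote_from_bytes_chunked(data) == quote_from_bytes(data))
```

The result should be identical to a single quote_from_bytes call because each byte is encoded independently of its neighbors; whether this actually lowers peak memory on a given Python version would need to be measured.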
Your environment
Python 3.8.10 on Ubuntu 20.04 running on a t3.large EC2 instance. I have also been able to reproduce it with Python 3.10.6 and 3.11.0rc1+. I also reproduced it on Windows 10 running Python 3.9.13.