Bug report
Bug description:
I would like to extract a binary file from a multipart MIME message using email.parser.BytesParser, but every byte sequence "0x0d 0x0a" (CR+LF) in the binary file is replaced by "0x0a" (LF). Below is a minimal reproducible example.
from email.parser import BytesParser
from email.policy import default
from io import BytesIO
mime_file_byte_array = b'MIME-Version: 1.0\r\nContent-Type: multipart/mixed; boundary="MIME\
_boundary-1";\r\n\r\n--MIME_boundary-1\r\nContent-Type: application/octet-stream\r\nContent\
-Location: test.bin\r\n\r\na\r\nb\r\n--MIME_boundary-1--\r\n\r\n'
fp = BytesIO(mime_file_byte_array)
parser = BytesParser(policy=default)
msg = parser.parse(fp)
# walk() yields the multipart container first, then the test.bin part
parts = list(msg.walk())
binary_data = parts[1].get_payload(decode=True)
print('===== Beginning of Original MIME File =====')
print(mime_file_byte_array.decode())
print('===== End of Original MIME File =====')
print('')
print('===== test.bin after parse =====')
print(binary_data)
print('===== test.bin after parse =====')
As the definition of mime_file_byte_array shows, the multipart MIME message contains a binary part, "test.bin", whose content is b"a\r\nb".
The variable binary_data should therefore contain b"a\r\nb", but it actually contains b"a\nb".
This is probably because the TextIOWrapper created in BytesParser.parse() translates CR+LF to LF while reading (observed on Linux).
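For illustration, the newline translation can be reproduced with TextIOWrapper alone, without involving the email package. This is only a minimal sketch: the variable names are mine, and the encoding and errors arguments are assumed to match what BytesParser.parse() passes.

```python
from io import BytesIO, TextIOWrapper

raw = b"a\r\nb"

# Default newline=None enables universal newlines mode on input,
# so CR+LF is translated to LF while reading.
translated = TextIOWrapper(
    BytesIO(raw), encoding="ascii", errors="surrogateescape"
).read()

# newline='' still recognizes all line endings but returns them untranslated.
preserved = TextIOWrapper(
    BytesIO(raw), encoding="ascii", errors="surrogateescape", newline=""
).read()

print(repr(translated))  # 'a\nb'
print(repr(preserved))   # 'a\r\nb'
```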
Line 103 in 767c89b
When I replaced the above line with the line below, the problem went away. However, this change may have side effects that I cannot foresee.
fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape', newline='')
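The same change can be tried without patching the standard library by creating the TextIOWrapper manually and passing it to email.parser.Parser. This is a sketch of what I assume BytesParser.parse() does internally, not a proposed patch; the names raw, text_fp and payload are illustrative.

```python
from email.parser import Parser
from email.policy import default
from io import BytesIO, TextIOWrapper

# Same message as in the reproducer, split across lines for readability.
raw = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n'
    b'\r\n'
    b'--MIME_boundary-1\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'Content-Location: test.bin\r\n'
    b'\r\n'
    b'a\r\nb\r\n'
    b'--MIME_boundary-1--\r\n'
    b'\r\n'
)

# newline='' disables newline translation, so CR+LF reaches the parser intact.
text_fp = TextIOWrapper(BytesIO(raw), encoding='ascii',
                        errors='surrogateescape', newline='')
try:
    msg = Parser(policy=default).parse(text_fp)
finally:
    text_fp.detach()  # release the wrapper without closing the BytesIO

# Index 1 is the test.bin part; index 0 is the multipart container.
payload = list(msg.walk())[1].get_payload(decode=True)
print(payload)
```

With newline translation disabled this way, payload should keep the original CR+LF, which matches the result I got from the one-line change above.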
CPython versions tested on:
3.10
Operating systems tested on:
Linux