email.parser.BytesParser.parse() cannot handle binary data that include \x0d \x0a correctly.

Bug report

Bug description:

I want to extract a binary file from a multipart MIME file using email.parser.BytesParser, but every byte sequence 0x0d 0x0a (CR + LF) in the binary file is replaced by 0x0a (LF). Below is a minimal reproducible example.

from email.parser import BytesParser
from email.policy import default
from io import BytesIO

mime_file_byte_array = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n'
    b'\r\n'
    b'--MIME_boundary-1\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'Content-Location: test.bin\r\n'
    b'\r\n'
    b'a\r\nb\r\n'  # content of test.bin, followed by the CRLF that precedes the boundary
    b'--MIME_boundary-1--\r\n'
    b'\r\n'
)
fp = BytesIO(mime_file_byte_array)
parser = BytesParser(policy=default)
msg = parser.parse(fp)

parts = list(msg.walk())  # parts[0] is the multipart container, parts[1] is test.bin
binary_data = parts[1].get_payload(decode=True)

print('===== Beginning of Original MIME File =====')
print(mime_file_byte_array.decode())
print('===== End of Original MIME File =====')
print('')
print('===== test.bin after parse =====')
print(binary_data)
print('===== End of test.bin after parse =====')

As the definition of mime_file_byte_array shows, the multipart MIME file includes a binary file "test.bin" whose content is b"a\r\nb".
Therefore, the variable binary_data should contain b"a\r\nb", but it actually contains b"a\nb".

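This is probably because the TextIOWrapper created in BytesParser.parse() uses the default newline=None, which enables universal newlines mode and translates CR+LF to LF when reading. I observed this on Linux.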
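
The newline translation can be reproduced without the email package at all. The following is a minimal sketch; the helper name read_with_newline is only for illustration and is not part of any library.

```python
from io import BytesIO, TextIOWrapper

raw = b"a\r\nb"


def read_with_newline(data, newline):
    # Wrap the bytes with the same encoding and error handler that
    # BytesParser.parse() passes, varying only the newline argument.
    wrapper = TextIOWrapper(
        BytesIO(data),
        encoding="ascii",
        errors="surrogateescape",
        newline=newline,
    )
    return wrapper.read()


# newline=None (the default): universal newlines mode, CR+LF is translated.
translated = read_with_newline(raw, None)

# newline="": line endings are returned to the caller untranslated.
untranslated = read_with_newline(raw, "")

print(repr(translated))
print(repr(untranslated))
```

Only the newline argument differs between the two calls, so any difference in the result comes from the translation done by TextIOWrapper.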

cpython/Lib/email/parser.py

Line 103 in 767c89b

fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape')

When I replaced the above line with the line below, the problem went away. However, this change may have side effects that I cannot foresee.

fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape', newline='')
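
Until this is resolved, the same change can be tried without editing the standard library. The sketch below assumes that BytesParser.parse() does nothing more than wrap the stream and delegate to email.parser.Parser; parse_preserving_newlines is a hypothetical helper, not an existing API.

```python
from email.parser import Parser
from email.policy import default
from io import BytesIO, TextIOWrapper


def parse_preserving_newlines(fp, policy=default):
    # Hypothetical helper: wraps the binary stream the way
    # BytesParser.parse() appears to, but with newline='' so that
    # CR+LF is not translated to LF while reading.
    text_fp = TextIOWrapper(
        fp, encoding='ascii', errors='surrogateescape', newline=''
    )
    try:
        return Parser(policy=policy).parse(text_fp)
    finally:
        # Detach so that the caller's binary stream is not closed
        # together with the wrapper.
        text_fp.detach()


sample = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n'
    b'\r\n'
    b'--MIME_boundary-1\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'Content-Location: test.bin\r\n'
    b'\r\n'
    b'a\r\nb\r\n'
    b'--MIME_boundary-1--\r\n'
    b'\r\n'
)

msg = parse_preserving_newlines(BytesIO(sample))
payload = list(msg.walk())[1].get_payload(decode=True)
print(payload)
```

With the sample message from this report, the payload of test.bin should keep its CR+LF. Whether untranslated line endings affect other parts of the parser, such as header handling, is not covered by this sketch.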

CPython versions tested on:

3.10

Operating systems tested on:

Linux
