gh-85287: Change codecs to raise precise UnicodeEncodeError and UnicodeDecodeError #113674

Merged 30 commits on Mar 17, 2024
Changes from 1 commit
Commits
7339989
fix issue gh-85287
jjsloboda Jan 3, 2024
e92d414
add news blurb
jjsloboda Jan 3, 2024
10e7cd0
add more lenient unicode error handling within the except blocks
jjsloboda Jan 3, 2024
0122f90
fix IDNA-specific length issue
jjsloboda Jan 3, 2024
a310dd2
Merge branch 'main' into unicode-errors-fix-85287
jjsloboda Jan 3, 2024
63948d2
fix two issues
jjsloboda Jan 3, 2024
4479ab2
Merge branch 'main' into unicode-errors-fix-85287
jjsloboda Jan 3, 2024
81310e3
use plain UnicodeError for problems outside the en/decoded string
jjsloboda Jan 6, 2024
367de4e
split label empty vs too long
jjsloboda Jan 6, 2024
9f57515
use labels input for finding error offset, not output result
jjsloboda Jan 6, 2024
389122d
update test for undefined encoding
jjsloboda Jan 6, 2024
fe47caa
fixed linebreaks on some of the longer exceptions
jjsloboda Jan 6, 2024
a4098fa
Merge branch 'main' into unicode-errors-fix-85287
jjsloboda Jan 6, 2024
95cb5bb
add tests for unicode error offsets, and tighten up the logic for cal…
jjsloboda Jan 7, 2024
10d092f
Merge branch 'main' into unicode-errors-fix-85287
jjsloboda Jan 7, 2024
9ac979f
Merge branch 'main' into unicode-errors-fix-85287
jjsloboda Feb 16, 2024
aefd7c2
reduce scope of exception object, and fail gracefully if it cannot be…
jjsloboda Feb 16, 2024
f73ccfe
use object formatting on inbuf directly in exc
jjsloboda Feb 16, 2024
e0747b4
reduce scope of exception object, and fail gracefully if it cannot be…
jjsloboda Feb 16, 2024
93e99ae
update MultibyteIncrementalEncoder.getstate()
methane Feb 21, 2024
87e1f99
fixup
methane Feb 21, 2024
0728a43
change buffer size issue error back to UnicodeError
jjsloboda Feb 22, 2024
0f80786
Merge branch 'main' into unicode-errors-fix-85287
jjsloboda Feb 22, 2024
5c8c59e
Merge branch 'main' into unicode-errors-fix-85287
jjsloboda Feb 22, 2024
1cc911d
update test to match changed exception
jjsloboda Feb 22, 2024
9594bae
Update Modules/cjkcodecs/multibytecodec.c
methane Feb 23, 2024
ea3ff8a
improve idna codec errors
methane Feb 23, 2024
8a2bc50
improve punycode.decode()
methane Feb 23, 2024
a63e17a
improve punycode_decode again
methane Feb 23, 2024
4c329e4
Merge branch 'main' into unicode-errors-fix-85287
methane Mar 17, 2024
use labels input for finding error offset, not output result
jjsloboda committed Jan 6, 2024
commit 9f575153c328bb7cf3fd664d095b5b9f5533d5ec
Lib/encodings/idna.py: 33 changes (14 additions & 19 deletions)
@@ -191,16 +191,15 @@ def encode(self, input, errors='strict'):
         else:
             # ASCII name: fast path
             labels = result.split(b'.')
-            index = 0
-            for label in labels[:-1]:
-                if not (0 < len(label) < 64):
+            for i, label in enumerate(labels[:-1]):
                 if len(label) == 0:
-                    raise UnicodeEncodeError("idna", input, index, index+1, "label empty")
-                elif len(label >= 64:
-                    raise UnicodeEncodeError("idna", input, index, index+len(label), "label too long")
-                index += len(label) + 1
-            if len(labels[-1]) >= 64:
-                raise UnicodeEncodeError("idna", input, index, len(input), "label too long")
+                    offset = sum(len(l) for l in labels[:i]) + i
+                    raise UnicodeEncodeError("idna", input, offset, offset+1, "label empty")
+            for i, label in enumerate(labels):
+                if len(label) >= 64:
+                    offset = sum(len(l) for l in labels[:i]) + i
+                    raise UnicodeEncodeError(
+                        "idna", input, offset, offset+len(label), "label too long")
             return result, len(input)
 
         result = bytearray()
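
The offset arithmetic in this hunk can be tried in isolation. The sketch below is not the codec itself: `check_ascii_labels` is a hypothetical helper, written only for illustration, that returns the `(start, end, reason)` triple the ASCII fast path would pass to `UnicodeEncodeError`. It assumes the name is already ASCII `bytes`, so byte offsets and character offsets coincide.

```python
def check_ascii_labels(name: bytes):
    """Return (start, end, reason) for the first bad label, or None.

    Hypothetical helper mirroring the fast-path checks in the hunk above.
    """
    labels = name.split(b".")
    # An empty final label (a trailing dot) is allowed, so it is skipped here.
    for i, label in enumerate(labels[:-1]):
        if len(label) == 0:
            # Lengths of all earlier labels plus one dot per earlier label.
            offset = sum(len(l) for l in labels[:i]) + i
            return offset, offset + 1, "label empty"
    # The length limit applies to every label, including the last one.
    for i, label in enumerate(labels):
        if len(label) >= 64:
            offset = sum(len(l) for l in labels[:i]) + i
            return offset, offset + len(label), "label too long"
    return None


print(check_ascii_labels(b"abc..def"))      # (4, 5, 'label empty')
print(check_ascii_labels(b"example.com."))  # None
```

Recomputing the offset with `sum(...) + i` only when an error is found keeps the loop free of a running `index` counter, at the cost of a second pass over the earlier labels on the error path.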
@@ -210,18 +209,19 @@ def encode(self, input, errors='strict'):
             del labels[-1]
         else:
             trailing_dot = b''
-        for label in labels:
+        for i, label in enumerate(labels):
             if result:
                 # Join with U+002E
                 result.extend(b'.')
             try:
                 result.extend(ToASCII(label))
             except UnicodeEncodeError as exc:
+                offset = sum(len(l) for l in labels[:i]) + i
                 raise UnicodeEncodeError(
                     "idna",
                     input,
-                    len(result) + exc.start,
-                    len(result) + exc.end,
+                    offset + exc.start,
+                    offset + exc.end,
                     exc.reason,
                 )
         return bytes(result+trailing_dot), len(input)
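
A small experiment may help show why the error offset is derived from the input labels rather than from `len(result)`. Once an earlier label has been converted to its ASCII form, the length of the encoded output no longer matches the position in the original string. The snippet below is only an illustration using the standard `idna` codec on a single label; the variable names are made up for this example.

```python
# Illustration only: compare where the second label starts in the input
# string with where it starts in the encoded output.
labels = ["b\u00fccher", "example"]  # "bücher", "example"

# Offset of labels[1] in the input, using the same formula as the hunk.
i = 1
input_offset = sum(len(l) for l in labels[:i]) + i

# Length of the encoded output so far, plus the joining dot.
encoded_first = labels[0].encode("idna")
output_offset = len(encoded_first) + 1

print(encoded_first)                # b'xn--bcher-kva'
print(input_offset, output_offset)  # 7 14
```

An error reported at `output_offset + exc.start` would point past the real position in `input`, which is why the two offsets must not be mixed.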
@@ -259,14 +259,9 @@ def decode(self, input, errors='strict'):
             try:
                 u_label = ToUnicode(label)
             except UnicodeEncodeError as exc:
-                len_result = sum(len(x) for x in result) + len(result)
+                offset = sum(len(x) for x in result) + len(result)
                 raise UnicodeDecodeError(
-                    "idna",
-                    input,
-                    len_result + exc.start,
-                    len_result + exc.end,
-                    exc.reason,
-                )
+                    "idna", input, offset+exc.start, offset+exc.end, exc.reason)
             else:
                 result.append(u_label)
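
The re-raise pattern in this hunk, where a per-label error is translated into an error positioned within the whole input, can be sketched outside the codec. Everything below is an assumption made for illustration: `decode_label` is a hypothetical ASCII-only stand-in for `ToUnicode`, `"demo"` is a placeholder encoding name, and the inner error is a `UnicodeDecodeError` rather than the `UnicodeEncodeError` caught in the hunk.

```python
def decode_label(label: bytes) -> str:
    # Hypothetical stand-in for ToUnicode: plain ASCII decoding only.
    return label.decode("ascii")


def decode_name(name: bytes) -> str:
    result = []
    for label in name.split(b"."):
        try:
            u_label = decode_label(label)
        except UnicodeDecodeError as exc:
            # Characters decoded so far plus one dot per decoded label.
            # With an ASCII-only stand-in, decoded and input lengths match.
            offset = sum(len(x) for x in result) + len(result)
            raise UnicodeDecodeError(
                "demo", name, offset + exc.start, offset + exc.end, exc.reason
            ) from None
        else:
            result.append(u_label)
    return ".".join(result)


try:
    decode_name(b"abc.d\xffe.fg")
except UnicodeDecodeError as err:
    print(err.start, err.end)  # 5 6
```

The re-raised exception carries the full `name` as its `object`, so `err.object[err.start:err.end]` yields the offending bytes.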
