bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters #22219

Merged
merged 12 commits into from
Oct 14, 2020
3 changes: 2 additions & 1 deletion 3 Lib/encodings/__init__.py
@@ -61,7 +61,8 @@ def normalize_encoding(encoding):
         if c.isalnum() or c == '.':
             if punct and chars:
                 chars.append('_')
-            chars.append(c)
+            if c.isascii():
+                chars.append(c)
Member

I wanted to ask you to add a ".. versionchanged:: 3.10" entry in the documentation, but then I noticed that the encodings module was never documented! Oh!

Member Author

If end users are going to use this function or module, I can try to write the documentation, but I will need some time to do it :)

Member

It can and must be addressed in a separate PR. The lack of documentation should not hold up this change.

Member Author

ok, copy that.

             punct = False
         else:
             punct = True
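
To see the effect of the hunk above in isolation, here is a minimal sketch of the patched loop as a standalone function. The name `normalize_sketch`, the initial values of `chars` and `punct`, the `for` loop header, and the final `''.join` are assumptions added for illustration, since the hunk does not show them; this is not the stdlib function itself.

```python
def normalize_sketch(encoding: str) -> str:
    """Hypothetical standalone copy of the patched loop (not the stdlib code)."""
    chars = []      # assumed setup: not shown in the hunk
    punct = False   # assumed setup: not shown in the hunk
    for c in encoding:
        if c.isalnum() or c == '.':
            # Collapse any preceding run of punctuation into one underscore.
            if punct and chars:
                chars.append('_')
            # New in this PR: keep the character only if it is ASCII.
            if c.isascii():
                chars.append(c)
            punct = False
        else:
            punct = True
    return ''.join(chars)  # assumed: joining is not shown in the hunk


# Inputs taken from the test added in this PR.
for name in ('utf\xe9\u20ac\U0010ffff-8', 'utf_8', 'utf 8'):
    print(repr(name), '->', normalize_sketch(name))
```

Note that in this sketch a non-ASCII letter is dropped without setting `punct`, so `normalize_sketch('utf\xe98')` gives `'utf8'` with no underscore inserted.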
17 changes: 17 additions & 0 deletions 17 Lib/test/test_codecs.py
@@ -3440,5 +3440,22 @@ def search_function(encoding):
         self.assertEqual(NOT_FOUND, codecs.lookup('a\xe9\u20ac-8'))


+class EncodingNormalizationTest(unittest.TestCase):
+
+    def test_bpo39337(self):
+        """
+        bpo-39337: similar to _Py_normalize_encoding(),
+        encodings.normalize_encoding() should ignore non-ASCII letters.
+        """
+        import encodings
+
+        out = encodings.normalize_encoding('utf\xE9\u20AC\U0010ffff-8')
+        self.assertEqual(out, 'utf_8')
+        out = encodings.normalize_encoding('utf_8')
+        self.assertEqual(out, 'utf_8')
+        out = encodings.normalize_encoding('utf 8')
+        self.assertEqual(out, 'utf_8')
+
+
 if __name__ == "__main__":
     unittest.main()
@@ -0,0 +1 @@
:func:`encodings.normalize_encoding` now ignores non-ASCII letters.
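
As a rough illustration of why an explicit ASCII check matters here: `str.isalnum()` is Unicode-aware, so a non-ASCII letter such as `é` passes it, while `str.isascii()` rejects it. The sample characters below come from the test input in this PR plus one ASCII digit as a baseline; the `samples` and `flags` names are only for this sketch.

```python
# Characters from the PR's test input, plus an ASCII digit as a baseline.
samples = ["8", "\xe9", "\u20ac", "\U0010ffff"]

# Map each character to its (isalnum, isascii) results.
flags = {c: (c.isalnum(), c.isascii()) for c in samples}

for c, (alnum, ascii_only) in flags.items():
    print(f"U+{ord(c):04X} isalnum={alnum} isascii={ascii_only}")
```

Only `"\xe9"` is alphanumeric without being ASCII, which is the case the new `c.isascii()` guard filters out.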