GH-133711: Enable UTF-8 mode by default (PEP 686) #133712


Open
wants to merge 8 commits into
base: main

Conversation

AA-Turner
Member

@AA-Turner AA-Turner commented May 8, 2025

@StanFromIreland
Contributor

Can someone please run this buildbot (I don't have permission to trigger it) to verify that it clears up #133677 for 3.15?

@AA-Turner

This comment was marked as resolved.

@methane
Member

methane commented May 9, 2025

test_python_legacy_windows_stdio tests pipe encoding, but it should test console I/O encoding.
cc: @zooba

@methane
Member

methane commented May 9, 2025

assert_python_ok uses PIPE for stdin/stdout/stderr.
spawn_python doesn't use PIPE for stderr, so this test can be rewritten as follows.
However, the test still needs to be run in a console.

diff --git a/Lib/test/test_cmd_line.py b/Lib/test/test_cmd_line.py
index 1b40e0d05fe..243069aeb18 100644
--- a/Lib/test/test_cmd_line.py
+++ b/Lib/test/test_cmd_line.py
@@ -972,10 +972,12 @@ def test_python_legacy_windows_fs_encoding(self):

     @unittest.skipUnless(support.MS_WINDOWS, 'Test only applicable on Windows')
     def test_python_legacy_windows_stdio(self):
-        code = "import sys; print(sys.stdin.encoding, sys.stdout.encoding)"
+        # stdin and stdout are PIPE. So we check stderr encoding for Console I/O.
+        code = "import sys; print(sys.stderr.encoding)"
         expected = 'cp'
-        rc, out, err = assert_python_ok('-c', code, PYTHONLEGACYWINDOWSSTDIO='1')
-        self.assertIn(expected.encode(), out)
+        p = spawn_python('-c', code, env={"PYTHONLEGACYWINDOWSSTDIO": "1"})
+        out, rc = _kill_python_and_exit_code(p)
+        self.assertRegex(out.strip(), rb'\Acp\d+\Z')

(I haven't tested it yet because I don't use Windows daily.)

Doc/library/os.rst (outdated)
If the UTF-8 mode is disabled, the interpreter defaults to using
the current locale settings, *unless* the current locale is identified
as a legacy ASCII-based locale (as described for :envvar:`PYTHONCOERCECLOCALE`),
and locale coercion is either disabled or fails.
Member

Is PEP 538: Coercing the legacy C locale to a UTF-8 based locale still relevant if UTF-8 mode is enabled by default? It may make disabling the UTF-8 mode more complicated. It's just an open question, I don't have the answer.

Member

When UTF-8 mode is disabled:

  • If the locale is not C or POSIX: the locale encoding is used.
  • If the locale is C or POSIX and PYTHONCOERCECLOCALE is not set, the locale is changed to C.UTF-8.
    • Although UTF-8 mode is disabled, the locale encoding is UTF-8.
  • If the locale is C or POSIX and PYTHONCOERCECLOCALE is set to 0 (coercion disabled), the locale encoding will be ASCII.
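
To check which of these cases applies on a given machine, a small inspection script may help. This is only a sketch: the helper name `describe_encodings` is made up for illustration, and the values it reports depend on the platform, the Python version, and the environment.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encoding-related settings of the running interpreter."""
    return {
        # 1 if UTF-8 mode is enabled, 0 otherwise.
        "utf8_mode": sys.flags.utf8_mode,
        # Encoding reported for text I/O; 'utf-8' when UTF-8 mode is on.
        "locale_encoding": locale.getpreferredencoding(False),
        # Encoding used for file names and other OS-level strings.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Environment variables that influence the values above.
        "env": {
            name: os.environ.get(name)
            for name in ("PYTHONUTF8", "PYTHONCOERCECLOCALE",
                         "LC_ALL", "LC_CTYPE", "LANG")
        },
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(f"{key}: {value}")
```

Running it once with `LC_ALL=C` and once with `LC_ALL=C PYTHONCOERCECLOCALE=0` should make the difference between the last two bullets visible.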

Doc/library/sys.rst (outdated)
* Python UTF-8 mode is now enabled by default.
It may be disabled by setting :envvar:`PYTHONUTF8=0 <PYTHONUTF8>` as
an environment variable or by using the :option:`-X utf8=0 <-X>` flag.
See :pep:`686` for further details.
Member

I feel like we can probably put some more explanation in here, such as that it affects TextIOWrapper and hence open(). The current description doesn't sound as scary as it needs to, in my opinion.

Along the lines of: "Python UTF-8 mode is now enabled by default. This means that (files/console/etc.) will now use UTF-8 regardless of system settings, unless specifically overridden in code (typically with an encoding= argument)."
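
As an illustration of the `encoding=` override mentioned above, here is a minimal sketch (the file name and sample text are arbitrary). Passing `encoding=` explicitly makes `open()` behave the same whether or not UTF-8 mode is enabled:

```python
import os
import tempfile

text = "naïve café ☕"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    # Explicit encoding: independent of UTF-8 mode and of the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    with open(path, encoding="utf-8") as f:
        round_tripped = f.read()

    # Read the raw bytes to confirm what was actually written to disk.
    with open(path, "rb") as f:
        raw = f.read()

print(round_tripped == text)
print(raw == text.encode("utf-8"))
```

Code that omits `encoding=` is the code whose behaviour changes when the default flips, which is why the note probably deserves the extra explanation.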

Member

To be clear, it's nothing new. But we shouldn't assume that everyone already knows what UTF-8 mode implies. There are many more people out there who haven't ever thought about it than those who are waiting for it to be the default.

Member

Another effect of the UTF-8 Mode is that Python ignores the locale encoding.
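
A rough way to observe this effect is to start a child interpreter with `PYTHONUTF8` set explicitly and compare what it reports. This is a sketch, not part of the PR; `run_with_utf8_mode` is a hypothetical helper name.

```python
import os
import subprocess
import sys

# Code run in the child: print the UTF-8 mode flag and the reported encoding.
CODE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)


def run_with_utf8_mode(value):
    """Run a child interpreter with PYTHONUTF8 set to *value* ('0' or '1')."""
    env = dict(os.environ, PYTHONUTF8=value)
    result = subprocess.run(
        [sys.executable, "-c", CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    flag, encoding = result.stdout.split()
    return int(flag), encoding


if __name__ == "__main__":
    for value in ("0", "1"):
        print(value, run_with_utf8_mode(value))
```

With `PYTHONUTF8=1` the child reports UTF-8 regardless of `LANG` or `LC_ALL`; with `PYTHONUTF8=0` the reported encoding follows the locale.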

Member Author

@FFY00 FFY00 removed their request for review May 10, 2025 02:28
Doc/using/windows.rst
@@ -75,7 +75,30 @@ New features
Other language changes
======================


* Python now uses UTF-8_ as the default encoding, independent of the system's
Member

You might mention the UTF-8 Mode earlier since it has other side effects documented in the UTF-8 Mode section, such as changing sys.stdout error handler and ignoring the locale encoding.

Lib/test/test_cmd_line.py (outdated)
@AA-Turner

This comment was marked as resolved.

@bedevere-bot

This comment was marked as resolved.

code = 'import sys; print(type(sys.stderr.buffer.raw))'
env = {'PYTHONLEGACYWINDOWSSTDIO': str(int(legacy_windows_stdio))}
# use stderr=None as legacy_windows_stdio doesn't affect pipes
p = spawn_python('-c', code, env=env, stderr=None)
Member

GitHub Actions runs the tests with a pipe, not with a console.
In that case, stderr=None is still a pipe.

Adding creationflags=CREATE_NEW_CONSOLE would allocate a new console for the subprocess.

Member

creationflags=CREATE_NEW_CONSOLE didn't fix the test on GitHub Actions...

@@ -972,10 +976,19 @@ def test_python_legacy_windows_fs_encoding(self):

@unittest.skipUnless(support.MS_WINDOWS, 'Test only applicable on Windows')
Member

Suggested change
@unittest.skipUnless(support.MS_WINDOWS, 'Test only applicable on Windows')
@unittest.skipUnless(type(sys.stderr.buffer.raw).__name__ == "_WindowsConsoleIO",
"Test only applicable on Windows with console IO")

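The skip condition suggested above can be written so that it also tolerates a replaced or unbuffered `sys.stderr`. The following is a sketch under that assumption; `stderr_is_windows_console` and `ConsoleOnlyTest` are hypothetical names, not code from the PR.

```python
import sys
import unittest


def stderr_is_windows_console():
    """Return True if sys.stderr is backed by the Windows console."""
    # sys.stderr may be replaced or unbuffered, so look up attributes defensively.
    raw = getattr(getattr(sys.stderr, "buffer", None), "raw", None)
    return type(raw).__name__ == "_WindowsConsoleIO"


class ConsoleOnlyTest(unittest.TestCase):  # hypothetical test case
    @unittest.skipUnless(stderr_is_windows_console(),
                         "Test only applicable on Windows with console IO")
    def test_stderr_is_a_tty(self):
        # Only reached when stderr really is a console stream.
        self.assertTrue(sys.stderr.isatty())
```

On CI, where stderr is usually a pipe, the test would be reported as skipped rather than failing.
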
@StanFromIreland
Contributor

Fails on buildbot:

======================================================================
FAIL: test_nonascii (test.test_readline.TestReadline.test_nonascii)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stan/buildarea/pull_request.stan-raspbian/build/Lib/test/test_readline.py", line 315, in test_nonascii
    self.assertIn(b"result " + expected + b"\r\n", output)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: b"result '[\\xefnserted]|t\\xebxt[after]'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\xefnserted]|t\xeb[after]\x08\x08\x08\x08\x08\x08\x08text \'t\\xeb\'\r\nline \'[\\xefnserted]|t\\xeb[after]\'\r\nindexes 11 13\r\n\x07text \'t\\xeb\'\r\nline \'[\\xefnserted]|t\\xeb[after]\'\r\nindexes 11 13\r\nsubstitution \'t\\xeb\'\r\nmatches [\'t\\xebnt\', \'t\\xebxt\']\r\n\x1b[1@x\x1b[1@t\r\nresult \'[\\udcefnserted]|t\\udcebxt[after]\'\r\nhistory \'[\\xefnserted]|t\\xebxt[after]\'\r\n")
----------------------------------------------------------------------
Ran 14 tests in 2.151s

Presumably this is a result of this PR, since I have never seen this test fail before. At least all the tests in #133677 are no longer problematic.

@vstinner
Member

FAIL: test_nonascii (test.test_readline.TestReadline.test_nonascii)

I suppose that you looked at the ARM64 Raspbian PR buildbot. This buildbot has a special locale encoding:

== encodings: locale=ISO-8859-1 FS=utf-8

test.pythoninfo:

locale.getencoding: ISO-8859-1
os.environ[LANG]: en_IE
os.environ[LC_ALL]: en_IE

The locale en_IE doesn't use UTF-8 but ISO-8859-1. I can reproduce the issue with fr_FR locale which also uses ISO-8859-1:

$ LANG=fr_FR ./python -m test -v test_readline  -u all
...
FAIL: test_nonascii (test.test_readline.TestReadline.test_nonascii)
...

I can also reproduce the issue in the main branch using the fr_FR locale and the command:

PYTHONUTF8=1 LANG=fr_FR ./python -m test -v test_readline  -u all
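
The `\udcef` and `\udceb` in the failing output are consistent with ISO-8859-1 bytes being decoded as UTF-8 with the `surrogateescape` error handler. The sketch below only illustrates that decoding behaviour using the bytes from the traceback; it does not claim to pinpoint where in readline or the test the mismatch arises.

```python
# Bytes as they appear in the test output under an ISO-8859-1 locale.
raw = b"[\xefnserted]|t\xebxt[after]"

# Decoded with the locale encoding: the text the test expects.
as_latin1 = raw.decode("iso-8859-1")

# Decoded as UTF-8: 0xEF and 0xEB are invalid here, so surrogateescape
# maps each of them to a lone surrogate (U+DCEF, U+DCEB).
as_utf8 = raw.decode("utf-8", errors="surrogateescape")

print(ascii(as_latin1))
print(ascii(as_utf8))

# The escaped form still round-trips back to the original bytes.
print(as_utf8.encode("utf-8", errors="surrogateescape") == raw)
```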

@vstinner
Member

test_cmd_line fails on Windows. You can try @methane's suggestion.

FAIL: test_python_legacy_windows_stdio (test.test_cmd_line.CmdLineTest.test_python_legacy_windows_stdio)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_cmd_line.py", line 991, in test_python_legacy_windows_stdio
    self.assertEqual('_io._WindowsConsoleIO', out)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: '_io._WindowsConsoleIO' != '_io.FileIO'
- _io._WindowsConsoleIO
+ _io.FileIO

@AA-Turner
Member Author

I've paused work on this PR, as Serhiy asked to wait until all issues with running tests in a non-ASCII/UTF-8 locale have been fixed.

I'd like to try and find a solution to properly testing legacy windows stdio whilst on CI, but I agree that Inada-san's suggestion will work otherwise.

@methane
Member

methane commented May 16, 2025

I have created a PR to fix test_python_legacy_windows_stdio:
#134080

@methane
Member

methane commented May 16, 2025

I can reproduce the test_readline failure without UTF-8 mode on macOS with the main branch.

$ ./python.exe -c 'import sys; print(sys.flags.utf8_mode)'
0

$ export LANG=en_US.ISO8859-1
$ locale
LANG="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_CTYPE="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_ALL=

$ ./python.exe Lib/test/test_readline.py
readline version: 0x402
readline runtime version: 0x402
readline library version: 'EditLine wrapper'
use libedit emulation? True
s..s.....s.F..
======================================================================
FAIL: test_nonascii (__main__.TestReadline.test_nonascii)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/inada-n/work/python/cpython/Lib/test/test_readline.py", line 298, in test_nonascii
    self.assertIn(b"text 't\\xeb'\r\n", output)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\xefnserted]|t\xeb[after]\x08\x08\x08\x08\x08\x08\x08\x1b[1@x[\x08\r\nresult \'[\\xefnserted]|t\\xebx[after]\'\r\nhistory \'[\\xefnserted]|t\\xebx[after]\'\r\n")

----------------------------------------------------------------------
Ran 14 tests in 0.235s

FAILED (failures=1, skipped=3)
