Unicode characters ≥ 0x10000 cannot be input or behave unusually at the REPL terminal. #136595

@haydenwong7bm


Bug report

Bug description:

With the machine locale set to UTF-8, inputting a Unicode character ≥ 0x10000 at the REPL gives the following results.
In CPython 3.13.5:
https://github.com/user-attachments/assets/7777b063-76fe-4929-b854-cae7d61807d2
In CPython 3.14.0b4:

>>> Traceback (most recent call last):
  File "*\Python\Python314\Lib\_pyrepl\readline.py", line 394, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 748, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 731, in handle1
    self.do_cmd(cmd)
    ~~~~~~~~~~~^^^^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 661, in do_cmd
    self.refresh()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 638, in refresh
    self.screen = self.calc_screen()
                  ~~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\completing_reader.py", line 261, in calc_screen
    screen = super().calc_screen()
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 315, in calc_screen
    colors = list(gen_colors(self.get_unicode()))
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 108, in gen_colors
    for color in gen_colors_from_token_stream(gen, line_lengths):
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 168, in gen_colors_from_token_stream
    for prev_token, token, next_token in token_window:
                                         ^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 363, in prev_next_window
    window = deque((None, next(iterator)), maxlen=3)
                          ~~~~^^^^^^^^^^
  File "*\Python\Python314\Lib\tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
>>> Traceback (most recent call last):
  File "*\Python\Python314\Lib\_pyrepl\readline.py", line 394, in multiline_input
    return reader.readline()
           ~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 748, in readline
    self.handle1()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 731, in handle1
    self.do_cmd(cmd)
    ~~~~~~~~~~~^^^^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 661, in do_cmd
    self.refresh()
    ~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 638, in refresh
    self.screen = self.calc_screen()
                  ~~~~~~~~~~~~~~~~^^
  File "*\Python\Python314\Lib\_pyrepl\completing_reader.py", line 261, in calc_screen
    screen = super().calc_screen()
  File "*\Python\Python314\Lib\_pyrepl\reader.py", line 315, in calc_screen
    colors = list(gen_colors(self.get_unicode()))
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 108, in gen_colors
    for color in gen_colors_from_token_stream(gen, line_lengths):
                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 168, in gen_colors_from_token_stream
    for prev_token, token, next_token in token_window:
                                         ^^^^^^^^^^^^
  File "*\Python\Python314\Lib\_pyrepl\utils.py", line 363, in prev_next_window
    window = deque((None, next(iterator)), maxlen=3)
                          ~~~~^^^^^^^^^^
  File "*\Python\Python314\Lib\tokenize.py", line 582, in _generate_tokens_from_c_tokenizer
    for info in it:
                ^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed

Two surrogates were "inputted", hence the two UnicodeEncodeErrors.

CPython versions tested on:

3.13, 3.14

Operating systems tested on:

Windows

Labels

OS-windows, stdlib, topic-repl, topic-unicode, type-bug