Closed as not planned
Bug report
Bug description:
The tokenize module creates TokenInfo objects with a .line attribute. In Python 3.11, each token on a line used the same string object for .line. In 3.12, each token has a new copy of the same string.
This is part of a memory issue reported against coverage.py: nedbat/coveragepy#1791
# tok.py
import io
import sys
import tokenize
print(f"{sys.version = }")
text = "lorem ipsum quia dolor sit amet consectetur adipisci velit"
readline = io.StringIO(text).readline
toks = list(tokenize.generate_tokens(readline))
print(f"{toks[0].line = }")
print(f"{(toks[0].line == toks[1].line) = }")
print(f"{(toks[0].line is toks[1].line) = }")
3.11 re-uses string objects:
% python3.11 /tmp/tok.py
sys.version = '3.11.9 (main, Apr 8 2024, 14:01:56) [Clang 15.0.0 (clang-1500.3.9.4)]'
toks[0].line = 'lorem ipsum quia dolor sit amet consectetur adipisci velit'
(toks[0].line == toks[1].line) = True
(toks[0].line is toks[1].line) = True
3.12 (and above) makes new string objects:
% python3.12 /tmp/tok.py
sys.version = '3.12.3 (main, Apr 9 2024, 15:45:14) [Clang 15.0.0 (clang-1500.3.9.4)]'
toks[0].line = 'lorem ipsum quia dolor sit amet consectetur adipisci velit'
(toks[0].line == toks[1].line) = True
(toks[0].line is toks[1].line) = False
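For code that keeps many tokens alive at once, one possible mitigation is to re-share the line string on the consumer side. The sketch below is only an illustration: `tokens_with_shared_lines` is a hypothetical helper, not part of tokenize and not necessarily what coverage.py does. It relies on TokenInfo being a named tuple, so `_replace` is available.

```python
import io
import tokenize


def tokens_with_shared_lines(readline):
    """Yield tokens, reusing one str object for consecutive equal .line values.

    Hypothetical helper for illustration; not part of the tokenize module.
    """
    last_line = None
    for tok in tokenize.generate_tokens(readline):
        if tok.line == last_line:
            # Same text as the previous token's line: point at that object
            # so the fresh copy made by the tokenizer can be freed.
            tok = tok._replace(line=last_line)
        else:
            last_line = tok.line
        yield tok


text = "lorem ipsum quia dolor sit amet consectetur adipisci velit"
toks = list(tokens_with_shared_lines(io.StringIO(text).readline))
print(f"{(toks[0].line is toks[1].line) = }")
```

The tokenizer still allocates a copy per token, but each copy becomes garbage as soon as the token is rebuilt, so a stored list of tokens holds only one string per source line.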
CPython versions tested on:
3.11, 3.12, 3.13, CPython main branch
Operating systems tested on:
macOS