Incorrect optimization in itertools.tee()

Bug description:

To save a memory allocation, the code path for a tee-in-a-tee reuses the outer tee object as the first tee object in the result tuple. This is incorrect. All tee objects in the result tuple should behave the same way: they are supposed to be "n independent iterators". However, the first one is the input object itself, so advancing either one advances the other, and it behaves differently from the rest. This is an unfortunate side effect of an early, incorrect optimization. I've now seen this affect real code. It is surprising, unhelpful, undocumented, and hard to debug.

Demonstration:

from itertools import tee

def demo(i):
    it = iter('abcdefghi')
    [outer_tee] = tee(it, 1)
    inner_tee = tee(outer_tee, 10)[i]
    return next(inner_tee), next(outer_tee)

print('These should all give the same result:')
for i in range(10):
    print(i, demo(i))

This outputs:

These should all give the same result:
0 ('a', 'b')
1 ('a', 'a')
2 ('a', 'a')
3 ('a', 'a')
4 ('a', 'a')
5 ('a', 'a')
6 ('a', 'a')
7 ('a', 'a')
8 ('a', 'a')
9 ('a', 'a')
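
For code that has to run on an interpreter with this behavior, one possible workaround is to request one extra iterator and discard the first element of the result. This is only a sketch: `independent_tee` is a hypothetical helper name, not part of itertools. It relies on the fact that only the first element of the tuple can be the input object, so every remaining element is a fresh tee object.

```python
from itertools import tee

def independent_tee(iterable, n=2):
    # Hypothetical helper (not in itertools): ask for n + 1 iterators and
    # drop the first one, which may be the input tee object itself.
    return tee(iterable, n + 1)[1:]

def demo(i):
    it = iter('abcdefghi')
    [outer_tee] = tee(it, 1)
    inner_tee = independent_tee(outer_tee, 10)[i]
    return next(inner_tee), next(outer_tee)

results = [demo(i) for i in range(10)]
for i, result in enumerate(results):
    print(i, result)
```

With this helper every row, including row 0, should read ('a', 'a'), whether or not the interpreter reuses the input tee object.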

There is a test for the optimization, so it wasn't an accident. However, the optimization itself contradicts both the published specification in the docs and general expectations. The existing test:

        a, b = tee('abc')
        c, d = tee(a)
        self.assertTrue(a is c)
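
The identity check from that test can also be used to tell whether a given interpreter has the optimization, and to connect it to the demonstration above. The following is a minimal sketch; the function names `tee_reuses_input` and `first_inner_result` are made up for illustration.

```python
from itertools import tee

def tee_reuses_input():
    # Same check as the existing test: is the first result the input itself?
    a, b = tee('abc')
    c, d = tee(a)
    return a is c

def first_inner_result():
    # The i == 0 case of the demonstration.
    it = iter('abcdefghi')
    [outer_tee] = tee(it, 1)
    inner_tee = tee(outer_tee, 10)[0]
    return next(inner_tee), next(outer_tee)

reused = tee_reuses_input()
print('tee() reuses its tee input:', reused)
print('demo(0) gives:', first_inner_result())
```

When the input is reused, the two next() calls advance the same object and the pair is ('a', 'b'). When it is not, both iterators start at the same position and the pair is ('a', 'a').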


Incorrect optimization in itertools.tee() #123884
