Bug description:
Several methods in the C implementation of the itertools module are not yet safe to use under the free-threading build. In this issue we list several problems to be addressed. They are discussed for `itertools.product`, but similar issues apply to the other classes.
- When iterating over `product`, the result tuple is re-used when its reference count is 1. We can use the new `_PyObject_IsUniquelyReferenced` function to check whether the tuple can be re-used. (This issue was also reported in "enum_next and pairwise_next can result in tuple elements with zero reference count in free-threading build" #121464.)
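To illustrate the uniqueness check, here is a toy Python sketch (the class name and the use of `sys.getrefcount` are my own for demonstration; this is an analogue, not CPython's C code, and `sys.getrefcount` is not a substitute for `_PyObject_IsUniquelyReferenced` on the free-threaded build):

```python
import sys

class ReusingIter:
    """Toy analogue of the result-buffer reuse in product_next: re-use the
    previous result object only when this iterator holds the sole reference."""

    def __init__(self, n):
        self.n = n
        self.i = 0
        self.result = None

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        # sys.getrefcount counts its own argument, so a result referenced
        # only by this iterator shows a count of 2.
        if self.result is not None and sys.getrefcount(self.result) == 2:
            self.result[0] = self.i      # sole owner: mutate in place
        else:
            self.result = [self.i]       # caller kept the old result
        self.i += 1
        return self.result
```

When the caller keeps a reference to the previous result, a fresh object is allocated; when the previous result was discarded, the buffer is mutated in place. The free-threading problem is exactly that a plain reference-count check like this is racy without `_PyObject_IsUniquelyReferenced`.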
- On the first invocation of `product.__next__` a new result tuple is constructed (cpython/Modules/itertoolsmodule.c, lines 2038 to 2044 in 58ce131). This is not thread-safe, as multiple threads could see `result == NULL` evaluate to true. We could move the construction of `productobject.result` into the constructor of `product`. This does mean that `product` will use more memory before the first invocation of `next`, which seems acceptable, as constructing a `product` without iterating over it appears rare in practice. The tuple also needs to be filled with data. For `product` it seems safe to do this in the constructor, as the data comes from `productobject->pools`, which is a tuple of tuples. But for `pairwise` the data comes from an iterable (cpython/Modules/itertoolsmodule.c, lines 337 to 343 in 58ce131), which could be a generator. Reading data from the iterator before the first invocation of `pairwise_next` seems like a behavior change we do not want to make. An alternative is some kind of locking inside `product_next`, but the locking should not add any overhead on the common path, otherwise single-threaded performance will suffer.
- When iterables are exhausted, some cleaning up is done, for example in `pairwise_next` (cpython/Modules/itertoolsmodule.c, lines 352 to 356 in 58ce131). This cleanup is not safe under concurrent iteration. Instead we can defer it until the object itself is deallocated (this approach was used for `reversed`, see https://github.com/python/cpython/pull/120971/files#r1653313765).
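A minimal Python sketch of the deferred-cleanup idea (a hypothetical class of my own, not CPython's implementation): on exhaustion we only set a flag, and keep the source iterator alive until the wrapper itself is deallocated, so a concurrent `next()` never sees the iterator disappear from under it.

```python
class DeferredCleanup:
    """Wrap an iterator; defer dropping the source until deallocation."""

    def __init__(self, iterable):
        self.it = iter(iterable)
        self.exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.exhausted:
            raise StopIteration
        try:
            return next(self.it)
        except StopIteration:
            self.exhausted = True   # flag only: do NOT clear self.it here
            raise
```

The cost is that the source iterator (and anything it references) lives as long as the wrapper, which is the trade-off accepted for `reversed` in the linked review comment.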
- Actually constructing the new result requires some care as well. Even if we are fine with having odd results under concurrent iteration (see the discussion in "Sequence iterator thread-safety" #120496), concurrent iteration should not corrupt the interpreter. For example, this code is not safe (cpython/Modules/itertoolsmodule.c, lines 2077 to 2088 in 58ce131): if two threads both increment `indices[i]`, the check on line 2078 is never true and we end up indexing `pool` with `PyTuple_GET_ITEM` out of bounds on line 2088. Here we could change the check to `indices[i] >= PyTuple_GET_SIZE(pool)`. That is equivalent in the single-threaded case, but does not lead to out-of-bounds indexing in the multi-threaded case (although it can still produce odd results!).
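A Python sketch of the odometer logic with the `>=` check (the function name and structure are illustrative, not the actual C code): even if a racing increment pushes an index past the end, the index is always brought back into range, so at worst the produced tuple is wrong, never out of bounds.

```python
def advance(indices, sizes):
    """Odometer-style increment as in product_next (Python sketch).
    Returns False when the odometer wraps around completely."""
    for i in reversed(range(len(indices))):
        indices[i] += 1
        if indices[i] >= sizes[i]:   # '==' would let an over-shot index escape
            indices[i] = 0           # carry into the next position
        else:
            return True
    return False
```

In the single-threaded case this enumerates exactly the same index tuples as the `==` version; the only difference is that an index corrupted by a lost update gets wrapped back into `[0, size)` instead of escaping the check.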
@rhettinger @colesbury Any input on the points above would be welcome.
CPython versions tested on:
CPython main branch
Operating systems tested on:
No response
Linked PRs
- gh-123471: Make concurrent iteration over itertools.pairwise safe under free-threading #123848
- Draft: gh-123471: Make concurrent iteration over itertools.pairwise safe under free-threading #125417
- gh-123471: make itertools.batched thread-safe #129416
- gh-123471: Make concurrent iteration over itertools.cycle safe under free-threading #131212
- gh-123471: Make concurrent iteration over itertools.repeat safe under free-threading #131247
- gh-123471: Make itertools.product and itertools.combinations thread-safe #132814