Make concurrent iteration over pairwise, combinations, permutations, cwr, product, etc. from itertools safe under free-threading #123471

Open
@eendebakpt


Bug report

Bug description:

Several methods from the C implementation of the itertools module are not yet safe to use under the free-threading build. In this issue we list several issues to be addressed. The issues below are discussed for itertools.product, but the issues are similar for the other classes.

  • The `result` tuple of `product` is created lazily on the first call to `product_next`:

```c
if (result == NULL) {
    /* On the first pass, return an initial tuple filled with the
       first element from each pool. */
    result = PyTuple_New(npools);
    if (result == NULL)
        goto empty;
    lz->result = result;
```

This is not thread-safe: multiple threads can observe `result == NULL` as true and each allocate a tuple. We could move the construction of `productobject.result` into the constructor of `product`. This does mean `product` will use more memory before the first invocation of `next`, which seems acceptable, as constructing a `product` without iterating over it is rare in practice.
The tuple also needs to be filled with data. For `product` it seems safe to do this in the constructor, since the data comes from `productobject->pools`, which is a tuple of tuples. But for `pairwise` the data comes from an iterable

```c
if (old == NULL) {
    old = (*Py_TYPE(it)->tp_iternext)(it);
    Py_XSETREF(po->old, old);
    if (old == NULL) {
        Py_CLEAR(po->it);
        return NULL;
    }
```

which could be a generator. Reading data from the iterator before the first invocation of `pairwise_next` is a behavior change we do not want to make.

An alternative is to use some kind of locking inside `product_next`, but the locking must not add overhead on the common path, otherwise single-threaded performance will suffer.
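A rough Python-level model of that alternative (the `LockedIterator` wrapper below is hypothetical, not CPython code): serialize `next()` calls with a mutex so concurrent consumers never race, at the cost of taking the lock on every step.

```python
import itertools
import threading

class LockedIterator:
    """Hypothetical wrapper: serialize next() calls with a mutex."""
    def __init__(self, it):
        self._it = iter(it)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:          # the cost: a lock acquire on every step
            return next(self._it)

prod = LockedIterator(itertools.product(range(10), range(10)))
seen = []
seen_lock = threading.Lock()

def consume():
    for item in prod:
        with seen_lock:
            seen.append(item)

threads = [threading.Thread(target=consume) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every one of the 100 products was handed out exactly once.
assert len(seen) == 100 and len(set(seen)) == 100
```

In C the equivalent would be guarding the body of `product_next` with a critical section, which is exactly the overhead concern for the single-threaded common path.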

  • When an iterable is exhausted, some cleanup is done. For example, in `pairwise_next`:

```c
if (new == NULL) {
    Py_CLEAR(po->it);
    Py_CLEAR(po->old);
    Py_DECREF(old);
    return NULL;
```

This cleanup is not safe under concurrent iteration. Instead we can defer the cleanup until the object itself is deallocated (this approach was used for `reversed`, see https://github.com/python/cpython/pull/120971/files#r1653313765)

  • Actually constructing the new result requires some care as well. Even if we accept funny results under concurrent iteration (see the discussion Sequence iterator thread-safety #120496), concurrent iteration must not corrupt the interpreter. For example, this code is not safe:

```c
indices[i]++;
if (indices[i] == PyTuple_GET_SIZE(pool)) {
    /* Roll-over and advance to next pool */
    indices[i] = 0;
    elem = PyTuple_GET_ITEM(pool, 0);
    Py_INCREF(elem);
    oldelem = PyTuple_GET_ITEM(result, i);
    PyTuple_SET_ITEM(result, i, elem);
    Py_DECREF(oldelem);
} else {
    /* No rollover. Just increment and stop here. */
    elem = PyTuple_GET_ITEM(pool, indices[i]);
```

If two threads both increment `indices[i]`, the check on line 2078 is never true and we end up indexing `pool` out of bounds with `PyTuple_GET_ITEM` on line 2088. Here we could change the check to `indices[i] >= PyTuple_GET_SIZE(pool)`. That is equivalent in the single-threaded case, but does not lead to out-of-bounds indexing in the multi-threaded case (although it does still lead to funny results!)
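A pure-Python model of that odometer step (the `step` helper below is an illustration, not the C code) shows how a lost-update race defeats the `==` check while `>=` still rolls over:

```python
def step(indices, pool_size, use_ge):
    # One increment of the left-most index, then the bounds check from
    # product_next, modeled in Python (an illustration, not the C code).
    indices[0] += 1
    at_end = indices[0] >= pool_size if use_ge else indices[0] == pool_size
    if at_end:
        indices[0] = 0       # roll over; the real code then advances pool i+1
    return indices[0]        # the index later fed to PyTuple_GET_ITEM

# Simulate the lost-update race: another thread's increment lands before
# this thread runs its own increment-and-check.
indices = [2]                # pool_size == 3, already at the last element
indices[0] += 1              # the racing thread's increment
assert step(indices, 3, use_ge=False) == 4   # '==' is skipped: index 4 is out of bounds
indices = [2]
indices[0] += 1
assert step(indices, 3, use_ge=True) == 0    # '>=' still rolls over safely
```

The `>=` variant can still produce a "funny" result tuple, but the index handed to `PyTuple_GET_ITEM` stays in bounds, which is the interpreter-safety property the issue asks for.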

@rhettinger @colesbury Any input on the points above would be welcome.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response
