Add 'with_python' (i.e. with_gil) to prange and parallel #6562

da-woods · Dec 13, 2024

This is mostly targeted at freethreading builds, but should run on normal builds (with a lot of deadlock avoidance).

It's designed to be a more efficient alternative to

for i in prange(...):
  with gil:
    ...

da-woods · Dec 13, 2024

Cython/Compiler/Nodes.py

+            warning(
+                self.pos,
+                "'with_gil' is experimental. Cython does almost nothing to ensure "
+                "thread safe use of Python variables (including reference counting) "
+                "and so it is completely your responsibility. Use with caution!", 999)


I don't intend to keep this warning forever. But I think we need more control of PyObject variables in parallel blocks before we can really start recommending it.

lysnikolaou

Left a couple of questions. Maybe the name of the parameter should also be changed from with_gil to something that hints more at the free-threaded build?

lysnikolaou · Jan 7, 2025

Cython/Compiler/Nodes.py

+            # Any firstprivate creates a barrier at least on GCC (and thus
+            # a deadlock if we're using the GIL). And there's also an implicit


Have you examined whether this would also be relevant if someone's using critical sections inside an OpenMP loop? If so, should we document that?

da-woods · Jan 7, 2025

I'm assuming you mean with cython.critical_section? (I ask because OpenMP has a "critical_section" feature that we use internally to guard certain small blocks... I think we use that safely though).

I don't think cython.critical_section should be an issue here. The barrier just waits for all threads to be ready before the very first iteration, so the code is only trying not to hold the GIL at that point. There shouldn't be an issue with anyone using critical sections or other kinds of locks within the loop itself.
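
The barrier problem described in this reply can be illustrated with a plain-Python analogy. This is only a sketch under stated assumptions: `threading.Lock` stands in for the GIL and `threading.Barrier` for the implicit OpenMP barrier, the name `run_workers` is hypothetical, and none of this is the code Cython generates. A timeout is used so that the "deadlock" case ends instead of hanging.

```python
import threading


def run_workers(release_before_barrier: bool, n_threads: int = 4,
                timeout: float = 1.0) -> bool:
    """Return True if every worker got past the barrier, False otherwise."""
    big_lock = threading.Lock()             # stand-in for the GIL (analogy only)
    barrier = threading.Barrier(n_threads)  # stand-in for the start-up barrier
    passed = []

    def worker() -> None:
        big_lock.acquire()
        try:
            if release_before_barrier:
                # Let go of the lock while waiting, then take it back.
                big_lock.release()
                try:
                    barrier.wait(timeout)
                finally:
                    big_lock.acquire()
            else:
                # Wait while still holding the lock: the other workers are
                # stuck in acquire() and can never reach the barrier.
                barrier.wait(timeout)
            passed.append(True)  # the lock is held here in both branches
        except threading.BrokenBarrierError:
            pass  # the barrier timed out or was already broken
        finally:
            big_lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(passed) == n_threads
```

Calling `run_workers(True)` lets every worker through, while `run_workers(False)` gives up after the timeout because only the lock holder ever reaches the barrier. Locks taken and released inside the loop body, after the barrier, do not have this problem.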

lysnikolaou · Jan 7, 2025

docs/src/userguide/parallelism.rst

+However, there is now some experimental support for running with the GIL in
+freethreading builds. You must either use these functions in a no-gil block,


I guess that this wording will be confusing for most people that are not intimately familiar with how the free-threaded build works internally. Maybe we could reword it to something else like "with the thread state saved"? Although that seems suboptimal as well.

Ditto for other places.

da-woods · Jan 7, 2025

Yes, good point.

I think we're stuck with the naming for the existing with gil:/with nogil features. But I can add an explanation in the free-threading docs about what that means in the context of a freethreading build.

But since this is a new feature I can probably pick a different name for the parameter. And I'll try to rephrase the docs to be a bit clearer.

da-woods · Jan 8, 2025

I've renamed with_gil in this context to with_python and tried to clarify the docs here.

lysnikolaou · Jan 8, 2025

> I've renamed with_gil in this context to with_python and tried to clarify the docs here.

That sounds better. What do you think about something like with_thread_state_saved?

da-woods · Jan 8, 2025

> > I've renamed with_gil in this context to with_python and tried to clarify the docs here.
>
> That sounds better. What do you think about something like with_thread_state_saved?

Probably too technical. I assume most users would rather not know about thread state at all. (Although I know the existing with gil/with nogil are similarly technical).

I prefer with_python because it really just describes what you can do.

scoder · May 31, 2025

So, I looked over this and read through the documentation snippets, but I still feel unsure about the goal and effects of this change. I gathered that the new with_python=True option assures a valid thread state for the OpenMP threads but does not itself control the GIL state, is that correct? Why can't all OpenMP threads have Python thread states right away, without a special option?

Could you maybe extend the PR description with some kind of problem description and motivation?

da-woods · May 31, 2025

It's pretty close to writing

for i in prange(N, nogil=True):
    with gil:
        ... # code

It's slightly more efficient (just by not using the PyGILState API as much) but not as much as I hoped when I first started looking at it - the big limitation is that Cython does have to release and re-acquire the GIL each time you go round the loop to avoid deadlock (or somehow eliminate lastprivate from the loop, which is what requires the barrier).

> Why can't all OpenMP threads have Python thread states right away, without a special option?

Mainly because it's a bit of a pitfall with the non-freethreading interpreter. It means that people who accidentally omit nogil=True will not get any parallelism even if they aren't using the GIL. So at least for me it seems better to make it explicit.
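
As a rough illustration of the cost of releasing and re-acquiring the GIL on every trip round the loop, the sketch below counts lock acquisitions in plain Python. It is an analogy, not Cython output: `CountingLock` and `fill` are hypothetical names, and a `threading.Lock` stands in for the GIL.

```python
import threading


class CountingLock:
    """A lock that counts how many times it has been acquired."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self) -> "CountingLock":
        self._lock.acquire()
        self.acquisitions += 1  # only touched while the lock is held
        return self

    def __exit__(self, *exc_info) -> bool:
        self._lock.release()
        return False


def fill(n_items: int, n_threads: int, per_iteration: bool):
    """Fill a list from several threads; return (acquisition count, list)."""
    lock = CountingLock()  # stand-in for the GIL (analogy only)
    out = [None] * n_items
    chunk = (n_items + n_threads - 1) // n_threads

    def worker(start: int, stop: int) -> None:
        if per_iteration:
            for i in range(start, stop):
                with lock:      # like "with gil:" inside the loop body
                    out[i] = i
        else:
            with lock:          # take the lock once for the whole chunk
                for i in range(start, stop):
                    out[i] = i

    threads = [
        threading.Thread(target=worker,
                         args=(t * chunk, min((t + 1) * chunk, n_items)))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock.acquisitions, out
```

With 1000 items on 4 threads, the per-iteration variant acquires the lock once per item and the other variant once per thread. The per-iteration count is what makes contention expensive when a real GIL is involved.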

scoder · May 31, 2025

> It's pretty close to writing
>
> for i in prange(N, nogil=True):
>     with gil:
>         ... # code
>
> It's slightly more efficient

So … can't we make the idiom above more efficient then? Is it only for in prange() directly followed by with gil or also a with gil appearing somewhere later in the (otherwise nogil) loop? The first should be easy to detect.

> people that accidentally omit nogil=True will not get any parallelism even if they aren't using the GIL

But isn't that expected? I think the documentation states pretty clearly that parallelism requires releasing the GIL. Wasn't it always the case that you get sequential "threads" if you run a prange loop without releasing the GIL?

da-woods · May 31, 2025

I'm possibly not explaining very well

for i in prange(N, with_python=True):
    ... # code

is transformed to something pretty close to

for i in prange(N, nogil=True):
    with gil:
        ... # code

(but a little faster)

> > people that accidentally omit nogil=True will not get any parallelism even if they aren't using the GIL
>
> But isn't that expected? I think the documentation states pretty clearly that parallelism requires releasing the GIL. Wasn't it always the case that you get sequential "threads" if you run a prange loop without releasing the GIL?

Right now you get a compile-time error if you try to write a prange loop without releasing the GIL.

But yes - in principle we could just try to speed up prange(N, nogil=True): with gil: instead and just document that as the idiomatic approach. Given that it doesn't seem possible to safely go round the loop without releasing the GIL, the benefit of this PR is fairly minor anyway.

scoder · May 31, 2025

> Right now you get a compile-time error if you try to write a prange loop without releasing the GIL.

Ah, so, we basically push people into releasing the GIL even if they directly need it afterwards, is that it? Ok, that use case doesn't seem hugely important to me – if you want parallelism, you probably want it for parallel code, not sequential code. But I can see that it can be useful to do some kind of Python level initialisation inside of the thread first. We also have parallel sections for stuff like that, but well, it's still a use case. And I think the obvious idiom for it is for in prange: with gil:. If there's a way to optimise such code, why not. I don't see why we should bother users with yet another option.

Do you have an example of real world code where this is used?

scoder · May 31, 2025

BTW, we don't have any benchmarks for our parallel code, neither micro nor macro. Since you're working on the freethreading and parallel code, maybe you could come up with at least some microbenchmarks for this kind of mixture between Python and nogil code? I'm not sure if they'll run well in CI (could be that we have a single CPU core there), but we'll see. At least sequential locking and critical sections might still be candidates for measuring their overhead.

da-woods · May 31, 2025

> Ah, so, we basically push people into releasing the GIL even if they directly need it afterwards, is that it? Ok, that use case doesn't seem hugely important to me – if you want parallelism, you probably want it for parallel code, not sequential code

The use case is that in freethreaded Python this code is parallel.

This is mainly targeted at people using freethreaded Python who want to use prange loops without manually messing around with the GIL. (On non-freethreaded Python the intention is that it works, but clearly the GIL means it won't work well).

The main limitation is that it isn't currently much better than writing for i in prange(N, nogil=True): with gil: ..., so it's mainly a better expression of intent than better C code.

I subsequently had a thought about how to avoid needing to release the GIL on each loop (at least in some cases) so maybe I should look at that first - it's easier to justify something new if it genuinely does lead to better performance

da-woods · May 31, 2025

So I think I've been able to drop the "release the GIL on every go round", which should be an improvement.

A quick timing test:

# cython: freethreading_compatible=True

# distutils: extra_compile_args=-fopenmp
# distutils: extra_link_args=-fopenmp

from cython.parallel import prange

def f(i):
    return i

def test_prange_with_python(int N):
    cdef int i = 0

    out = [None]*N

    for i in prange(N, with_python=True):
        out[i] = f(i)
    
def test_old_way(int N):
    cdef int i = 0

    out = [None]*N

    for i in prange(N, nogil=True):
        with gil:
            out[i] = f(i)

def do_timing(N):
    # Wall-clock time for a single run of each variant (no warm-up, no repeats).
    from datetime import datetime
    t0 = datetime.now()
    test_old_way(N)
    t1 = datetime.now()
    test_prange_with_python(N)
    t2 = datetime.now()

    print("old way:", t1-t0)
    print("new way:", t2-t1)

On freethreading Python 3.13 with N=1000000 the with_python version is consistently about 10% faster (i.e. noticeable but not huge):

old way: 0:00:00.249138
new way: 0:00:00.221372

On GIL Python 3.13 with N=1000000 the with_python version is dramatically faster:

old way: 0:00:28.616594
new way: 0:00:00.073102

(i.e. when the GIL is contended then constantly swapping it is absolutely awful).

So the upshot is, it's a mild improvement for the freethreading builds (for which it's intended) and a dramatic improvement for the GIL build (where it isn't intended)
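
For anyone wanting to repeat this kind of measurement, a slightly more robust harness might look like the sketch below. It is an assumption-laden illustration rather than part of the PR: `best_time` and `speedup` are hypothetical helpers, `time.perf_counter` replaces `datetime.now`, and each function is run several times with the fastest run kept to reduce noise.

```python
import time
from typing import Callable


def best_time(fn: Callable[[int], object], n: int, repeats: int = 5) -> float:
    """Return the fastest of `repeats` wall-clock timings of fn(n), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(n)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


# Ratios computed from the single-run timings quoted in the comment above.
freethreading_ratio = speedup(0.249138, 0.221372)  # a little over 1.1x
gil_build_ratio = speedup(28.616594, 0.073102)     # several hundred times
```

In use, `best_time(test_old_way, 1000000)` and `best_time(test_prange_with_python, 1000000)` would replace the two `datetime` differences, assuming the compiled module from the comment is importable.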

scoder · Jun 1, 2025

> freethreading_compatible=True

Isn't this a good indicator for whether we should warn about the GIL being released or not when we enter a prange loop? Even with your speed improvement, it still matters in non-FT builds, right?

da-woods · Jun 1, 2025

> Isn't this a good indicator for whether we should warn about the GIL being released or not when we enter a prange loop?

I think what you're suggesting is:

- Remove the explicit with_gil directive.
- If the user omits nogil=True then don't error - let them use the GIL.
- Warn if they don't release the GIL and don't mark it as freethreading_compatible.

?

If so, that seems reasonable to me.

> Even with your speed improvement, it still matters in non-FT builds, right?

Yes - on non-FT builds it's running sequentially, just not fighting over the GIL quite so often.

scoder · Jun 2, 2025

> - Remove the explicit with_gil directive.
> - If the user omits nogil=True then don't error - let them use the GIL.
> - Warn if they don't release the GIL and don't mark it as freethreading_compatible.

Yes, that's what I was thinking.
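
The behaviour agreed on above can be written down as a small decision function. This is a sketch of the proposal as discussed, not Cython's implementation: `PrangeDecision` and `decide` are hypothetical names, and the real compiler check may differ.

```python
from typing import NamedTuple


class PrangeDecision(NamedTuple):
    holds_python: bool  # the loop body runs with Python access (the GIL on GIL builds)
    warn: bool          # the compiler would emit a warning


def decide(nogil: bool, freethreading_compatible: bool) -> PrangeDecision:
    """Model the proposed rule for a prange loop (hypothetical helper)."""
    if nogil:
        # Existing behaviour: the loop releases the GIL, nothing to warn about.
        return PrangeDecision(holds_python=False, warn=False)
    # nogil omitted: no longer an error. Warn unless the module declares
    # freethreading_compatible, since a GIL build would run it sequentially.
    return PrangeDecision(holds_python=True, warn=not freethreading_compatible)
```

Under this rule only one of the four combinations warns: a loop that keeps the GIL in a module not marked freethreading_compatible.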

Add 'with_gil' to prange and parallel

5cf29ce

da-woods added the freethreading CPython label Dec 13, 2024

Merge branch 'master' into withgil_prange

eb05605

da-woods commented Dec 13, 2024

da-woods added 2 commits December 13, 2024 21:51

Spelling

cb6cd44

Code style

769fd7a

lysnikolaou reviewed Jan 7, 2025

da-woods added 2 commits January 8, 2025 20:39

'with_gil' -> 'with_python'

47b70fb

Improve documentation

3814607

da-woods changed the title ~~Add 'with_gil' to prange and parallel~~ Add 'with_python' (i.e. with_gil) to prange and parallel Jan 8, 2025

da-woods added this to the 3.2 milestone May 3, 2025

da-woods added 2 commits May 3, 2025 11:57

Merge remote-tracking branch 'real_origin/master' into withgil_prange

d572d85

Strip out "IsTrueFreethreading"

30309bc

It opens the possibility of a deadlock, seems like the wrong thing to do, and I don't think releasing the GIL is too expensive.

da-woods commented May 18, 2025

da-woods added 2 commits May 29, 2025 19:11

Merge remote-tracking branch 'real_origin/master' into withgil_prange

09374d5

Wrap loops in no-gil; more tests

dd1bd6e

It seems more reliable than a complicated juggling of the GIL with a barrier

da-woods force-pushed the withgil_prange branch from a3584ba to dd1bd6e Compare May 29, 2025 20:39

Avoid releasing the loop on every go round

457d517

Make the loop "nowait" and force synchronization after it

Trailing whitespace

8c473aa
