Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH: Improve performance for np.result_type #28710

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 23, 2025
Merged

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Apr 15, 2025

Improve performance of np.result_type

  • Use NPY_ALLOC_WORKSPACE in array_result_type
  • Avoid increfs/decrefs in PyArray_ResultType
  • Fast path for array method lookup on dtypes
  • Fast path for identical arguments to np.result_type

Benchmark:

result_type(x): Mean +- std dev: [main] 156 ns +- 8 ns -> [pr] 135 ns +- 2 ns: 1.16x faster
result_type(x, x): Mean +- std dev: [main] 195 ns +- 5 ns -> [pr] 150 ns +- 4 ns: 1.30x faster
result_type(x, x32): Mean +- std dev: [main] 207 ns +- 5 ns -> [pr] 174 ns +- 5 ns: 1.18x faster
result_type(x, x32, x): Mean +- std dev: [main] 224 ns +- 11 ns -> [pr] 191 ns +- 5 ns: 1.17x faster
result_type(d, d): Mean +- std dev: [main] 269 ns +- 12 ns -> [pr] 177 ns +- 7 ns: 1.52x faster
result_type(d, d32): Mean +- std dev: [main] 276 ns +- 10 ns -> [pr] 214 ns +- 6 ns: 1.29x faster
result_type(d, d32, x): Mean +- std dev: [main] 303 ns +- 11 ns -> [pr] 245 ns +- 9 ns: 1.23x faster

Geometric mean: 1.26x faster

@eendebakpt
Copy link
Contributor Author

There are 2 failing tests due to a warning: ‘arr’ may be used uninitialized [-Wmaybe-uninitialized]. I think this is a false positive: the buffer buf is not initialized, but we are not using the buffer contents in the comparison (we are comparing the buffer address, which is initialized).

@eendebakpt eendebakpt changed the title Draft: ENH: Improve performance for np.result_type ENH: Improve performance for np.result_type Apr 15, 2025
@mattip
Copy link
Member

mattip commented Apr 15, 2025

It seems the NPY_ALLOC_WORKSPACE macro could use a tweak to avoid that warning, could you try that?

@eendebakpt
Copy link
Contributor Author

It was not a false positive after all, the arr could be uninitialized for the zero argument case.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the change to LookupSpecial, this looks good. If that one was important for the speed-up, I would suggest to put the test where the call is made rather than change it for everything.

@@ -59,6 +60,11 @@ PyArray_LookupSpecial(
return 0;
}

/* We do not need to check for special attributes on array descriptor types */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea - PyArray_LookupSpecial is used also to look for __array_ufunc__ and __array_function__ and I don't think those paths should be made slower for result_type.

p.s. Separately, is this actually used with result_type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this one, the speed gain is not major. This path is being used if result_type as called with dtype arguments (not with exact ndarray arguments), I think because the np._result_type is wrapped by array_function_from_c_func_and_dispatcher.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be OK to exclude dtypes in the dispatcher, but probably better in a separate PR.

PyArray_Descr **all_descriptors = (PyArray_Descr **)(&all_DTypes[narrs+ndtypes]);

/* Copy all dtypes into a single array defining non-value-based behaviour */
for (npy_intp i=0; i < ndtypes; i++) {
all_DTypes[i] = NPY_DTYPE(descrs[i]);
Py_INCREF(all_DTypes[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems safe, since we are holding on to descrs, but best to check with @seberg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Borrowed references can be unsafe on the free-threaded build. That said, these are references to immortal (?) DType objects. The question mark there is that in principle someone could delete a DType in one thread and then another thread could hold the last reference to it here.

I have no idea if, in practice, we actually allow deleting DTypes or if that's a true goal for us to support.

Another option is to only drop the reference counting on the GIL-enabled build, since borrowed references should always be safe on the GIL-enabled build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning: here descrs is passed as an argument to PyArray_ResultType which means the elements of descrs cannot (or should not!?) be deleted by another thread. Since descrs objects are not deleted, the type of the descr should be safe to use.

numpy/_core/src/multiarray/multiarraymodule.c Outdated Show resolved Hide resolved
numpy/_core/src/multiarray/multiarraymodule.c Show resolved Hide resolved
PyArray_Descr **all_descriptors = (PyArray_Descr **)(&all_DTypes[narrs+ndtypes]);

/* Copy all dtypes into a single array defining non-value-based behaviour */
for (npy_intp i=0; i < ndtypes; i++) {
all_DTypes[i] = NPY_DTYPE(descrs[i]);
Py_INCREF(all_DTypes[i]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning: here descrs is passed as an argument to PyArray_ResultType which means the elements of descrs cannot (or should not!?) be deleted by another thread. Since descrs objects are not deleted, the type of the descr should be safe to use.

@@ -1628,14 +1627,10 @@ PyArray_ResultType(
all_descriptors[i_all] = PyArray_DTYPE(arrs[i]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is not touched by the PR, but is this safe in the free-threading build? The arrs is passed so is safe, but PyArray_DTYPE gets a borrowed reference and another thread can replace the array dtype.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a reproducer:

import unittest
from threading import Thread, Barrier
from test.support import threading_helper
import numpy as np

threading_helper.requires_working_threading(module=True)

class ItertoolsThreading(unittest.TestCase):

    
    @threading_helper.reap_threads
    def test_result_type(self):
        number_of_threads = 6
        number_of_iterations = 25
        number_of_cycles = 800

        barrier = Barrier(number_of_threads)
        
        x = np.array([1., 2., 3., 4.], np.dtype([('f1', [('f1', '<i2')])]))
        y = np.array([1, 2,], np.dtype([('f1', [('f1', '<i2')])]))
        
        def work(it):
            barrier.wait()
            for _ in range(number_of_cycles):
                #print(f'{it=} {_}')
                x.dtype = np.dtype([('f1', [('f1', '<i2')])]) # single reference
                y.dtype = np.dtype([('f1', [('f1', '<i2')])]) # single reference
                np.result_type(x, y)
               
               

        for it in range(number_of_iterations):
            print(f'iteration {it}')
            worker_threads = []
            for ii in range(number_of_threads):
                worker_threads.append(
                    Thread(target=work, args=[ii]))

            with threading_helper.start_threads(worker_threads):
                pass

            barrier.reset()

        print('done')

if __name__ == "__main__":
    unittest.main()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you file an issue for this? This is kind of a weird thing to do so I’m not very worried but maybe there’s a simple workaround.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and previous reviewers seemed happy, so merging, thanks.

(IIRC one of the worst performance parts result_type was that it is often called on objects where we have to do a __array_function__ lookup that fails. Hopefully, future Python versions will allow us to speed this up.)

@seberg seberg merged commit 9ead596 into numpy:main Apr 23, 2025
71 of 72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.