ENH: allow start-stop array for indices in reduceat #25476

mhvk · Dec 22, 2023

EDIT (2024-11-23): added tests and documentation, and move out of draft. I stuck to having start and stop be separate rows, so that one can easily pass (start, stop) (as in the examples). I feel that is most logical given the present structure, where one can see a single array as a start array that implies a default stop array. This can of course still be changed.

Rationale

ufuncs have a .reduceat method that allows having piecewise reductions, but using an array of indices that is rather convoluted. From
https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html, the indices are interpreted as follows:

For i in range(len(indices)), reduceat computes
ufunc.reduce(array[indices[i]:indices[i+1]]), which becomes the i-th
generalized "row" parallel to `axis` in the final result (i.e., <snip>).
There are three exceptions to this:

* when i = len(indices) - 1 (so for the last index), indices[i+1] = array.shape[axis].
* if indices[i] >= indices[i + 1], the i-th generalized “row” is simply array[indices[i]].
* if indices[i] >= len(array) or indices[i] < 0, an error is raised.

The exceptions are the main issue I have with the current definition: really, the current setup is only natural for contiguous pieces; for anything else, it requires contortion. For instance, the documentation describes how to get a
running sum as follows:

np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])[::2]

Note the slice at the end to remove the unwanted elements! And note that this omits the last set of 4 elements -- to get this, one has to add a solitary index 4 at the end - one cannot get slices that include the last element except as the last one.

The PR arose from this unnatural way to describe slices: Why can one not just pass in the start and stop values directly? With no exceptions, but just interpreted as slices should be. I.e., get a running sum as,

np.add.reduceat(np.arange(8), ((start := np.arange(0, 8//2+1)), start+8//2))

Currently, the updated docstring explains the new mode as follows:

    There are two modes for how `indices` is interpreted. If it is a tuple of
    2 arrays (or an array with two rows), then these are interpreted as start
    and stop values of slices over which to compute reductions, i.e., for each
    row i, ``ufunc.reduce(array[indices[0, i]:indices[1, i]])`` is computed,
    which becomes the i-th element along `axis` in the final result (e.g., in
    a 2-D array, if ``axis=0``, it becomes the i-th row, but if ``axis=1``,
    it becomes the i-th column). Like for slices, negative indices are allowed
    for both start and stop, and the values are clipped to be between 0 and
    the shape of the array along `axis`.

The PR also adds a new initial keyword argument. The reason for this is that with the new layout I did not want to have the exception currently present, where if stop < start, one gets the value at start. I felt it was more logical to treat this case as an empty reduction, but then it becomes necessary to able to pass in an initial value for reductions that do not have an identity, like np.minimum (which of course just helps make reduceat more similar to reduce).

Note that I considered requiring slice(start, stop), which might be clearer. I only did not do that since implementation-wise just having a tuple or an array with 2 columns was super easy. I also liked that with this implementation the old way could at least in principle be described in terms of the new one, as having a default stop that just takes the next element of start (with the same exceptions as above). I ended not describing it as such in the docstring, though.

Anyway, if in principle it is thought a good idea to make reduceat more flexible, the API is up for discussion. It could require indices=slice(start, stop) (possibly step too), or one could allow not passing in indices if start and stop are present, or just add a stop keyword argument, whose defaults are interpreted as before.

Links

Github issue: reduceat cornercase (Trac #236) #834
2011 thread: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/DX5KVE5O36MQHIEBOFK6YRH2JPRMFPVB/#I5MKK4ZPX3FA6K6H5457F4WOHYSO67NN
2016 thread 1: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/RI7MYBUB6S7PGXQ27ZCNLOWEPNSFMDHI/#PBIB7BEA35NE2WMI25SJJEWN7YW6W72V
2016 thread 2: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/RZZ3TJVB5M4UFTZKD4XDDLPT2AW3ANR6/#RZZ3TJVB5M4UFTZKD4XDDLPT2AW3ANR6
2017 thread: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/YKLT53KE54MKKO2FDWOYQQZGSN4EGSGU/#YKLT53KE54MKKO2FDWOYQQZGSN4EGSGU
(where I suggested adding a slice argument to reduce instead; also an option...)
2023 thread (this one): https://mail.python.org/archives/list/numpy-discussion@python.org/thread/VWDXYJW362WPNE5JLZCQNPEAQD6EIKSI/#MB4LXPBEJ5OFMPXG24PCAFTXYLQIXZSG

Old text

Triggered by #834 seeing some comments again, a draft just to see how it would look to allow reduceat to take a set of start, stop indices (treated as slices), to make the interface a bit more easily comprehensible without making a truly new method. It also allows passing in an initial to deal with empty slices.

Fixes #834

Mostly to discuss whether we want this at all, and, if so, what the API should be. So probably best not to worry too much about implementation (the duplication of code, both in reduceat itself and with reduce is large).

Sample use:

a = np.arange(12)
np.add.reduceat(a, ([1, 3, 5], [2, -1, 0]))
# array([ 1, 52,  0])
np.minimum.reduceat(a, ([1, 3, 5], [2, -1, 0]), initial=10)
# array([ 1,  3, 10])
np.minimum.reduceat(a, ([1, 3, 5], [2, -1, 0]))
# ValueError: empty slice encountered with reduceat operation for 'minimum', which does not have an identity. Specify 'initial'.

Writing it out like this, I think a different order may be useful, i.e., np.add(a, [(1, 2), (3, -1), (5, 0)]). The reason I picked the other one was that I liked the idea of triggering it by using slice(start, stop), with both start and stop possibly arrays and a tuple of two lists was closer to that (although internally it just turns it into an array). The list of tuples suggests more a structured array with start and stop (and step?) entries.

p.s. Fairly trivially extensible to start, stop, step.

Also introduces an initial argument, though it can only be used with the start-stop format.

mhvk added 01 - Enhancement component: numpy.ufunc labels Dec 22, 2023

mhvk added this to the 2.0.0 release milestone Dec 22, 2023

mhvk mentioned this pull request Dec 22, 2023

reduceat cornercase (Trac #236) #834

Open

seberg modified the milestones: 2.0.0 release, 2.1.0 release Jan 31, 2024

mhvk removed this from the 2.1.0 release milestone Jan 31, 2024

mhvk force-pushed the reduceat-start-stop branch from 191d140 to 2a6548b Compare November 22, 2024 23:04

ENH: allow start-stop array for indices in reduceat

393ac30

Also introduces an initial argument, though it can only be used with the start-stop format.

mhvk force-pushed the reduceat-start-stop branch from 2a6548b to 393ac30 Compare November 23, 2024 14:11

mhvk marked this pull request as ready for review November 24, 2024 01:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: allow start-stop array for indices in reduceat #25476

ENH: allow start-stop array for indices in reduceat #25476

mhvk commented Dec 22, 2023 •

edited

Loading

Uh oh!

Uh oh!

Search code, repositories, users, issues, pull requests...

Uh oh!

ENH: allow start-stop array for indices in reduceat #25476

Are you sure you want to change the base?

ENH: allow start-stop array for indices in reduceat #25476

Conversation

mhvk commented Dec 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rationale

Links

Old text

Uh oh!

Uh oh!

mhvk commented Dec 22, 2023 •

edited

Loading