Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH: allow start-stop array for indices in reduceat #25476

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
Loading
from

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Dec 22, 2023

EDIT (2024-11-23): added tests and documentation, and move out of draft. I stuck to having start and stop be separate rows, so that one can easily pass (start, stop) (as in the examples). I feel that is most logical given the present structure, where one can see a single array as a start array that implies a default stop array. This can of course still be changed.

Rationale

ufuncs have a .reduceat method that allows having piecewise reductions, but using an array of indices that is rather convoluted. From
https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html, the indices are interpreted as follows:

For i in range(len(indices)), reduceat computes
ufunc.reduce(array[indices[i]:indices[i+1]]), which becomes the i-th
generalized "row" parallel to `axis` in the final result (i.e., <snip>).
There are three exceptions to this:

* when i = len(indices) - 1 (so for the last index), indices[i+1] = array.shape[axis].
* if indices[i] >= indices[i + 1], the i-th generalized “row” is simply array[indices[i]].
* if indices[i] >= len(array) or indices[i] < 0, an error is raised.

The exceptions are the main issue I have with the current definition: really, the current setup is only natural for contiguous pieces; for anything else, it requires contortion. For instance, the documentation describes how to get a
running sum as follows:

np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])[::2]

Note the slice at the end to remove the unwanted elements! And note that this omits the last set of 4 elements -- to get this, one has to add a solitary index 4 at the end - one cannot get slices that include the last element except as the last one.

The PR arose from this unnatural way to describe slices: Why can one not just pass in the start and stop values directly? With no exceptions, but just interpreted as slices should be. I.e., get a running sum as,

np.add.reduceat(np.arange(8), ((start := np.arange(0, 8//2+1)), start+8//2))

Currently, the updated docstring explains the new mode as follows:

    There are two modes for how `indices` is interpreted. If it is a tuple of
    2 arrays (or an array with two rows), then these are interpreted as start
    and stop values of slices over which to compute reductions, i.e., for each
    row i, ``ufunc.reduce(array[indices[0, i]:indices[1, i]])`` is computed,
    which becomes the i-th element along `axis` in the final result (e.g., in
    a 2-D array, if ``axis=0``, it becomes the i-th row, but if ``axis=1``,
    it becomes the i-th column). Like for slices, negative indices are allowed
    for both start and stop, and the values are clipped to be between 0 and
    the shape of the array along `axis`.

The PR also adds a new initial keyword argument. The reason for this is that with the new layout I did not want to have the exception currently present, where if stop < start, one gets the value at start. I felt it was more logical to treat this case as an empty reduction, but then it becomes necessary to able to pass in an initial value for reductions that do not have an identity, like np.minimum (which of course just helps make reduceat more similar to reduce).

Note that I considered requiring slice(start, stop), which might be clearer. I only did not do that since implementation-wise just having a tuple or an array with 2 columns was super easy. I also liked that with this implementation the old way could at least in principle be described in terms of the new one, as having a default stop that just takes the next element of start (with the same exceptions as above). I ended not describing it as such in the docstring, though.

Anyway, if in principle it is thought a good idea to make reduceat more flexible, the API is up for discussion. It could require indices=slice(start, stop) (possibly step too), or one could allow not passing in indices if start and stop are present, or just add a stop keyword argument, whose defaults are interpreted as before.

Links

Old text

Triggered by #834 seeing some comments again, a draft just to see how it would look to allow reduceat to take a set of start, stop indices (treated as slices), to make the interface a bit more easily comprehensible without making a truly new method. It also allows passing in an initial to deal with empty slices.

Fixes #834

Mostly to discuss whether we want this at all, and, if so, what the API should be. So probably best not to worry too much about implementation (the duplication of code, both in reduceat itself and with reduce is large).

Sample use:

a = np.arange(12)
np.add.reduceat(a, ([1, 3, 5], [2, -1, 0]))
# array([ 1, 52,  0])
np.minimum.reduceat(a, ([1, 3, 5], [2, -1, 0]), initial=10)
# array([ 1,  3, 10])
np.minimum.reduceat(a, ([1, 3, 5], [2, -1, 0]))
# ValueError: empty slice encountered with reduceat operation for 'minimum', which does not have an identity. Specify 'initial'.

Writing it out like this, I think a different order may be useful, i.e., np.add(a, [(1, 2), (3, -1), (5, 0)]). The reason I picked the other one was that I liked the idea of triggering it by using slice(start, stop), with both start and stop possibly arrays and a tuple of two lists was closer to that (although internally it just turns it into an array). The list of tuples suggests more a structured array with start and stop (and step?) entries.

p.s. Fairly trivially extensible to start, stop, step.

Also introduces an initial argument, though it can only be used
with the start-stop format.
@mhvk mhvk force-pushed the reduceat-start-stop branch from 2a6548b to 393ac30 Compare November 23, 2024 14:11
@mhvk mhvk marked this pull request as ready for review November 24, 2024 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reduceat cornercase (Trac #236)
2 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.