ENH: allow start-stop array for indices in reduceat #25476
EDIT (2024-11-23): added tests and documentation, and moved out of draft. I stuck to having start and stop be separate rows, so that one can easily pass `(start, stop)` (as in the examples). I feel that is most logical given the present structure, where a single array can be seen as a start array that implies a default stop array. This can of course still be changed.

Rationale
ufuncs have a `.reduceat` method that allows piecewise reductions, but it uses an array of `indices` in a rather convoluted way. From https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html, the `indices` are interpreted as follows: for each `i`, the reduction is over `array[indices[i]:indices[i+1]]`, except that the last index runs to the end of the array, that a pair with `indices[i] >= indices[i+1]` simply yields `array[indices[i]]`, and that out-of-range indices raise an error.

The exceptions are the main issue I have with the current definition: the current setup is only natural for contiguous pieces; for anything else, it requires contortion. For instance, the documentation describes how to get a running sum as follows:
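A sketch of that recipe with the existing `np.add.reduceat`, assuming `a = np.arange(8)` as in the documentation example. The last call appends the solitary index 4 mentioned below, which is the only way to also get the window `4:8`.

```python
import numpy as np

a = np.arange(8)

# Current reduceat: entry i reduces a[indices[i]:indices[i+1]], except that a
# non-increasing pair (indices[i] >= indices[i+1]) yields just a[indices[i]]
# and the last index reduces up to the end of the array.
raw = np.add.reduceat(a, [0, 4, 1, 5, 2, 6, 3, 7])
print(raw)      # [ 6  4 10  5 14  6 18  7]

# Only every other entry is a sum of four successive values; the rest are
# by-products of the "going back" index pairs and must be sliced away.
running = raw[::2]
print(running)  # [ 6 10 14 18]

# The window 4:8 can only be produced as the final reduction, so a solitary
# index 4 has to be appended to the index list.
full = np.add.reduceat(a, [0, 4, 1, 5, 2, 6, 3, 7, 4])[::2]
print(full)     # [ 6 10 14 18 22]
```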
Note the slice at the end to remove the unwanted elements! Note also that this omits the last set of 4 elements: to get it, one has to add a solitary index 4 at the end, since one cannot get a slice that includes the last element except as the final one.
The PR arose from this unnatural way to describe slices: why can one not just pass in the start and stop values directly, with no exceptions, and have them interpreted just as slices should be? I.e., get a running sum as:
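A rough pure-NumPy emulation of the intended semantics, built on the existing `ufunc.reduce`. The helper `reduceat_slices` is hypothetical: it is neither a NumPy function nor this PR's implementation. It treats each `(start, stop)` pair as an ordinary slice, so `stop <= start` is an empty reduction, which for `np.minimum` (no identity) only works when an `initial` value is supplied.

```python
import numpy as np

def reduceat_slices(ufunc, a, start, stop, initial=None):
    """Hypothetical emulation (not a NumPy API) of start/stop reduceat.

    Reduces a[s:e] along axis 0 for each (s, e) pair with plain slice
    semantics: no special cases, so stop <= start is an empty reduction.
    `initial` is only forwarded when given, so ufuncs with an identity
    (like np.add) keep their default behaviour.
    """
    a = np.asarray(a)
    kwargs = {} if initial is None else {"initial": initial}
    s, e = (x.ravel().tolist() for x in np.broadcast_arrays(start, stop))
    return np.array([ufunc.reduce(a[i:j], axis=0, **kwargs) for i, j in zip(s, e)])

a = np.arange(8)

# Running sum of four successive values: windows 0:4, 1:5, ..., 4:8.
starts = np.arange(5)
running = reduceat_slices(np.add, a, starts, starts + 4)
print(running)  # [ 6 10 14 18 22]

# The slice 5:3 is empty; np.minimum has no identity, so this fails...
af = a.astype(float)
try:
    reduceat_slices(np.minimum, af, [0, 5], [4, 3])
except ValueError as exc:
    print("without initial:", exc)

# ...unless an initial value is given: mins[0] is min(af[0:4]) == 0.0 and
# mins[1] is inf for the empty slice.
mins = reduceat_slices(np.minimum, af, [0, 5], [4, 3], initial=np.inf)
```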
Currently, the updated docstring explains the new mode as follows:
The PR also adds a new `initial` keyword argument. The reason for this is that with the new layout I did not want to keep the current exception where, if `stop <= start`, one gets the value at `start`. I felt it was more logical to treat this case as an empty reduction, but then it becomes necessary to be able to pass in an initial value for reductions that do not have an identity, like `np.minimum` (which of course just helps make `reduceat` more similar to `reduce`).

Note that I considered requiring `slice(start, stop)`, which might be clearer. I only did not do that since, implementation-wise, just having a tuple or an array with 2 columns was super easy. I also liked that with this implementation the old way could at least in principle be described in terms of the new one, as having a default `stop` that just takes the next element of `start` (with the same exceptions as above). In the end, I did not describe it as such in the docstring, though.

Anyway, if in principle it is thought a good idea to make `reduceat` more flexible, the API is up for discussion. It could require `indices=slice(start, stop)` (possibly with a step too), or one could allow not passing in `indices` if `start` and `stop` are present, or just add a `stop` keyword argument whose default reproduces the current interpretation.

Links
(where I suggested adding a `slice` argument to `reduce` instead; also an option...)

Old text
Triggered by seeing some comments on #834 again: a draft just to see how it would look to allow `reduceat` to take a set of start, stop indices (treated as slices), to make the interface a bit more easily comprehensible without making a truly new method. It also allows passing in an `initial` to deal with empty slices.

Fixes #834
Mostly to discuss whether we want this at all and, if so, what the API should be. So it is probably best not to worry too much about the implementation (the duplication of code, both within `reduceat` itself and with `reduce`, is large).

Sample use:
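A sketch of what such calls compute, using a hypothetical pure-NumPy helper `reduceat_slices` (not a NumPy function and not the PR's code) that applies `ufunc.reduce` to each `a[start:stop]` with ordinary slice semantics. The pairs `(1, 2)`, `(3, -1)` and `(5, 0)` are those discussed below, passed as separate start and stop rows. The second part checks the remark that the current `indices` behaviour corresponds to a default `stop` taken from the next start, with the existing exceptions.

```python
import numpy as np

def reduceat_slices(ufunc, a, start, stop):
    # Hypothetical helper (not NumPy API): reduce a[s:e] for each (s, e) pair,
    # with plain slice semantics (negative stops and empty slices allowed).
    a = np.asarray(a)
    s, e = (x.ravel().tolist() for x in np.broadcast_arrays(start, stop))
    return np.array([ufunc.reduce(a[i:j], axis=0) for i, j in zip(s, e)])

a = np.arange(8)

# (start, stop) as two rows: slices 1:2, 3:-1 and 5:0 (the last one is empty).
sample = reduceat_slices(np.add, a, [1, 3, 5], [2, -1, 0])
print(sample)  # [ 1 18  0]

# Current-style indices rewritten as (start, stop): stop is the next start,
# or len(a) for the last one; a non-increasing pair falls back to the single
# element at start, mirroring the existing exception.
indices = np.array([0, 4, 1, 5, 2, 6, 3, 7])
start = indices
stop = np.append(indices[1:], len(a))
stop = np.where(start >= stop, start + 1, stop)

old = np.add.reduceat(a, indices)
new = reduceat_slices(np.add, a, start, stop)
print(np.array_equal(old, new))  # True
```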
Writing it out like this, I think a different order may be useful, i.e., `np.add.reduceat(a, [(1, 2), (3, -1), (5, 0)])`. The reason I picked the other one was that I liked the idea of triggering it by using `slice(start, stop)`, with both `start` and `stop` possibly arrays, and a tuple of two lists was closer to that (although internally it just turns it into an array). A list of tuples suggests more a structured array with start and stop (and step?) entries.

p.s. Fairly trivially extensible to `start, stop, step`.
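Extending the same kind of emulation with a step is indeed a small change: each triple is simply used as `a[start:stop:step]`. This is only a sketch of the idea under that assumption; `reduceat_slices` and its `step` parameter are illustrative names, not NumPy API and not the PR's code.

```python
import numpy as np

def reduceat_slices(ufunc, a, start, stop, step=1):
    # Hypothetical helper (not NumPy API): reduce a[s:e:k] for each
    # (start, stop, step) triple; scalars broadcast against array arguments.
    a = np.asarray(a)
    s, e, k = (x.ravel().tolist() for x in np.broadcast_arrays(start, stop, step))
    return np.array([ufunc.reduce(a[i:j:m], axis=0) for i, j, m in zip(s, e, k)])

a = np.arange(8)

# Sums of the even- and odd-indexed elements: a[0:8:2] and a[1:8:2].
evens_odds = reduceat_slices(np.add, a, [0, 1], 8, 2)
print(evens_odds)  # [12 16]

# A negative step walks backwards: a[7:3:-1] is [7, 6, 5, 4].
backwards = reduceat_slices(np.add, a, [7], [3], -1)
print(backwards)   # [22]
```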