ENH/DEP: Use a ufunc under the hood for ndarray.clip #12519

eric-wieser · Dec 9, 2018

This includes:

* The addition of a 3-input PyObject inner loop
* The removal of ->f->fastclip for builtin types, which now use ufuncs instead
* A deprecation in PyArray_Clip for third-party types that still use ->f->fastclip
* A deprecation of the unusual casting behavior of clip
* A deprecation of the broken nan-behavior of clip, which was previously dependent on dimensionality and byte-order
* New: a behavior change for max < min which brings object arrays in line with the other types

Surprisingly, this actually seems to have better performance (N=10000):

In [1]: a = np.arange(10000, dtype=float)
In [2]: a2 = np.empty(a.shape, dtype=[('a', a.dtype), ('b', np.bool)])['a']
In [3]: a2[...] = a

In [4]: %timeit np.clip(a, 0, 1)
20.3 µs ± 206 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit np.core.umath.clip(a, 0, 1)
16.9 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [6]: %timeit np.clip(a2, 0, 1)
65.7 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit np.core.umath.clip(a2, 0, 1)
62.6 µs ± 970 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Edit: those performance numbers are probably not representative of the wrapped call - there is a big cost to pay for the deprecations.
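For readers who want to repeat a comparison along these lines, here is a minimal timing sketch. It is not the benchmark that was added to the PR: the helper `best_time_us` is a hypothetical name, `np.bool_` stands in for `np.bool` so the snippet runs on more NumPy versions, and the `minimum(maximum(...))` composition is used as the comparison point because the internal `np.core.umath.clip` path may not be available everywhere. Absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

# Contiguous input, as in the session above.
a = np.arange(10000, dtype=float)

# Non-contiguous input: a float field pulled out of a structured array,
# so consecutive elements are separated by the extra bool field.
a2 = np.empty(a.shape, dtype=[('a', a.dtype), ('b', np.bool_)])['a']
a2[...] = a

def best_time_us(func, number=200, repeat=3):
    """Best observed per-call time in microseconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e6

for label, arr in [("contiguous", a), ("strided", a2)]:
    t_clip = best_time_us(lambda: np.clip(arr, 0, 1))
    t_minmax = best_time_us(lambda: np.minimum(np.maximum(arr, 0), 1))
    print(f"{label:>10}: np.clip {t_clip:.1f} us, minimum(maximum) {t_minmax:.1f} us")
```

Taking the minimum over repeats reduces noise from other processes; it does not account for the warning-handling overhead mentioned in the edit above.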

numpy/core/_methods.py

numpy/core/src/umath/loops.c.src

mattip · Dec 9, 2018

If you have a ready benchmark, perhaps you could add it?

numpy/core/src/umath/loops.c.src

eric-wieser · Dec 9, 2018

numpy/core/src/umath/loops.c.src

+        npy_intp n = dimensions[0];
+        npy_intp i;
+
+        /* contiguous, branch to let the compiler optimize */


Just enough optimization here to make um.clip faster than np.clip in the cases that are common.

numpy/core/src/umath/loops.c.src

eric-wieser · Dec 9, 2018

@mattip: Benchmark added. I don't know how much energy I have to put into this, but this looks like the lowest-hanging fruit of #12514. Feel free to take over this PR as your own, if you want to.

eric-wieser · Dec 9, 2018

Outdated but non-resolved comments above still apply

numpy/core/code_generators/generate_umath.py

mhvk · Dec 19, 2018

numpy/core/_methods.py

+    if min is None and max is None:
+        raise ValueError("One of max or min must be given")
+    elif min is None:
+        return um.clip_above(a, max, out=out, casting=casting)


Can we not use um.minimum here? (and um.maximum instead of clip_below)

Nope - clip needs to preserve NaNs in, and only in, the first argument. maximum always propagates NaNs, and fmax always ignores them. So we need a third maximum-like function.
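The difference between the two existing ufuncs, and the third behavior described in this reply, can be illustrated with a short sketch. `clip_below_sketch` is a hypothetical pure-Python stand-in for the proposed inner loop, not the code in this PR; it only handles float inputs.

```python
import numpy as np

x = np.array([np.nan, 5.0, -5.0])

# np.maximum propagates NaN from either argument.
print(np.maximum(x, 0.0))       # values: nan, 5.0, 0.0
print(np.maximum(0.0, np.nan))  # nan

# np.fmax ignores NaN whenever the other argument is a number.
print(np.fmax(x, 0.0))          # values: 0.0, 5.0, 0.0
print(np.fmax(0.0, np.nan))     # 0.0

def clip_below_sketch(a, lo):
    """Hypothetical third variant: NaN in `a` is kept, NaN in `lo` is ignored."""
    a = np.asarray(a, dtype=float)
    lo = np.asarray(lo, dtype=float)
    # fmax drops NaN from both sides, so restore the NaNs that came from `a`.
    return np.where(np.isnan(a), a, np.fmax(a, lo))

print(clip_below_sketch(x, [0.0, np.nan, np.nan]))  # values: nan, 5.0, -5.0
```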

mhvk · Dec 19, 2018

numpy/core/code_generators/ufunc_docstrings.py

+        'clip',
+    )
+
+    if name[0] != '_' and name not in skip:
        # matmul is special, it does not use the OUT_SCALAR replacement strings


Need to adjust the comment here on why clip is excluded; though perhaps better to put it behind each item in skip.

Can you still move this comment up to the definition of skip?

mhvk · Dec 19, 2018

Taking the conversation of using minimum and maximum for clip outside of the in-line comments (since I missed the first one...): is it obvious what the behaviour should be? I think that if min or max is NaN, it is not strange to just pass that on. Oddly, that is partially what the current clip does:

a = np.array([10., np.nan, np.nan, -10.])
a.clip(np.array([np.nan, np.nan, 1., 1.]), np.array([2., 2., np.nan, np.nan]))
# RuntimeWarning: invalid value encountered in minimum
# RuntimeWarning: invalid value encountered in maximum
# array([nan, nan, nan, nan])

Looking at the code, I see this is because for array-like min and max, the _slow_array_clip path is chosen, which in fact uses the minimum and maximum ufuncs!

My sense would be to keep this behaviour, following the maxim that in case of doubt one should refuse the temptation to guess: the user should use ±np.inf if a particular element should not have a minimum or maximum.

Note that if we felt it was needed, we could keep this completely backwards compatible if, in the wrapping code, we checked for cases of scalar min/max=np.nan, treated those as equivalent to None, and also raised a DeprecationWarning.
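A rough sketch of what such wrapping code might look like, under stated assumptions: `clip_nan_as_none` and `_scalar_nan_to_none` are hypothetical names, the clipping itself is expressed with `np.minimum` and `np.maximum` rather than the ufunc from this PR, and the warning text is made up for illustration.

```python
import warnings
import numpy as np

def _scalar_nan_to_none(bound, name):
    """Map a scalar NaN bound to None, warning that this is deprecated."""
    if bound is not None and np.ndim(bound) == 0 and np.isnan(bound):
        warnings.warn(
            f"Passing nan as scalar `{name}` is deprecated; use None "
            "(or an infinity) to leave that side unbounded.",
            DeprecationWarning, stacklevel=3)
        return None
    return bound

def clip_nan_as_none(a, a_min, a_max):
    """Hypothetical backwards-compatible wrapper (illustration only)."""
    a_min = _scalar_nan_to_none(a_min, "a_min")
    a_max = _scalar_nan_to_none(a_max, "a_max")
    if a_min is None and a_max is None:
        return np.array(a, copy=True)
    if a_min is None:
        return np.minimum(a, a_max)
    if a_max is None:
        return np.maximum(a, a_min)
    # Array bounds containing NaN are left alone, so NaN propagates there.
    return np.minimum(np.maximum(a, a_min), a_max)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = clip_nan_as_none(np.array([0.0, 5.0]), np.nan, 2.0)

print(result)                      # values: 0.0, 2.0
print(caught[0].category.__name__)  # DeprecationWarning
```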

eric-wieser · Dec 21, 2018

is it obvious what the behaviour should be

Not really - I just noticed that in the past someone complained when the nan was propagated (#7601).

the user should use ±np.inf if a particular element should not have a minimum or maximum.

That's a fairly strong argument I hadn't thought of as to why the nan behavior is useless. Unfortunately the replacement is quite a bit more verbose.

np.clip(arr, np.where(np.isnan(a_min), -np.inf, a_min), np.where(np.isnan(a_max), np.inf, a_max))
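Spelled out as a runnable sketch, with made-up sample data (`arr`, `a_min`, and `a_max` below are illustrative values, not taken from the PR), the replacement turns every NaN bound into an infinity so that the corresponding element is simply left unbounded on that side:

```python
import numpy as np

arr = np.array([10.0, -10.0, 0.5])
a_min = np.array([np.nan, 1.0, 0.0])  # NaN here means "no lower bound"
a_max = np.array([2.0, np.nan, 1.0])  # NaN here means "no upper bound"

# Replace NaN bounds with -inf / +inf, which never clip anything.
lo = np.where(np.isnan(a_min), -np.inf, a_min)
hi = np.where(np.isnan(a_max), np.inf, a_max)

clipped = np.clip(arr, lo, hi)
print(clipped)  # values: 2.0, 1.0, 0.5
```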

mhvk · Dec 21, 2018

@eric-wieser - #7601 complained that nan in the array was not propagated on some windows virtual machine! I think we agree that no matter what, nan in the input array should be propagated (i.e., we will not use fmin, fmax behaviour). The question is what to do with nan in min/max.

Here, remember that at present NaNs do get propagated if min or max are arrays (since minimum and maximum are currently used for that case). So, the question is whether we want to extend the scalar behaviour of ignoring min,max=nan to arrays, or extend the array behaviour of propagating min,max=nan to scalars (possibly with deprecation).

My own sense is to always propagate nan for the reasons I gave earlier; it also may be the least likely to lead to regressions, since having a nan in an array of min or max seems much more likely than passing in a nan scalar for min or max. (But then, scalars are surely used much more often... Which perhaps argues for deprecation of the nan scalar case....)

eric-wieser · Feb 4, 2019

Given that the current nan behavior is inconsistent and depends on things like the byte order, I think we just drop this without a deprecation warning:

>>> np.array(1.0).clip(np.nan, np.nan)
1.0
>>> np.array(1.0).newbyteorder('S').clip(np.nan, np.nan)
RuntimeWarning: invalid value encountered in minimum
RuntimeWarning: invalid value encountered in maximum
nan

mhvk · Feb 4, 2019

OK, makes sense to just have it in the release notes. So, we change to always behave consistently with minimum(maximum(array, clip_low), clip_high)?
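As a reference for what that composition implies, here is a small sketch. `clip_reference` is a hypothetical name for the composition itself and makes no claim about how the ufunc is implemented; the first input reuses the arrays from the earlier example in this thread.

```python
import numpy as np

def clip_reference(a, lo, hi):
    """Reference semantics discussed above: minimum(maximum(a, lo), hi)."""
    return np.minimum(np.maximum(a, lo), hi)

# NaN in the data or in either bound propagates to the result.
a = np.array([10.0, np.nan, np.nan, -10.0])
lo = np.array([np.nan, np.nan, 1.0, 1.0])
hi = np.array([2.0, 2.0, np.nan, np.nan])
print(clip_reference(a, lo, hi))                  # all four values are nan

# Scalar NaN bounds propagate too, regardless of byte order or dimensionality.
print(clip_reference(np.array(1.0), np.nan, np.nan))  # nan

# When lo > hi, the outer minimum is applied last, so the upper bound wins.
print(clip_reference(np.ones(3), 1, 0))           # values: 0.0, 0.0, 0.0
```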

charris · May 12, 2019

Needs rebase. Test failure is fixed in master.

charris · May 12, 2019

Needs rebase. Test error is fixed in master.

This includes:

* The addition of 3-input PyObject inner loop
* The removal of `->f->fastclip` for builtin types, which now use ufuncs instead
* A deprecation in `PyArray_Clip` for third-party types that still use `->f->fastclip`
* A deprecation of the unusual casting behavior of `clip`
* A deprecation of the broken `nan`-behavior of `clip`, which was previously dependent on dimensionality and byte-order.

* Unit test for object clip issue
* parametrized test_simple_int32_inout() to additionally handle scenario where "unsafe" casting is explicitly passed through, as requested by reviewer
* add a requested comment related to possibility of future nan check optimization for `@name@_clip`
* add a unit test to cover the case where np.clip() is called with None for both max and min
* add a unit test for the case where out is None and casting is specified as an invalid value, to flush through code path in TestClip.fastclip
* add unit test + doc update for expected behavior when amin > amax
* add unit test + patch for bug in npy_ObjectClip; its operations and operands were wrong; the unit test case is based on a scenario probed by the hypothesis library
* add unit test for error in timedelta64 MAX function for clip handling
* add unit test for a pathological case where np.ones(10) is clipped with amin=1, amax=0

This changes the object array behavior to match the other behavior

tylerjereddy · May 14, 2019

rebased / force pushed -- if CI is all green I plan to merge as noted above

tylerjereddy · May 14, 2019

The CI failures are from things like numpy/core/src/umath/simd.inc.src:1407:1: warning: ‘AVX512F_log_FLOAT’ defined but not used [-Wunused-function].

This is presumably coming in from work by @r-devulap and @juliantaylor?

Hopefully no merge conflict in release notes arises while waiting for this to get resolved / rebased.

r-devulap · May 15, 2019

I think the reason you are seeing that error is that simd.inc is included in fast_loop_macros.h, which is in turn included in clip.c.src. simd.inc contains the code for static functions such as AVX2_log_FLOAT, which end up not being used anywhere in clip.c.src, hence the error. Inlining the ISA_exp/log_FLOAT functions with the NPY_INLINE keyword solves the build issue you are seeing.

r-devulap · May 15, 2019

Is there a reason why simd.inc is included in fast_loop_macros.h?

tylerjereddy · May 15, 2019

Perhaps we can make that refinement if the inclusion is not needed @eric-wieser ?

r-devulap · May 15, 2019

I was able to build without #include<simd.inc> in fast_loop_macros.h, so I am not sure if that include is needed.

tylerjereddy · May 15, 2019

Thanks, that's helpful feedback! Eric or Marten can probably confirm the safety of removal, but sounds straightforward.

eric-wieser · May 15, 2019

I was under the impression that some of the macros in that file expand to things from the simd header.

I think adding inline is the right way to go here.

* ISA_exp/log have been inlined to prevent build errors when included in PR 12519

tylerjereddy · May 15, 2019

Ok, I inlined the low-level exp / log stuff & local test suite results are unchanged while the build warnings-turned-errors disappear. Hopefully CI agrees.

tylerjereddy · May 15, 2019

Thanks Eric and all reviewers. Merging with fingers crossed based on various discussions above.

eric-wieser added component: numpy._core 03 - Maintenance 25 - WIP labels Dec 9, 2018

teoliphant mentioned this pull request Dec 9, 2018

ENH: Convert PyArray_Functions to generalized ufuncs #12514

Open

eric-wieser force-pushed the clip-ufunc branch from a00d605 to c74d00e Compare December 9, 2018 18:32

eric-wieser force-pushed the clip-ufunc branch from c74d00e to 348e3f6 Compare December 9, 2018 22:58

eric-wieser force-pushed the clip-ufunc branch from 348e3f6 to 4989ee1 Compare December 19, 2018 09:42

mhvk reviewed Dec 19, 2018

View reviewed changes

eric-wieser mentioned this pull request Dec 19, 2018

WIP: MAINT: Made clip into an ufunc #7876

Closed

This was referenced Feb 25, 2019

MAINT: Replace if statement with a dictionary lookup for ease of extensibility in ufunc generator #13031

Merged

MAINT: Extract the loop macros into their own header #13032

Merged

eric-wieser force-pushed the clip-ufunc branch from 4989ee1 to 375f443 Compare February 26, 2019 07:18

eric-wieser added 01 - Enhancement 07 - Deprecation and removed 25 - WIP labels Feb 26, 2019

eric-wieser force-pushed the clip-ufunc branch 2 times, most recently from 1a30542 to add2e0c Compare May 11, 2019 22:25

eric-wieser and others added 4 commits May 14, 2019 13:52

BUG: Restore the old non-object behavior for min > max

ed825fb

This changes the object array behavior to match the other behavior

DOC: Add release note

f76fa21

tylerjereddy force-pushed the clip-ufunc branch from add2e0c to f76fa21 Compare May 14, 2019 21:04

MAINT: reviewer adjustments re: AVX

31f0bb1

* ISA_exp/log have been inlined to prevent build errors when included in PR 12519

tylerjereddy merged commit 75ea05f into numpy:master May 15, 2019

tylerjereddy added this to the 1.17.0 release milestone May 15, 2019

mattip mentioned this pull request Jun 1, 2019

BUG: Add ValueError for read-only array in clip() function #12242 #13258

Closed

toslunar mentioned this pull request Jul 10, 2019

Support NumPy 1.17 chainer/chainer#7741

Closed

12 tasks

eric-wieser mentioned this pull request Jul 28, 2019

Inconsistency of np.clip with MaskedArray when using output argument #14140

Closed

charris mentioned this pull request Aug 15, 2019

np.clip slower in numpy 1.17 #14281

Closed

tylerjereddy mentioned this pull request Sep 11, 2019

TST: Add the first test using hypothesis #14440

Closed

eric-wieser mentioned this pull request Dec 6, 2019

np.clip: change in precedence between minimum and maximum when maximum < minimum #15061

Closed

Zac-HD mentioned this pull request Dec 27, 2019

ENH: Add property-based tests using Hypothesis #15189

Merged

5 tasks

WarrenWeckesser mentioned this pull request Dec 17, 2021

BUG: clip() does not respect the writeable flag #12242

Closed
