Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH/DEP: Use a ufunc under the hood for ndarray.clip #12519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 15, 2019

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Dec 9, 2018

This includes:

  • The addition of 3-input PyObject inner loop
  • The removal of ->f->fastclip for builtin types, which now use ufuncs instead
  • A deprecation in PyArray_Clip for third-party types that still use ->f->fastclip
  • A deprecation of the unusual casting behavior of clip
  • A deprecation of the broken nan-behavior of clip, which was previously dependent on dimensionality and byte-order.
  • new: A behavior change for max < min which brings object arrays in line with the other types

Surprisingly, this actually seems to have better performance (N=10000):

In [1]: a = np.arange(10000, dtype=float)
In [2]: a2 = np.empty(a.shape, dtype=[('a', a.dtype), ('b', np.bool)])['a']
In [3]: a2[...] = a

In [4]: %timeit np.clip(a, 0, 1)
20.3 µs ± 206 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: %timeit np.core.umath.clip(a, 0, 1)
16.9 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [6]: %timeit np.clip(a2, 0, 1)
65.7 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit np.core.umath.clip(a2, 0, 1)
62.6 µs ± 970 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Edit: those performance numbers are probably not representative of the wrapped call - there is a big cost to pay for the deprecations.

numpy/core/_methods.py Outdated Show resolved Hide resolved
numpy/core/src/umath/loops.c.src Outdated Show resolved Hide resolved
@mattip
Copy link
Member

mattip commented Dec 9, 2018

if you have a ready benchmark, perhaps you could add it?

numpy/core/src/umath/loops.c.src Outdated Show resolved Hide resolved
npy_intp n = dimensions[0];
npy_intp i;

/* contiguous, branch to let the compiler optimize */
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just enough optimization here to make um.clip faster than np.clip in the cases that are common.

numpy/core/src/umath/loops.c.src Outdated Show resolved Hide resolved
@eric-wieser
Copy link
Member Author

@mattip: Benchmark added. I don't know how much energy I have to put into this, but this looks like the lowest-hanging fruit of #12514. Feel free to take over this PR as your own, if you want to.

@eric-wieser
Copy link
Member Author

Outdated but non-resolved comments above still apply

numpy/core/code_generators/generate_umath.py Outdated Show resolved Hide resolved
if min is None and max is None:
raise ValueError("One of max or min must be given")
elif min is None:
return um.clip_above(a, max, out=out, casting=casting)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not use um.minimum here? (and um.maximum instead of clip_below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope - clip needs to preserve NaNs in and only in the first argument. maximum always propagates, and fmax always ignores. So we need a third maximum-like function

'clip',
)

if name[0] != '_' and name not in skip:
# matmul is special, it does not use the OUT_SCALAR replacement strings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to adjust the comment here on why clip is excluded; though perhaps better to put it behind each item in skip.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you still move this comment up to the definition of skip?

@mhvk
Copy link
Contributor

mhvk commented Dec 19, 2018

Taking the conversation of using minimum and maximum for clip outside of the in-line comments (since I missed the first one...): is it obvious what the behaviour should be? I think that if min or max is NaN, it is not strange to just pass that on. Oddly, that is partially what the current clip does:

a = np.array([10., np.nan, np.nan, -10.])
a.clip(np.array([np.nan, np.nan, 1., 1.]), np.array([2., 2., np.nan, np.nan]))
# RuntimeWarning: invalid value encountered in minimum
# RuntimeWarning: invalid value encountered in maximum
# array([nan, nan, nan, nan])

Looking at the code, I see this is because for array-like min and max, the _slow_array_clip path is chosen, which in fact uses the minimum and maximum ufuncs!

My sense would be to keep this behaviour, following the maxim that in case of doubt one should refuse the temptation to guess: the user should be ±np.inf if a particular element should not have a minimum or maximum.

Note that if we felt it was needed, we could keep this completely backwards compatible if in the wrapping code we checked for cases of scalar min/max=np.nan and treated those as equivalent None and then also raised a DeprecationWarning.

@eric-wieser
Copy link
Member Author

is it obvious what the behaviour should be

Not really - I just noticed that in the past someone complained when the nan was propagated (#7601)

the user should be ±np.inf if a particular element should not have a minimum or maximum.

That's a fairly strong argument I hadn't thought of as to why the nan behavior is useless. Unfortunately the replacement is quite a bit more verbose.

np.clip(arr, np.where(np.isnan(a_min), -np.inf, a_min), np.where(np.isnan(a_max), np.inf, a_max))

@mhvk
Copy link
Contributor

mhvk commented Dec 21, 2018

@eric-wieser - #7601 complained that nan in the array was not propagated on some windows virtual machine! I think we agree that no matter what, nan in the input array should be propagated (i.e., we will not use fmin, fmax behaviour). The question is what to do with nan in min/max.

Here, remember that at present nan do get propagated if min or max are arrays (since at present minimum and maximum are used for that case). So, the question is whether we want to extend the scalar behaviour of ignoring min,max=nan to arrays or extend the array behaviour of propagating min,max=nan to scalars (possibly with deprecation).

My own sense is to always propagate nan for the reasons I gave earlier; it also may be the least likely to lead to regressions, since having a nan in an array of min or max seems much more likely than passing in a nan scalar for min or max. (But then, scalars are surely used much more often... Which perhaps argues for deprecation of the nan scalar case....)

@eric-wieser
Copy link
Member Author

Given that the current nan behavior is inconsistent and depends on things like the byte order, I think we just drop this without a deprecation warning:

>>> np.array(1.0).clip(np.nan, np.nan)
1.0
>>> np.array(1.0).newbyteorder('S').clip(np.nan, np.nan)
RuntimeWarning: invalid value encountered in minimum
RuntimeWarning: invalid value encountered in maximum
nan

@mhvk
Copy link
Contributor

mhvk commented Feb 4, 2019

OK, makes sense to just have it in the release notes. So, we change to always behave consistently with mininum(maximum(array, clip_low), clip_high)?

@eric-wieser eric-wieser force-pushed the clip-ufunc branch 2 times, most recently from 1a30542 to add2e0c Compare May 11, 2019 22:25
@charris
Copy link
Member

charris commented May 12, 2019

Needs rebase. Test failure is fixed in master.

@charris
Copy link
Member

charris commented May 12, 2019

Needs rebase. Test error is fixed in master.

eric-wieser and others added 4 commits May 14, 2019 13:52
This includes:
* The addition of 3-input PyObject inner loop
* The removal of `->f->fastclip` for builtin types, which now use ufuncs instead
* A deprecation in `PyArray_Clip` for third-party types that still use `->f->fastclip`
* A deprecation of the unusual casting behavior of `clip`
* A deprecation of the broken `nan`-behavior of `clip`, which was previously dependent on dimensionality and byte-order.
* Unit test for object clip issue

* parametrized test_simple_int32_inout() to
additionally handle scenario where "unsafe"
casting is explicitly passed through, as
requested by reviewer

* add a requested comment related to possibility
of future nan check optimization for `@name@_clip`

* add a unit test to cover the case where np.clip()
is called with None for both max and min

* add a unit test for the case where out is None
and casting is specified as an invalid value, to flush
through code path in TestClip.fastclip

* add unit test + doc update for expected behavior
when amin > amax

* add unit test + patch for bug in npy_ObjectClip; its
operations and operands were wrong; the unit test case
is based on a scenario probed by the hypothesis library

* add unit test for error in timedelta64 MAX
function for clip handling

* add unit test for a pathological case where np.ones(10)
is clipped with amin=1, amax=0
This changes the object array behavior to match the other behavior
@tylerjereddy
Copy link
Contributor

rebased / force pushed -- if CI is all green I plan to merge as noted above

@tylerjereddy
Copy link
Contributor

The CI failures are from things like numpy/core/src/umath/simd.inc.src:1407:1: warning: ‘AVX512F_log_FLOAT’ defined but not used [-Wunused-function].

This is presumably coming in from work by @r-devulap and @juliantaylor?

Hopefully no merge conflict in release notes arises while waiting for this to get resolved / rebased.

@r-devulap
Copy link
Member

I think the reason you are seeing that error is the file simd.inc is included in fast_loop_macros.h which is then included in clip.c.src. simd.inc contains the code for static functions AVX2_log_FLOAT etc which ends up not being used anywhere in clip.c.src and hence the error. inlining the ISA_exp/log_FLOAT functions with NPY_INLINE keyword solves the build issue you are seeing.

@r-devulap
Copy link
Member

Is there a reason why simd.inc is included in fast_loop_macros.h?

@tylerjereddy
Copy link
Contributor

Perhaps we can make that refinement if the inclusion is not needed @eric-wieser ?

@r-devulap
Copy link
Member

I was able to build without #include<simd.inc> in fast_loop_macros.h, so I am not sure if that include is needed.

@tylerjereddy
Copy link
Contributor

Thanks, that's helpful feedback! Eric or Marten can probably confirm the safety of removal, but sounds straightforward.

@eric-wieser
Copy link
Member Author

I was under the impression that some of the macros in that file expand to things from the simd header.

I think adding inline is the right way to go here.

* ISA_exp/log have been inlined to prevent
build errors when included in PR 12519
@tylerjereddy
Copy link
Contributor

Ok, I inlined the low-level exp / log stuff & local test suite results are unchanged while the build warnings-turned-errors disappear. Hopefully CI agrees.

@tylerjereddy
Copy link
Contributor

Thanks Eric and all reviewers. Merging with fingers crossed based on various discussions above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.