ENH: Add property-based tests using Hypothesis #15189
Conversation
(copying over #14440 (comment))

**What does Hypothesis do that other tests don't?** Computer-generated test cases tend to expose edge cases that hand-written tests don't. Usually they could have been exposed by hand-written tests, but it would take far more effort to do so without tools. For Numpy itself, #10930, #13089, and #14239 have all been found by downstream users of Hypothesis! Working on this pull request, the first test I wrote rediscovered #15127 and raised further questions about how the bounds to `np.clip` are handled.

**How slow are Hypothesis tests?** This mostly depends on how many test cases you run - the default is 100 cases, but you can choose any other number on a global or per-test basis. Typically generating data takes between 1% and 20% of the runtime for that test function, though it could be higher if we generate large arrays and apply a fast operation. An important point is that we do not suggest using Hypothesis for everything! Property-based tests tend to be more general but less numerous than traditional unit tests, for a given level of testing, and you can easily skip them in certain CI jobs. Performance is rarely a problem in practice though, so I would ultimately suggest waiting to measure before trying to optimize!

**Reproducing test failures from CI, users, etc.** We have docs about this 😄. To summarize, the test may behave differently on different Python versions or operating systems (but that is true of non-Hypothesis tests too); short of that, Hypothesis will print the failing input example, which we can use to reproduce and debug the failure locally.

**What parts of Numpy would Hypothesis be especially valuable for?** Generically, parts of your code which are particularly complex, where you have high-level invariants you can check, or both. I would suggest paying particular attention to IO and serialisation round-trips, the dtype system, and broadcasting - the previous issues mentioned above also provide some examples. @rsokl might also have some ideas! Personally, I like to start simply by trying to call some tricky code using Hypothesis to generate inputs - without asserting anything at all about the result. This often uncovers a more detailed definition of what inputs should be considered valid than has been documented! |
It seems the PR created a test case that raises a warning.
|
@mattip - yep!
I'll be working on HypothesisWorks/hypothesis#2218 for the next day or two, and then back to this 😄 |
The case that is tripping up the test should have succeeded.
I can reproduce the failure using the bit pattern of a signalling NaN.
This is probably a bug, xref gh-15127. |
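For readers following along, a minimal sketch of building a signalling NaN from a raw bit pattern and poking `np.sign` at it - the specific pattern here is an assumption, not necessarily the one from the comment above:

```python
import numpy as np

# Exponent all ones, quiet bit clear, one low mantissa bit set: a signalling
# NaN for IEEE 754 float64.  This particular pattern is illustrative only.
snan = np.array([0x7FF0000000000001], dtype=np.uint64).view(np.float64)

print(np.isnan(snan))  # [ True]
print(np.sign(snan))   # [nan]; before gh-15230 this could emit a RuntimeWarning (gh-15127)
```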
On the one hand, you might want to claim that gh-15209 is a legitimate cleanup for an edge case. On the other, if using hypothesis means we need to pepper the code with performance-reducing fixes for edge cases no-one has come across in the many years numpy has been around, I am not so sure it is a clear win. |
I'd limit my claim to "it's probably better to discover such issues via Hypothesis than via users" - it's perfectly legitimate to just exclude such edge cases from the test (and ideally document somewhere that such edge cases exist, I guess)! Personally, I'd prefer fast code plus a documented note about the edge case, so I know what to expect if I'm ever dealing with such inputs.
I rejected those edge cases precisely because they don't succeed, and I was hoping to demonstrate a passing test in this PR 😅... If they maybe should succeed though, I think that's a good example of the design feedback that naturally falls out of writing property-based tests! |
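For context, "rejecting" inputs in a Hypothesis test is done with `assume()`; an exclusion along the lines discussed above might look roughly like this (the NaN filter and the double-negation property are assumptions for illustration, not code from this PR):

```python
import numpy as np
from hypothesis import assume, given
from hypothesis.extra import numpy as npst


@given(npst.arrays(dtype=np.float64, shape=npst.array_shapes()))
def test_double_negation_is_identity(arr):
    # Discard inputs containing NaNs: their behaviour is a separate, documented
    # edge case rather than part of this property.
    assume(not np.isnan(arr).any())
    np.testing.assert_array_equal(np.negative(np.negative(arr)), arr)
```

Passing `elements=st.floats(allow_nan=False)` to `npst.arrays` would achieve the same exclusion without discarding examples at run time.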
@mattip - I've rebased on master (which includes #15230) and made the comments much more detailed. The build appears to be failing due to an unrelated issue. Getting the first test in and everything configured for CI will enable contributions from anyone else who has written property-based tests for Numpy, so I'd prefer to add more tests in a follow-up PR 🙂 |
The np.clip test is much clearer now, as are the problems of using scalar nan values (it might be an issue, hard to decide).
Let's see if other reviewers have an opinion about the advantages of using hypothesis at all in numpy in general, and in these two tests in particular.
@mattip - has anyone chimed in on the mailing list? |
I do find the revised version of the property-based tests clearer. So, I guess I'm just agreeing with the discussion above that it is probably better to at least have a way to semi-automate the discovery of corner cases that we may want to document, even if we don't want an outrageous input-handling framework on every single function after it gets probed over a large input space by Hypothesis.

I'm still +1, and one of the lead maintainers of Hypothesis is offering to help maintain these tests. The only person who answered the mailing list query about this in September 2019 was me.

Some other concerns that seem to have come up in related PRs: whether Numpy should be expected to handle every pathological input, how many real bugs this approach has actually found, whether tests of an installed Numpy stay deterministic, and who will maintain the Hypothesis-related test code.
|
@tylerjereddy - that is how I'd usually write property-based tests, but in this case several of the inputs (dtypes and shapes) depend on each other. It would be trivial to move that part to a custom strategy with `st.composite` (a sketch of that approach follows after this reply).
^ exactly this: I'm not saying that Numpy should handle everything, just that it should document what it doesn't. And of course you can constrain Hypothesis' input space to only things which are expected to work; that's the whole point.
Two or three in this PR, and another half-a-dozen downstream that I know of already!
As I've said on the other issues, I don't expect this to be a problem, but if it comes up we can make Hypothesis tests from an as-installed Numpy deterministic; or even take them out of the installed package so they only run in CI.
Yep, I'm happy to help with review and general maintenance of Hypothesis-related open source, and as a foundational project Numpy gets a very high priority from me. I know I'm not the only person willing to help either, but there's some frustration about the slow and lukewarm response to issues and pull requests like this one 😕 |
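As an illustration of the `st.composite` approach mentioned above, a sketch in which two arrays must share a dtype and shape - the property tested is just a stand-in, not one of the tests in this PR:

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as npst


@st.composite
def matching_arrays(draw):
    """Draw two arrays that share a single dtype and shape."""
    dtype = draw(npst.floating_dtypes())
    shape = draw(npst.array_shapes())
    a = draw(npst.arrays(dtype=dtype, shape=shape))
    b = draw(npst.arrays(dtype=dtype, shape=shape))
    return a, b


@given(matching_arrays())
def test_elementwise_addition_is_commutative(pair):
    a, b = pair
    np.testing.assert_array_equal(a + b, b + a)
```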
Yes, adding a pure-Python test-only dependency is perfectly fine, easy for devs and no issue for users. My initial concerns were answered in the other thread.
Much appreciated @Zac-HD. Apologies things move slowly; the initial PR was harder to see the value of, now it's mainly an issue of reviewer bandwidth. Personally, I'm all out of bandwidth for the next 2 weeks; happy for someone else to help get this to completion, and if not then I've added it to the milestone for the next release so we won't forget. |
After discussion we decided to put this in, with the caveat that we may choose to ignore edge cases that hypothesis testing may generate, and mark them "known failures". This is to avoid a situation where newly discovered corner cases force performance-reducing fixes for inputs that no one has hit in practice.
|
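One way the "known failures" caveat might look in practice - the property, the pinned example, and the reason string are all illustrative assumptions, not code from this PR:

```python
import numpy as np
import pytest
from hypothesis import example, given, strategies as st


@pytest.mark.xfail(reason="edge case found by Hypothesis; behaviour not yet decided", strict=False)
@given(st.floats())
@example(float("nan"))  # pin the Hypothesis-found input so it is always retried
def test_sign_stays_within_unit_interval(x):
    assert -1.0 <= np.sign(x) <= 1.0
```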
Thanks @Zac-HD |
Sounds like a good plan to me - we've already got two of those in this PR 😄 And my sincere thanks to everyone who has spent valuable time reviewing and discussing this; I know adding a new dependency and style of testing are both tricky decisions and appreciate your patient engagement. But it should be much easier to add tests from now on, or co-develop them with new or refactored features! |
numpy has added hypothesis as a dependency for running property-based tests: numpy/numpy#15189 However, hypothesis is not an input of the derivations. And thus tests have been failing with: ERROR .. - ModuleNotFoundError: No module named 'hypothesis' https://hydra.nixos.org/build/124613306/nixlog/1 This change adds hypothesis to checkInputs, so that tests are run again.
This property-based testing demo focuses on tricky tests for numeric code - Hypothesis adds most value when the input or configuration space is large, but you can make clear statements about "always" or "never" (even very loose ones!).
@mattip @rgommers, I can spend a few days adding tests to this PR if that would be useful. Perhaps more importantly I'm happy to help out with code review / debugging / reproducing bugs or whatever else turns up as a result of property-based testing.
The PR currently consists of:

- A test that `np.clip` agrees with `minimum(maximum(array, clip_low), clip_high)` (contra ENH/DEP: Use a ufunc under the hood for ndarray.clip #12519). In the process I found a mishandling of signalling `nan`s: sign(nan) emits warning #15127 / BUG: do not emit warnings for np.sign, np.equal when using nan #15230!

I would prefer to get this merged before working on further tests, both to allow others to contribute property tests and to ensure that I'm working on something valuable for Numpy. Proposed follow-up PRs will cover:

- `einsum` tests, which found PR #10559 failed to fix einsum (optimize=True) broadcasting bug #10930 (and other tests)
- `np.dtype()` tests, a la ValueError for `np.dtype([("", int), ("f0", int)])` #14239
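A stripped-down sketch of that clip property - not the actual test added here; the NaN exclusion and the bound ordering check are simplifying assumptions:

```python
import numpy as np
from hypothesis import assume, given, strategies as st
from hypothesis.extra import numpy as npst


@given(
    npst.arrays(dtype=np.float64, shape=npst.array_shapes()),
    st.floats(allow_nan=False),
    st.floats(allow_nan=False),
)
def test_clip_matches_minimum_maximum(arr, lo, hi):
    assume(lo <= hi)                 # clip with lo > hi is its own edge case
    assume(not np.isnan(arr).any())  # NaN handling is discussed separately above
    expected = np.minimum(np.maximum(arr, lo), hi)
    np.testing.assert_array_equal(np.clip(arr, lo, hi), expected)
```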