ENH: Add property-based tests using Hypothesis #15189


Merged

merged 3 commits into numpy:master from Zac-HD:property-tests on Feb 13, 2020

Conversation

Zac-HD
Contributor

@Zac-HD Zac-HD commented Dec 27, 2019

This property-based testing demo focuses on tricky tests for numeric code - Hypothesis adds most value when the input or configuration space is large, but you can make clear statements about "always" or "never" (even very loose ones!).
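For readers new to the style, a minimal sketch of such a property test might look like the following (an illustration only, not one of the tests added in this PR; the strategy choices and the test name are hypothetical):

import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as npst

finite_floats = st.floats(allow_nan=False, allow_infinity=False, width=64)


# An "always" statement: for finite inputs and ordered finite bounds,
# np.clip never returns a value outside [amin, amax].
@given(
    arr=npst.arrays(np.float64, npst.array_shapes(), elements=finite_floats),
    bounds=st.tuples(finite_floats, finite_floats).map(sorted),
)
def test_clip_stays_within_bounds(arr, bounds):
    amin, amax = bounds
    result = np.clip(arr, amin, amax)
    assert np.all(result >= amin) and np.all(result <= amax)

Hypothesis generates a batch of input arrays and bound pairs on every run, and shrinks any failure to a minimal example.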

@mattip @rgommers, I can spend a few days adding tests to this PR if that would be useful. Perhaps more importantly I'm happy to help out with code review / debugging / reproducing bugs or whatever else turns up as a result of property-based testing.

The PR currently consists of:

I would prefer to get this merged before working on further tests, both to allow others to contribute property tests and to ensure that I'm working on something valuable for Numpy. Proposed follow-up PRs will cover:

@Zac-HD
Contributor Author

Zac-HD commented Dec 27, 2019

(copying over #14440 (comment))

What does Hypothesis do that other tests don't?

Computer-generated test cases tend to expose edge cases that hand-written tests don't. Usually they could have been exposed by hand-written tests, but it would take far more effort to do so without tools.

For Numpy itself, #10930, #13089, and #14239 have all been found by downstream users of Hypothesis!

Working on this pull request, the first test I wrote rediscovered #15127 and highlighted that the bounds to np.clip may not be scalar nans - either a docs issue or a bug.

How slow are Hypothesis tests?

This mostly depends on how many test cases you run - the default is 100 cases but you can choose any other number on a global or per-test basis. Typically generating data takes between 1% and 20% of the runtime for that test function, though it could be higher if we generate large arrays and apply a fast operation.

An important point is that we do not suggest using Hypothesis for everything! Property-based tests tend to be more general but less numerous than traditional unit tests, for a given level of testing. You can easily skip them from certain CI jobs (pytest -m "not hypothesis") if that becomes an issue, or use a setting profile that runs fewer examples in CI.

Performance is rarely a problem in practice though, so I would ultimately suggest waiting to measure before trying to optimize!
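For concreteness, one way to tune example counts per environment is a settings profile in a conftest; this is a sketch, not Numpy's actual configuration, and the profile names, example counts, and environment variable are illustrative:

import os

from hypothesis import settings

# Run more examples in CI, fewer during local development.
settings.register_profile("ci", max_examples=1000, deadline=None)
settings.register_profile("dev", max_examples=10)
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "dev"))

(The pytest -m "not hypothesis" trick works because Hypothesis' pytest plugin applies a hypothesis marker to every @given test.)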

Reproducing test failures from CI, users, etc.

We have docs about this 😄. To summarize, the test may behave differently on different Python versions or operating systems (but that is true of non-Hypothesis tests too); short of that, Hypothesis will print the failing input example, and we can use the print_blob=True setting to be sure of reproducing the failure. Changes between Hypothesis versions are uncommon but not unknown, which is why the print_blob setting also dumps the exact Hypothesis version.

I would discourage using the @seed decorator or derandomize=True setting, as the net effect is to reduce the number of bugs found. We (Hypothesis developers) spend a lot of time ensuring that bugs can be reproduced once found, and I'd personally be happy to help if the Numpy project is having any trouble with Hypothesis.
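As a sketch of that workflow (the test below is a placeholder, not one of the tests in this PR):

from hypothesis import given, settings, strategies as st


# print_blob=True makes Hypothesis print a ready-to-paste
# @reproduce_failure(<version>, <blob>) decorator whenever a test fails.
@settings(print_blob=True)
@given(st.floats())
def test_placeholder(x):
    ...

Pasting the printed @reproduce_failure decorator onto the test temporarily replays the exact failing example.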

What parts of Numpy would Hypothesis be especially valuable for?

Generically, parts of your code which are particularly complex, where you have high-level invariants you can check, or both. I would suggest paying particular attention to IO and serialisation round-trips, the dtype system, and broadcasting - the previous issues mentioned above also provide some examples. @rsokl might also have some ideas!

Personally, I like to start simply by trying to call some tricky code using Hypothesis to generate inputs - without asserting anything at all about the result. This often uncovers a more detailed definition of what inputs should be considered valid than has been documented!
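A sketch of that "no assertions" smoke-test style (the target function and strategies here are just an example):

from hypothesis import given
from hypothesis.extra import numpy as npst


# Feed generated arrays into the code under test and assert nothing:
# any exception (or warning, with -W error) becomes a shrunk failing example.
@given(npst.arrays(dtype=npst.scalar_dtypes(), shape=npst.array_shapes()))
def test_repr_does_not_crash(arr):
    repr(arr)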

@mattip
Member

mattip commented Dec 30, 2019

It seems the PR created a test case that raises a DeprecationWarning:

...\numpy\core\tests\test_numeric.py:2063: in test_clip_property
    assert_array_equal(result, expected)
...\numpy\testing\_private\utils.py:728: in func_assert_same_pos
    x_id = func(x)

inf        = inf
xy         = array([nan, nan])

...\numpy\testing\_private\utils.py:774: DeprecationWarning
--------------------------------- Hypothesis ----------------------------------
Falsifying example: test_clip_property(
    self=<numpy.core.tests.test_numeric.TestClip at 0x222966645f8>,
    data=data(...),
)
Draw 1 (shape): (2,)
Draw 2: dtype('float64')
Draw 3: array([nan, nan])
Draw 4: BroadcastableShapes(input_shapes=((1,), (1,)), result_shape=(2,))
Draw 5: 0.0
Draw 6: 0.0

You can reproduce this example by temporarily adding \
@reproduce_failure('4.56.3', b'AXicY2RkYGRgYmBg+P+BAQQYGfACAEm1AfY=') \
as a decorator on your test case

@Zac-HD
Contributor Author

Zac-HD commented Dec 31, 2019

@mattip - yep!

  • Should I just skip inputs that raise this warning?
  • Is clip expected to work with arguments that are not of the same dtype as the array? What about arguments to which array elements could safely be cast?
  • Is clip expected to work with zero-dimensional arrays as arguments? If so, I need to file a bug; if not I'd like to explain why in a comment.

I'll be working on HypothesisWorks/hypothesis#2218 for the next day or two, and then back to this 😄

@mattip mattip mentioned this pull request Dec 31, 2019
@mattip
Member

mattip commented Dec 31, 2019

This np.clip test is far from "just add a decorator with a strategy and run the tests like normal". There is a lot going on here: drawing various dtypes and shapes for arr, amax, amin; rejecting edge cases that maybe should succeed.

The case that is tripping up the test should have succeeded since arr, result, and expected are all np.array([nan, nan]) (for the record amin=0.0, amax=0.0). It seems something is strange about the nan which trips up assert_array_equal:

>>> np.unpackbits(arr[:1].view('uint8'))
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
      dtype=uint8)
>>> np.unpackbits(np.array([np.nan]).view('uint8'))
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
      dtype=uint8)

I can reproduce the failure using this bit pattern:

import numpy as np

# Reconstruct the failing value from its bit pattern (on a little-endian
# machine this is 0x7ff0000000000001: exponent all ones, quiet bit clear,
# i.e. a signaling NaN).
a = np.packbits([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
b = a.view(np.float64)
np.isnan(b)
# array([True])
b == np.inf
# RuntimeWarning: invalid value encountered in equal
This is probably a bug, xref gh-15127

Edit: specify the np.clip test in the first few words

@mattip
Member

mattip commented Dec 31, 2019

On the one hand, you might want to claim that gh-15209 is a legitimate cleanup for an edge case. On the other, if using hypothesis means we need to pepper the code with performance-reducing fixes for edge cases no-one has come across in the many years numpy has been around, I am not so sure it is a clear win.

@Zac-HD
Contributor Author

Zac-HD commented Dec 31, 2019

I'd limit my claim to "it's probably better to discover such issues via Hypothesis than via users" - it's perfectly legitimate to just exclude such edge cases from the test (and ideally document somewhere that such edge cases exist, I guess)!

Personally I'd prefer fast code with a note that if I'm dealing with nans I might want such-and-such a recipe, but without this test we wouldn't even know we were making that tradeoff.

This np.clip test is far from "just add a decorator with a strategy and run the tests like normal". ... rejecting edge cases that maybe should succeed.

The clip test is quite a lot more complicated, because the arguments leading into the last few lines have some complicated relationships: we have an input array with a shape and dtype, and two bounds which are of the same dtype and either scalars or of a broadcastable shape! The tests I based it on are simpler, but don't explore the broadcasting or shape issues.

I rejected those edge cases precisely because they don't succeed, and I was hoping to demonstrate a passing test in this PR 😅... If they maybe should succeed though, I think that's a good example of the design feedback that naturally falls out of writing property-based tests!

@Zac-HD Zac-HD force-pushed the property-tests branch 2 times, most recently from cfb1ff6 to 707b984 on January 4, 2020 01:49
@charris charris changed the title Add property-based tests using Hypothesis ENH: Add property-based tests using Hypothesis Jan 8, 2020
@Zac-HD Zac-HD force-pushed the property-tests branch 2 times, most recently from e10d1ad to e76d769 on January 9, 2020 07:54
@Zac-HD
Contributor Author

Zac-HD commented Jan 9, 2020

@mattip - I've rebased on master (which includes #15230), and made the comments much more detailed.

The build appears to be failing due to an unrelated mingw issue; assuming that's fixed, is there anything else I need to do before this can be merged?

Getting the first test in and everything configured for CI will enable contributions from anyone else who has written property-based tests for Numpy, so I'd prefer to add more tests in a follow-up PR 🙂

Member

@mattip mattip left a comment


The np.clip test is much clearer now, along with the problems of using scalar nan values (it might be an issue, hard to decide).

Let's see if other reviewers have an opinion about the advantages of using hypothesis at all in numpy in general, and in these two tests in particular.

@Zac-HD Zac-HD force-pushed the property-tests branch 2 times, most recently from bebf05d to 866b0e2 on January 17, 2020 00:47
@Zac-HD
Contributor Author

Zac-HD commented Jan 17, 2020

@mattip - has anyone chimed in on the mailing list?

@tylerjereddy
Contributor

I do find the revised version of my property-based tests for clip a bit harder to read because of all the extra hypothesis machinery being used inside the test function proper. Perhaps that's because I like thinking about hypothesis conceptually as an intelligent version of pytest parametrization, where the case generation typically happens before the def test_something():.

I can reiterate that hypothesis was useful for evaluating Eric's clip ufunc PR, and indeed I'm pretty sure some documentation adjustments were made as a result of corner cases that, for example, Marten & Eric didn't want to slow things down for. I think it was as simple as adding "No check is performed to ensure a_min < a_max."

So, I guess I'm just agreeing with the discussion above that it is probably better to at least have a way to semi-automate the discovery of corner cases that we may want to document even if we don't want an outrageous input handling framework on every single function after it gets probed over a large input space by hypothesis. Realistically, I think we're going to find bugs that core devs will occasionally want to fix by this means though.

I'm still +1, and one of the lead maintainers of hypothesis seems open to helping review/maintain the tests. Are there substantial ecosystem concerns with a new testing dependency? IIRC, @rgommers didn't seem too worried the last time this came up, since it is just on the testing side?

The only person who answered the mailing list query about this in September 2019 was me.

Some other concerns that seem to have come up in related PRs:

  • documenting & perhaps discouraging usage outside of CI in routine cases so that we don't get increased volume of issue reports related to hypothesis "case variance" in different environments
  • extra work for core devs reviewing PRs (yet another testing "framework" to learn)---perhaps offset by Zac helping out, at least initially, with that

@Zac-HD
Contributor Author

Zac-HD commented Jan 23, 2020

I do find the revised version of my property-based tests for clip a bit harder to read because of all the extra hypothesis machinery being used inside the test function proper. Perhaps that's because I like thinking about hypothesis conceptually as an intelligent version of pytest parametrization, where the case generation typically happens before the def test_something():.

@tylerjereddy - that is how I'd usually write property-based tests, but in this case several of the inputs (dtypes and shapes) depend on each other.

It would be trivial to move that part to a custom strategy with @st.composite, but unless I know I need the same customised data for multiple tests, I prefer the inline st.data() style - it's trivial to go the other way if we add a second test later.
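To make the contrast concrete, a much-simplified sketch of the two styles (the strategies and test bodies here are illustrative, not the real clip test):

import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as npst


# Inline style: dependent draws (a dtype, then an array of that dtype)
# happen inside the test body via st.data().
@given(data=st.data())
def test_inline_style(data):
    dtype = data.draw(npst.floating_dtypes(), label="dtype")
    arr = data.draw(npst.arrays(dtype, npst.array_shapes()), label="array")
    np.clip(arr, 0, 1)


# Composite style: the same dependent draws, factored out as a reusable strategy.
@st.composite
def float_arrays(draw):
    dtype = draw(npst.floating_dtypes())
    return draw(npst.arrays(dtype, npst.array_shapes()))


@given(arr=float_arrays())
def test_composite_style(arr):
    np.clip(arr, 0, 1)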

So, I guess I'm just agreeing with the discussion above that it is probably better to at least have a way to semi-automate the discovery of corner cases that we may want to document even if we don't want an outrageous input handling framework on every single function after it gets probed over a large input space by hypothesis.

^ exactly this: I'm not saying that Numpy should handle everything, just that it should document what it doesn't. And of course you can constrain Hypothesis' input space to only things which are expected to work; that's the whole point.

Realistically, I think we're going to find bugs that core devs will occasionally want to fix by this means though.

Two or three in this PR, and another half-a-dozen downstream that I know of already!

Are there substantial ecosystem concerns with a new testing dependency? ... perhaps discouraging usage outside of CI in routine cases so that we don't get increased volume of issue reports related to hypothesis "case variance" in different environments

As I've said on the other issues, I don't expect this to be a problem, but if it comes up we can make Hypothesis tests from an as-installed Numpy deterministic; or even take them out of the installed package so they only run in CI.

I'm still +1, and one of the lead maintainers of hypothesis seems open to helping review/maintain the tests.

Yep, I'm happy to help with review and general maintenance of Hypothesis-related open source, and as a foundational project Numpy gets a very high priority from me. I know I'm not the only person willing to help either, but there's some frustration about the slow and lukewarm response to issues and pull requests like this one 😕

@rgommers rgommers added this to the 1.19.0 release milestone Jan 24, 2020
@rgommers
Member

IIRC, @rgommers didn't seem too worried the last time this came up, since it is just on the testing side?

Yes, adding a pure-Python test-only dependency is perfectly fine, easy for devs and no issue for users. My initial concerns were answered in the other thread.

Yep, I'm happy to help with review and general maintenance of Hypothesis-related open source, and as a foundational project Numpy gets a very high priority from me. I know I'm not the only person willing to help either, but there's some frustration about the slow and lukewarm response to issues and pull requests like this one 😕

Much appreciated @Zac-HD. Apologies that things move slowly; the initial PR was harder to see the value of, and now it's mainly an issue of reviewer bandwidth. Personally, I'm all out of bandwidth for the next 2 weeks; happy for someone else to help get this to completion, and if not, I've added it to the milestone for the next release so we won't forget.

@mattip
Member

mattip commented Feb 13, 2020

After discussion we decided to put this in, with the caveat that we may choose to ignore edge cases that hypothesis testing may generate, and mark them as "known failures". This is to avoid a situation where

  • hypothesis comes up with a valid bug
  • fixing it would require significant effort or break other things
  • no user has reported the issue, so it probably does not come up in real-world use
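As an illustration only (the strategies, test names, and reasons below are placeholders, not tests in this PR), a "known failure" could be recorded either by carving the edge case out of the input space or by xfail-ing the test:

import numpy as np
import pytest
from hypothesis import assume, given
from hypothesis.extra import numpy as npst


# Option 1: exclude the problematic inputs and document them as unsupported.
@given(npst.arrays(np.float64, npst.array_shapes()))
def test_with_edge_case_excluded(arr):
    assume(not np.isnan(arr).any())  # edge case deliberately out of scope
    ...


# Option 2: keep generating the edge case but record it as a known failure.
@pytest.mark.xfail(reason="edge case found by Hypothesis; see tracking issue")
@given(npst.arrays(np.float64, npst.array_shapes()))
def test_recorded_as_known_failure(arr):
    ...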

@mattip mattip merged commit abdd996 into numpy:master Feb 13, 2020
@mattip
Member

mattip commented Feb 13, 2020

Thanks @Zac-HD

@Zac-HD Zac-HD deleted the property-tests branch February 13, 2020 07:42
@Zac-HD
Contributor Author

Zac-HD commented Feb 13, 2020

Sounds like a good plan to me - we've already got two of those in this PR 😄

And my sincere thanks to everyone who has spent valuable time reviewing and discussing this; I know adding a new dependency and style of testing are both tricky decisions and appreciate your patient engagement. But it should be much easier to add tests from now on, or co-develop them with new or refactored features!

danieldk added a commit to danieldk/nixpkgs that referenced this pull request Aug 13, 2020
numpy has added hypothesis as a dependency for running property-based
tests:

numpy/numpy#15189

However, hypothesis is not an input of the derivations. And thus tests
have been failing with:

ERROR .. - ModuleNotFoundError: No module named 'hypothesis'

https://hydra.nixos.org/build/124613306/nixlog/1

This change adds hypothesis to checkInputs, so that tests are run
again.
FRidh pushed a commit to NixOS/nixpkgs that referenced this pull request Aug 15, 2020