Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

BUG: safer bincount casting #28355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 3, 2025
Merged

Conversation

tylerjereddy
Copy link
Contributor

@tylerjereddy
Copy link
Contributor Author

The CI failures for the new test case I added are real; npy_intp gets used in other places in the control flow as well, so patching locally to account for more explicit type casts alone isn't sufficient.

@tylerjereddy
Copy link
Contributor Author

Briefly discussed at triage meeting this morning--Sebastian suggested forcing a cast to npy_intp, even for uint64 input, since indexing does that as well and it isn't really practical to guard against the unreasonably large ints that would wrap to negative values.

numpy/_core/src/multiarray/compiled_base.c Show resolved Hide resolved
* Fixes numpy#28354

* Force usage of `npy_intp` type in `np.bincount()` and avoid
unsafe casting errors with i.e., `npy_uint64`. This is similar
to our behavior with indexing.
* `arr_bincount()` now only uses unsafe casting for integer
input types, and the number of casting operations has
been reduced for the code path used in above PR.

* a test for non-crashing behavior with non-contiguous
`bincount()` input has been added.
@tylerjereddy
Copy link
Contributor Author

We shouldn't force-cast everything.

Ok, I added a guard for that.

PyArray_FromAny needs the flags to be passed because the later code relies on them..

I couldn't replicate the need for this because what I did originally was two casting operations instead of one--the Any cast was added in the control flow to precede the PyArray_ContiguousFromAny after it, mostly because I was being lazy about constructing the appropriate requirements/flags. In any case, I've simplified things to restore to a single casting operation now, with custom flags depending on input type, which I suspect is closer to what you'd want?

The current version should crash hard if you do something like np.bincount(np.arange(10000)[::2])

I couldn't find any evidence of a hard crash with that even when I intentionally made things worse with regards to C contiguity, though it may require i.e., pytest-repeat and some patience. In any case, I've added that test case as a "doesn't segfault" test just in case.

tylerjereddy added a commit to tylerjereddy/numpy that referenced this pull request Feb 21, 2025
* The actual C function signature is `PyArray_Size(PyObject *op)`,
and the compiler will issue a warning if you follow the current
C API docs that specify `PyArrayObject*`, so clean this up. I noticed
this while working on numpygh-28355, but it was also mentioned as an aside
7 years ago in numpygh-10977.

[skip azp] [skip cirrus] [skip actions]
seberg pushed a commit that referenced this pull request Feb 24, 2025
* The actual C function signature is `PyArray_Size(PyObject *op)`,
and the compiler will issue a warning if you follow the current
C API docs that specify `PyArrayObject*`, so clean this up. I noticed
this while working on gh-28355, but it was also mentioned as an aside
7 years ago in gh-10977.

[skip azp] [skip cirrus] [skip actions]
Comment on lines 181 to 182
flags = NPY_ARRAY_WRITEABLE | NPY_ARRAY_ALIGNED | NPY_ARRAY_C_CONTIGUOUS;
if (PyArray_Size((PyObject *)list) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecation makes this pretty messy, almost wondering if we should just finalize it...

In either case, I think the PyArray_Size() here ends up as an imprecise PyArray_Check() here (as it misses an empty uint8 array for example)?

I am not quite sure that not setting force-cast in the non-array case won't finalize the deprecation (in small parts).
Since both paths use PyArray_ContiguousFromAny, I think that is fine. Although, it may be that we are not relaxing allowing uint64 for an array-like here i.e. something like:

class a:
    def __array__(self):
        return np.ones(10, dtype=np.uint64)

@@ -119,6 +119,7 @@ arr_bincount(PyObject *NPY_UNUSED(self), PyObject *const *args,
npy_intp *numbers, *ians, len, mx, mn, ans_size;
npy_intp minlength = 0;
npy_intp i;
int flags;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would move this down into the if, but OK.

I think we can put this in, if we change things to a PyArray_Check below, unless I am missing something. It seems fine, even if it might not be 100% for array-likes.

I would lean towards not backporting. It has been around forever, and the reason it came up was just cupy testing against NumPy as far as I am aware.

* Based on reviewer feedback, narrow the scope of the `flags`
variable in `arr_bincount()`.

* Based on reviewer feedback, add an array-like test for the `uint64`
casting issue, which indeed fails before and passes after adding
a similar shim to the array-like code path in `arr_bincount()`.

* Based on reviewer feedback, switch the patching from `PyArray_Size`
to `PyArray_Check` in a few places.
@tylerjereddy
Copy link
Contributor Author

tylerjereddy commented Feb 28, 2025

@seberg I think I've addressed most of the comments in the latest commit (see commit message).

Covering the array-like case does add a bit of code duplication now I suppose. I didn't follow the empty uint8 worry, but switched to PyArray_Check in any case since I believe I was indeed guessing a bit with Size (now in two places, though didn't check we really need it in both).

Note that the test for array-like did fail before/pass after the additional coercion shim just added (I also noticed that our error message for passing in a raw array-like class, rather than instance, to bincount seems terrible, but that's probably another matter).

I don't think I followed the deprecation-related concern--I thought I was just coercing for integer type inputs and it was the non-integer cases that were deprecated. It is a bit confusing though, because the deprecation message itself seems simple while the commented discussion a bit higher up seems to obfuscate things...

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, duplicating this seems indeed like the easiest path.

numpy/_core/src/multiarray/compiled_base.c Outdated Show resolved Hide resolved
@seberg seberg merged commit d014c16 into numpy:main Mar 3, 2025
65 of 66 checks passed
@seberg
Copy link
Member

seberg commented Mar 3, 2025

FWIW, I think this has been around forever and the issue was just a cupy test, I think. So I don't care much for backporting, but I also don't think there is a problem with that..

@jakirkham
Copy link
Contributor

Thanks Tyler and Sebastian! 🙏

@tylerjereddy tylerjereddy deleted the treddy_issue_28354 branch March 3, 2025 16:50
jakirkham pushed a commit to jakirkham/numpy that referenced this pull request Mar 3, 2025
* BUG: safer bincount casting

* Fixes numpy#28354

* Force usage of `npy_intp` type in `np.bincount()` and avoid
unsafe casting errors with i.e., `npy_uint64`. This is similar
to our behavior with indexing.

* MAINT, BUG: PR 28355 revisions

* `arr_bincount()` now only uses unsafe casting for integer
input types, and the number of casting operations has
been reduced for the code path used in above PR.

* a test for non-crashing behavior with non-contiguous
`bincount()` input has been added.

* MAINT, BUG: PR 28355 revisions

* Based on reviewer feedback, narrow the scope of the `flags`
variable in `arr_bincount()`.

* Based on reviewer feedback, add an array-like test for the `uint64`
casting issue, which indeed fails before and passes after adding
a similar shim to the array-like code path in `arr_bincount()`.

* Based on reviewer feedback, switch the patching from `PyArray_Size`
to `PyArray_Check` in a few places.

* Update numpy/_core/src/multiarray/compiled_base.c

---------

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
@jakirkham
Copy link
Contributor

As 2.2.x looks pretty active, went ahead and created a backport to 2.2.x with PR: #28420

It looks like versions older than that are not seeing much activity. So didn't go further. Please let me know whether that would be worthwhile

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Mar 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: np.bincount casting uint64 to int64
4 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.