-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: fix another cast setup in array_assign_subscript #27057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hmm, it looks like there are some other code paths that depend on the input and output descriptor being the same... |
I just pushed a hacky fix - I suspect @seberg is going to tell me I'm doing this wrong :) It wasn't clear to me how exactly this code path is supposed to work, since it seems to depend on the inferred source dtype not being used in some cases. |
numpy/_core/src/multiarray/mapping.c
Outdated
} | ||
else { | ||
src = self; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The below makes sense to me, but this one I don't understand? Legacy should work identically, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it should, in practice lots of tests seem to depend on the old behavior. As I said in the description, I don't fully understand what this code block is doing, it seems to be here to capture oddball conversions (unicode to half goes through here apparently - one of the failing tests I was looking at).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case you are using the wrong descr
below, and similar tests must fail also for string dtype (maybe not because there is an object special path that uses refcheck).
Check MapIter, descr
is passed, it sets up a buffering iterator to keep the copy loop simple, that ensures that the descriptor is identical between src/dst. It is not for your dtype, so you need to fetch the src one from the iterator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried doing it this way, and it almost works, but the dtype array that lives on the iterator contains references to the dtypes that were passed in to create the iterator, so it still doesn't fix the issue for stringdtype. I tried checking for the case when the descriptor gets replaced and then updated the iterator's descriptor array, but for mysterious reasons that led to crashes.
Maybe the rewrite in the latest push is a bit more appealing? The basic difference between the case with stringdtype and the case that's breaking for other dtypes is for stringdtype the MapIter is allocating an array inline, but for other dtypes the extra_op
is already allocated. I added a flag that marks whether or not the iterator allocated an array and used it here to distinguish between the cases.
If not I'd appreciate it if you could try your hand at a fix, I tried for several hours today to get this right but couldn't manage it along the lines that you outlined the other day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better, I agree it's all a bit of a maze although, I want would like to see if we can't untangle it. I might help to make descr
(the one used for the extra-op/tmp array) a strong reference.
One thing that would may be better for all paths anyway, may be to use descr = finalize_descr(descr)
. That way, we don't pollute the assigned arrays side car buffer with temporary copies!
That will mean that we would need to pass _NPY_ARRAY_ENSURE_DTYPE_IDENTITY
when creating copies and actually make sure we honor that flag (which we should anyway).
Anyway, need to take another look and it may still be subtly wrong in another pass if REFCHK
is false, although I am not sure that is ever relevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good idea to use finalize_descrs when setting up the iterator, I might try my hand at that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I think it may make sense to do it on descr
that gets passed in. But not sure where to put it exactly.
37d1dac
to
0237342
Compare
This is one of the last fixes needed before branching. |
@seberg would you be ok with merging this as-is? |
I suppose so, it would be nice to have some form of TODO to point out that it may still be broken (at least for potential other user DTypes). But, yeah, if this is the one thing that is blocking 2.1 then fair. |
0237342
to
6a0d05c
Compare
Added a TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I don't really think its quite right, but approving since it is probably right for the places where it matters.
Thanks for the bug-finding @egesip!
Fixes #27053