Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

BUG: fix another cast setup in array_assign_subscript #27057

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 6, 2024

Conversation

ngoldbaum
Copy link
Member

Thanks for the bug-finding @egesip!

Fixes #27053

@ngoldbaum ngoldbaum added 00 - Bug 09 - Backport-Candidate PRs tagged should be backported component: numpy.strings String dtypes and functions labels Jul 26, 2024
@ngoldbaum
Copy link
Member Author

Hmm, it looks like there are some other code paths that depend on the input and output descriptor being the same...

@ngoldbaum
Copy link
Member Author

I just pushed a hacky fix - I suspect @seberg is going to tell me I'm doing this wrong :)

It wasn't clear to me how exactly this code path is supposed to work, since it seems to depend on the inferred source dtype not being used in some cases.

}
else {
src = self;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The below makes sense to me, but this one I don't understand? Legacy should work identically, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it should, in practice lots of tests seem to depend on the old behavior. As I said in the description, I don't fully understand what this code block is doing, it seems to be here to capture oddball conversions (unicode to half goes through here apparently - one of the failing tests I was looking at).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case you are using the wrong descr below, and similar tests must fail also for string dtype (maybe not because there is an object special path that uses refcheck).

Check MapIter, descr is passed, it sets up a buffering iterator to keep the copy loop simple, that ensures that the descriptor is identical between src/dst. It is not for your dtype, so you need to fetch the src one from the iterator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing it this way, and it almost works, but the dtype array that lives on the iterator contains references to the dtypes that were passed in to create the iterator, so it still doesn't fix the issue for stringdtype. I tried checking for the case when the descriptor gets replaced and then updated the iterator's descriptor array, but for mysterious reasons that led to crashes.

Maybe the rewrite in the latest push is a bit more appealing? The basic difference between the case with stringdtype and the case that's breaking for other dtypes is for stringdtype the MapIter is allocating an array inline, but for other dtypes the extra_op is already allocated. I added a flag that marks whether or not the iterator allocated an array and used it here to distinguish between the cases.

If not I'd appreciate it if you could try your hand at a fix, I tried for several hours today to get this right but couldn't manage it along the lines that you outlined the other day.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better, I agree it's all a bit of a maze although, I want would like to see if we can't untangle it. I might help to make descr (the one used for the extra-op/tmp array) a strong reference.

One thing that would may be better for all paths anyway, may be to use descr = finalize_descr(descr). That way, we don't pollute the assigned arrays side car buffer with temporary copies!
That will mean that we would need to pass _NPY_ARRAY_ENSURE_DTYPE_IDENTITY when creating copies and actually make sure we honor that flag (which we should anyway).

Anyway, need to take another look and it may still be subtly wrong in another pass if REFCHK is false, although I am not sure that is ever relevant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea to use finalize_descrs when setting up the iterator, I might try my hand at that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I think it may make sense to do it on descr that gets passed in. But not sure where to put it exactly.

@ngoldbaum ngoldbaum force-pushed the fix-array-assign-subscript branch from 37d1dac to 0237342 Compare July 29, 2024 19:24
@ngoldbaum ngoldbaum added this to the 2.1.0 release milestone Aug 1, 2024
@charris
Copy link
Member

charris commented Aug 5, 2024

This is one of the last fixes needed before branching.

@ngoldbaum
Copy link
Member Author

@seberg would you be ok with merging this as-is?

@seberg
Copy link
Member

seberg commented Aug 5, 2024

I suppose so, it would be nice to have some form of TODO to point out that it may still be broken (at least for potential other user DTypes).

But, yeah, if this is the one thing that is blocking 2.1 then fair.

@ngoldbaum ngoldbaum force-pushed the fix-array-assign-subscript branch from 0237342 to 6a0d05c Compare August 5, 2024 20:16
@ngoldbaum
Copy link
Member Author

Added a TODO

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I don't really think its quite right, but approving since it is probably right for the places where it matters.

@ngoldbaum ngoldbaum merged commit 6ed4500 into numpy:main Aug 6, 2024
64 of 66 checks passed
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug component: numpy.strings String dtypes and functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: StringDType Subscript assignment problems
3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.