BUG: fixes for StringDType/unicode promoters #27636

ngoldbaum · Oct 24, 2024

There are a number of missing cases for mixed unicode/string operations that this adds promoters for. Also adds tests for these cases.

Additionally replaces uses of Py_None as an abstract promoter target with PyArray_IntAbstractDType, which makes the promoters fire less often in unintended cases and is closer to the intention in the code.

Also fixes the Python wrappers for the string ufuncs, which incorrectly selected the fixed-width string branches for some signatures; they now rely on np.result_type to check for StringDType inputs.

ngoldbaum · Oct 25, 2024

The latest version of this PR touches a lot of code that @lysnikolaou wrote in the python-level string ufunc wrappers so I'm giving him a ping in case he wants to take a look.

ngoldbaum · Oct 26, 2024

@jorenham if you're at all curious how NumPy handles ufunc type dispatching on the C side, this fixes an issue you spotted while playing with the type stubs.

mhvk

This looks good to me. While looking at the tests and also #27637, I did wonder whether you really want the output to be StringDType in something like replace when what you started with was a U array. But I think it is probably just good to move towards StringDType by default.

ngoldbaum · Oct 29, 2024

@seberg would you mind taking a look at this? I value your sense of taste for the "right" way to make changes to numpy internals.

seberg

Looks fine to me. I think there might be a hole in the logic for out=, but I am not sure it is worth digging deeper (as opposed to a follow-up, maybe when a bug is reported).

I would love to find a better story for this type of thing... The one story that I could potentially see is to make something like an abstract DType (it doesn't need to be an actual thing, it could just be a tuple as in isinstance, but conceptually):

class StringOrUnicodeDType:

so that we could just add one promoter, but the promoter would need to programmatically decide whether the result should be unicode or string.
(Similar to the "this dtype must occur" style of logic. But these ufuncs that include ints show that just a list of dtypes is not super helpful as such.)

(Sorry, the changes look good to me, but I am not sure I am in the right state of mind for any inspiration, although I do suspect the tuple idea is about as good as it gets. Together with allowing a promoter to say "do not apply" it would go very far -- if we allow that, a promoter could even register everything, it would just not be friendly)
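To make the tuple idea concrete, here is a toy sketch in pure Python. Everything in it is hypothetical: the three classes are stand-ins rather than NumPy's DTypes, the promoter signature is invented, and the rule "any StringDType operand makes the result StringDType" is an assumption. It is not how NumPy's C-level dispatch is implemented; it only shows one promoter deciding the result programmatically and returning None to say "do not apply".

```python
# Hypothetical stand-ins; these are NOT NumPy's DType classes.
class UnicodeDType: ...
class StringDType: ...
class IntDType: ...

# The "abstract" target is just a tuple, usable with issubclass/isinstance.
StringOrUnicodeDType = (StringDType, UnicodeDType)


def string_or_unicode_promoter(op_dtypes):
    """Return resolved dtype classes, or None to signal "do not apply"."""
    text = [dt for dt in op_dtypes if issubclass(dt, StringOrUnicodeDType)]
    if not text:
        # Purely numeric operands: this promoter declines to fire.
        return None
    # Assumed rule: a single StringDType operand makes all text operands StringDType.
    target = StringDType if StringDType in text else UnicodeDType
    return tuple(
        target if issubclass(dt, StringOrUnicodeDType) else dt
        for dt in op_dtypes
    )


print(string_or_unicode_promoter((UnicodeDType, StringDType, IntDType)))
print(string_or_unicode_promoter((IntDType, IntDType)))
```

Non-text operands such as the integer slot pass through unchanged, which is why a plain list of dtypes alone would not be enough for ufuncs that mix strings and ints.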

EDIT: I forgot to say thanks :), this is tricky stuff and looks good, just unfortunately verbose. And the verbosity isn't fully unintentional, but it would be good to improve it.

ngoldbaum · Oct 30, 2024

Thanks! I agree there's definitely room for improvement here to make it less verbose. I also noticed while working on this that the use of None in the Unicode promoters makes them fire way too often (e.g. for add and multiply for purely numeric operands). We should fix that too.

I opened a followup issue about this: #27671

I'll go ahead and merge this.

seberg · Dec 9, 2024

@danish-circuit this PR doesn't have anything to do with typing. Please open a new issue, ideally with a full reproducer.

BUG: fixes for StringDType/unicode promoters

26cdf63

github-actions bot added the 00 - Bug label Oct 24, 2024

ngoldbaum mentioned this pull request Oct 24, 2024

BUG: Operation spuriously returns a legacy string #27637

Closed

charris added the 09 - Backport-Candidate label Oct 24, 2024

BUG: fix more issues with string ufunc promotion

87a01ae

ngoldbaum added 2 commits October 26, 2024 13:24

BUG: substantially simplify and fix issue with justification promoter

a81886c

DOC: add release note

7bc49e9

mhvk approved these changes Oct 26, 2024

seberg approved these changes Oct 30, 2024

ngoldbaum mentioned this pull request Oct 30, 2024

ENH: Add infrastructure to simplify unicode/StringDType promoters #27671

Open

ngoldbaum merged commit 3126b97 into numpy:main Oct 30, 2024
67 checks passed

charris mentioned this pull request Oct 30, 2024

BUG: fixes for StringDType/unicode promoters #27673

Merged

charris removed the 09 - Backport-Candidate label Oct 30, 2024
