Fix min_pos when all negative + speed up #19328
Conversation
Minor comment, otherwise LGTM
sklearn/utils/arrayfuncs.pyx (outdated)
""" | ||
Find the minimum value of an array over positive values | ||
|
||
Returns a huge value if none of the values are positive |
I see you are copying this over from before, but this can be slightly improved:
- Returns a huge value if none of the values are positive
+ Returns maximum representable value of the input dtype if none of the values are positive
Agreed, done.
Just realized the previous indentation was 3 spaces 0o :)
Looks good thanks!
When all input elements are <= 0, min_pos returns the max float (resp. double) if the input is float (resp. double). However, the max values are switched, so it's actually returning the max double for float input and the max float for double input. It's unlikely that this has already caused an error in practice, but it would have been a sneaky bug :)
I added a test that fails on main. I also added a generic test since the function was not tested at all.
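For illustration, here is a minimal sketch (not the PR's actual test) of the intended behavior after the fix, assuming min_pos is importable from sklearn.utils.arrayfuncs as in current scikit-learn:

```python
import numpy as np
from sklearn.utils.arrayfuncs import min_pos

for dtype in (np.float32, np.float64):
    X = np.full(10, -1.0, dtype=dtype)  # no positive values at all
    # The sentinel should be the max of the *input* dtype:
    # FLT_MAX for float32 input and DBL_MAX for float64 input,
    # not the other way around.
    assert min_pos(X) == np.finfo(dtype).max
```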
I took the opportunity to fused-type the function, so a single Cython implementation now covers both float32 and float64.
I also noticed that accessing X.dtype.name is not free, whereas X.dtype is. When the inputs are rather small arrays, for instance in dict_learning where they have shape ~(n_components,), the computation can be completely dominated by the dtype name lookup.
Turns out it was taking a significant proportion of the time spent in the sparse coding.
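As a rough illustration (not the benchmark from this PR), one could time the two attribute accesses directly; the exact numbers will of course depend on the machine and NumPy version:

```python
import timeit

import numpy as np

X = np.zeros(100, dtype=np.float32)  # small array, as in dict_learning

# Comparing the dtype object directly avoids building the name string.
t_dtype = timeit.timeit(lambda: X.dtype == np.float32, number=100_000)
t_name = timeit.timeit(lambda: X.dtype.name == "float32", number=100_000)

print(f"X.dtype comparison:      {t_dtype:.3f}s")
print(f"X.dtype.name comparison: {t_name:.3f}s")
```

With the fused-typed version, this per-call string lookup is no longer needed on the hot path.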