-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
API replace mean_squared_error(square=False) by root_mean_squared_error #26734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the PR @101AlexMartin. I'm not opposed to have 2 distinct functions for mse and rmse if it makes it easier to find and reduce the confusion. Regarding the code, you should just call |
I'd deprecate the square parameter. That's why I didn't call to mse inside the rmse function, but duplicated it instead. PD.: If you want to deprecate, can you please let me know how is that implemented in the code? |
This is also confusing to have 2 things that do the same. I prefer to deprecate the For the deprecation, you can refer to https://scikit-learn.org/dev/developers/contributing.html#deprecation |
sklearn/metrics/_regression.py
Outdated
y_type, y_true, y_pred, multioutput = _check_reg_targets( | ||
y_true, y_pred, multioutput | ||
) | ||
check_consistent_length(y_true, y_pred, sample_weight) | ||
output_errors = np.sqrt(np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight)) | ||
|
||
if isinstance(multioutput, str): | ||
if multioutput == "raw_values": | ||
return output_errors | ||
elif multioutput == "uniform_average": | ||
# pass None as weights to np.average: uniform mean | ||
multioutput = None | ||
|
||
return np.average(output_errors, weights=multioutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y_type, y_true, y_pred, multioutput = _check_reg_targets( | |
y_true, y_pred, multioutput | |
) | |
check_consistent_length(y_true, y_pred, sample_weight) | |
output_errors = np.sqrt(np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight)) | |
if isinstance(multioutput, str): | |
if multioutput == "raw_values": | |
return output_errors | |
elif multioutput == "uniform_average": | |
# pass None as weights to np.average: uniform mean | |
multioutput = None | |
return np.average(output_errors, weights=multioutput) | |
output_errors = mean_squared_error( | |
y_true, | |
y_pred, | |
sample_weight=sample_weight, | |
multioutput="raw_values", | |
) | |
if isinstance(multioutput, str): | |
if multioutput == "raw_values": | |
return output_errors | |
elif multioutput == "uniform_average": | |
# pass None as weights to np.average: uniform mean | |
multioutput = None | |
return np.average(output_errors, weights=multioutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In here you are missing the square root right?
Oups indeed 😁 I just wanted to show the pattern to reuse the code of mean squared errorSent from my iPhoneOn 30 Jun 2023, at 23:16, 101AlexMartin ***@***.***> wrote:
@101AlexMartin commented on this pull request.
In sklearn/metrics/_regression.py:
+ y_type, y_true, y_pred, multioutput = _check_reg_targets(
+ y_true, y_pred, multioutput
+ )
+ check_consistent_length(y_true, y_pred, sample_weight)
+ output_errors = np.sqrt(np.average((y_true - y_pred) ** 2, axis=0, weights=sample_weight))
+
+ if isinstance(multioutput, str):
+ if multioutput == "raw_values":
+ return output_errors
+ elif multioutput == "uniform_average":
+ # pass None as weights to np.average: uniform mean
+ multioutput = None
+
+ return np.average(output_errors, weights=multioutput)
In here you are missing the square root right?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>
|
Can you please check the version numbers of deprecation? |
I'll do the same changes to the MSLE to be consistent. I'll create a new ad-hoc function to calculate RMSLE |
I also ensured there are no leftover references to the parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some comments. The target version for the deprecation cycle are 1.4 (start) and 1.6 (end).
To fix the linting issues you need to run black and ruff, or you can install pre-commit, as explained here https://scikit-learn.org/dev/developers/contributing.html#how-to-contribute.
Please also add an entry in the v1.4.rst what's new, and add the 2 new functions in doc/modules/classes.rst
.
After installing the pre-commit I'm no longer allowed to push the changes, maybe you can give me a hand.
|
The error is not very informative. Hard to guess what's going on with that. |
Done. Let me know what's the issue, I'm curious about this pre-commit tool. |
There's a summary of the linting issues in this comment #26734 (comment). Looks like there's only 1 import sorting issue. I don't know what's wrong with your local mypy. The min version we support is 1.3. Another possibility is that you just installed a very recent version of mypy that complains where it did not before. |
I can try to install a different version of mypy. However, since changes are already commited, I'm not sure on how to solve this problem. Any suggestion? |
Our CI is using mypy==1.3. You can install that version. There's nothing more you need to do. Looks like you already pushed your latest changes. You can commit new stuff and push the same way |
But there's no more I had to commit for now, all the changes you asked for were implemented. I cannot thus solve the linting issue for now, right? |
to fix the linting issue you need to run |
@jeremiedbb Could you kindly check if we should consider the codecov errors that are being presented? They seem to relate to the exception conditions in _check_reg_targets, which I don't think are an important part of this PR. |
Any update on this? It's been a long while since changes were approved, but still didn't merge the PR |
@101AlexMartin I think the PR looks good overall. There is just the codecov error ( the 1 failing check ). I think it would be good to have a second opinion regarding that and if required we can add a few tests. @glemaitre could you kindly have a quick look? |
Hello, any update on this? It's been another month without any update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a mention of the root_mean_squared_error
in the model_selection.rst
file. I think it woule be enough to add a small paragraph in the "Mean Squared Error" section mentioning the function. We can do the same thing for the root mean squared log error.
I checked the codecov issue but I think this is a false positive. |
When I access
|
Oh right, the file is |
Thanks for the revision, I pushed the new changes. Now CI fails but when I click on it I think the errors are not related with my push (?) |
The linting is failing: #26734 (comment). You need to run Also, you still have a conflict in the changelog (most probably you need to accept all entries). The CIs will not start until you don't solve this conflict. |
I have pre-commit installed, but since I did not modify the files failing at ruff then it is skipped. When I run
Regarding the conflict on the changelog, I don't know what do you mean by accepting the entries. Maybe you can elaborate a bit more on that. |
I merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the CIs are passing, we are good to merge this one. Thanks @101AlexMartin
Cool, thanks |
I checked codecov and the newly created function are covered by the tests. So this is still the same false positive. |
Merging. |
…or (scikit-learn#26734) Co-authored-by: Alejandro Martin <alejandro.martingil@tno.nl> Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…or (scikit-learn#26734) Co-authored-by: Alejandro Martin <alejandro.martingil@tno.nl> Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Create ad-hoc function to calculate RMSE
Any other comments?
As an external user of scikit-learn, I found it counter-intuitive to have a flag inside the MSE function to get RMSE. I think it is difficult to find. In fact, I'm not the only one: https://stackoverflow.com/questions/17197492/is-there-a-library-function-for-root-mean-square-error-rmse-in-python
If you also think this PR might be worth it, I will further elaborate it, generating the tests and creating also an ad-hoc function for the root_mean_square_log_error.