TST replace Boston in test_gradient_boosting.py #16937
Conversation
I wonder if
ping @glemaitre 😅
@ogrisel I was looking at the LS loss in GBDT; it seems that we divide by the sum of the sample weights, meaning that weights of all 1s or all 2s will lead to the same loss.
Ah, actually the issue is with the Huber and LAD losses.
FYI, after digging in: the initial raw predictions given by LAD and Huber divide by the sum of the sample weights as well, so they should work as expected.
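The invariance described here is easy to check numerically: any loss that divides by the sum of the sample weights is unchanged when every weight is multiplied by the same constant. A minimal NumPy sketch (not scikit-learn's actual loss code):

```python
import numpy as np

def weighted_ls_loss(y_true, y_pred, sample_weight):
    """Least-squares loss normalized by the sum of the sample weights."""
    sw = np.asarray(sample_weight, dtype=float)
    return np.sum(sw * (y_true - y_pred) ** 2) / np.sum(sw)

rng = np.random.RandomState(0)
y_true = rng.normal(size=100)
y_pred = rng.normal(size=100)
ones = np.ones(100)

# Scaling all weights by 2 cancels in numerator and denominator,
# so weights of all 1s and all 2s give the same loss value.
np.testing.assert_allclose(
    weighted_ls_loss(y_true, y_pred, ones),
    weighted_ls_loss(y_true, y_pred, 2 * ones),
)
```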
I made a small push by merging master.
LGTM as it is.
```diff
 if last_y_pred is not None:
-    assert_array_almost_equal(last_y_pred, y_pred)
+    assert_array_almost_equal(last_y_pred, y_pred, decimal=0)
```
Before merging, we need to add a FIXME to explain the issue and link to the relevant PR/issue, just so we don't lose track.
```diff
-assert_array_almost_equal(last_y_pred, y_pred, decimal=0)
+# FIXME: `decimal=0` is very permissive. This is due to the fact that
+# GBRT with and without `sample_weight` do not use the same implementation
+# of the median during the initialization with the `DummyRegressor`.
+# In the future, we should make sure that both implementations are the same.
+# Refer to https://github.com/scikit-learn/scikit-learn/pull/17377 for
+# detailed explanations.
+assert_array_almost_equal(last_y_pred, y_pred, decimal=0)
```
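To illustrate why two median implementations can disagree even with unit weights: `np.median` averages the two middle values of an even-length array, while a percentile computed from cumulative weights typically picks one of them. The `weighted_percentile` below is a simplified stand-in for illustration, not scikit-learn's private helper:

```python
import numpy as np

def weighted_percentile(a, weights, q=50):
    """Simplified weighted percentile using lower interpolation."""
    order = np.argsort(a)
    a_sorted = np.asarray(a)[order]
    cum_w = np.cumsum(np.asarray(weights)[order])
    # First index where cumulative weight reaches q% of the total weight.
    idx = np.searchsorted(cum_w, q / 100 * cum_w[-1])
    return a_sorted[idx]

data = [1.0, 2.0, 3.0, 4.0]
unit = np.ones(4)
print(np.median(data))                  # 2.5 (averages the middle pair)
print(weighted_percentile(data, unit))  # 2.0 (lower interpolation)
```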
It also allows setting of
For
This is weird. I am not sure why the bitness of Python would influence the results of this test. But maybe we should focus on fixing #17377 first and then see what is the scale of discrepancy that remains when fitting those models with |
The 4 failing tests with
AFAICT the 4 tests are failing at
Not sure why the difference in results. Trying
If it is only on 32 bits, I would not be surprised that it is linked to the underlying trees: #8853 |
Sorry, I misread. This is happening on both architectures; discard my last comment.
```diff
 # and least absolute deviation.
-ones = np.ones(len(boston.target))
+ones = np.ones(len(y_reg))
 last_y_pred = None
 for sample_weight in None, ones, 2 * ones:
```
```diff
-for sample_weight in None, ones, 2 * ones:
+for sample_weight in [None, ones, 2 * ones]:
```
Is it always failing for the
```python
X_reg, y_reg = make_regression(
    n_samples=500, n_features=10, n_informative=8, noise=10, random_state=7
)
y_reg = StandardScaler().fit_transform(y_reg.reshape((-1, 1)))
```
You can use the `scale` function: `from sklearn.preprocessing import scale`.
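For context, `sklearn.preprocessing.scale` accepts 1-D input directly and produces the same result as the `StandardScaler` reshape dance; a quick sketch:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

rng = np.random.RandomState(7)
y = rng.normal(loc=100.0, scale=15.0, size=500)

# StandardScaler requires a 2-D array, hence the reshape.
y_scaler = StandardScaler().fit_transform(y.reshape(-1, 1)).ravel()

# scale() accepts 1-D input directly, which is why it was suggested here.
y_scale = scale(y)

np.testing.assert_allclose(y_scaler, y_scale)
```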
Yes, the failure is always with
```diff
 last_y_pred = None
-for sample_weight in None, ones, 2 * ones:
+for sample_weight in [None, ones, 2 * ones]:
     clf = GradientBoostingRegressor(n_estimators=100,
```
Could you rename `clf` to `reg`?
```python
sample_weight=sample_weight)
leaves = clf.apply(boston.data)
assert leaves.shape == (506, 100)
assert_raises(ValueError, clf.predict, X_reg)
```
Remove this check since it is covered by the common tests.
```python
# FIXME: `rtol=65` is very permissive. This is due to the fact that
# GBRT with and without `sample_weight` do not use the same
# implementation of the median during the initialization with the
# `DummyRegressor`. In the future, we should make sure that both
# implementations are the same. See PR #17377 for more.
assert_allclose(last_y_pred, y_pred, rtol=100)
```
```diff
-# FIXME: `rtol=65` is very permissive. This is due to the fact that
-# GBRT with and without `sample_weight` do not use the same
-# implementation of the median during the initialization with the
-# `DummyRegressor`. In the future, we should make sure that both
-# implementations are the same. See PR #17377 for more.
-assert_allclose(last_y_pred, y_pred, rtol=100)
+# FIXME: We temporarily bypass this test. This is due to the fact that
+# GBRT with and without `sample_weight` do not use the same
+# implementation of the median during the initialization with the
+# `DummyRegressor`. In the future, we should make sure that both
+# implementations are the same. See PR #17377 for more.
+# assert_allclose(last_y_pred, y_pred)
+pass
```
Let's bypass the test. I think we cannot test this part until we fix the percentile computation.
There is no difference between checking with a tolerance so large that we don't test anything and not testing it at all :)
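One conventional way to bypass such a check without it silently passing forever is pytest's `xfail` marker; a sketch of the idea (the test name and body here are hypothetical, not what the PR ultimately did):

```python
import pytest

@pytest.mark.xfail(
    reason="GBRT with and without sample_weight use different median "
           "implementations during init; see PR #17377",
    strict=False,
)
def test_weighted_vs_unweighted_predictions():
    # Hypothetical placeholder: the real comparison of predictions with
    # sample_weight=None versus sample_weight=2 * ones would go here.
    raise AssertionError("known discrepancy until the percentile fix lands")
```

Unlike a bare `pass`, an `xfail` test starts reporting `XPASS` once the underlying bug is fixed, which serves as a reminder to restore the assertion.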
Thanks @glemaitre, changes made
Thanks @lucyleeow
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Towards #16155
What does this implement/fix? Explain your changes.
Remove the Boston dataset in `sklearn/ensemble/tests/test_gradient_boosting.py`. Use a subset of the California housing dataset for `test_boston` and use the diabetes dataset for all remaining tests.

Any other comments?

Used the California dataset for `test_boston` due to the high MSE if the diabetes dataset is used (for the parameters in the test, MSE ranged from ~600-1700). Correspondingly there was also a bigger difference between predictions (`assert_array_almost_equal(last_y_pred, y_pred)`), and predictions were only equal with `decimal=-2`.

Happy to change to diabetes/another dataset if California is not suitable.
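For reference, NumPy documents `assert_array_almost_equal` as checking `abs(desired - actual) < 1.5 * 10**(-decimal)`, so `decimal=-2` tolerates absolute differences up to 150. A quick sketch of how permissive that is:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

a = np.array([100.0, 200.0])
b = np.array([140.0, 160.0])  # absolute differences of 40

# Passes: 40 < 1.5 * 10**2 = 150.
assert_array_almost_equal(a, b, decimal=-2)

# The same comparison at decimal=0 (tolerance 1.5) would raise
# an AssertionError.
```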