TST replace Boston in test_gradient_boosting.py #16937


Merged · 17 commits · Jun 23, 2020

Conversation

@lucyleeow (Member) commented Apr 16, 2020

Reference Issues/PRs

Towards #16155

What does this implement/fix? Explain your changes.

Remove the Boston dataset from sklearn/ensemble/tests/test_gradient_boosting.py. Use a subset of the California housing dataset for test_boston and the diabetes dataset for all remaining tests.

Any other comments?

Used the California dataset for test_boston because the MSE was high when the diabetes dataset was used (for the parameters in the test, MSE ranged from ~600-1700). Correspondingly, there was also a bigger difference between predictions (assert_array_almost_equal(last_y_pred, y_pred)); predictions were only equal with decimal=-2.
Happy to change to the diabetes/another dataset if California is not suitable.

@ogrisel (Member) commented Apr 21, 2020

I wonder if sample_weight=2 * ones is expected to yield the same predictions as training with sample_weight=ones for GBRT with the LS loss. The effective learning rate is probably not the same, so maybe this assert_array_almost_equal(last_y_pred, y_pred) is just wrong.

@lucyleeow (Member, Author)
ping @glemaitre 😅

@glemaitre (Member)

@ogrisel I was looking at the LS loss in GBDT; it seems that we divide by the sum of the weights, meaning that all 1s or all 2s will lead to the same loss.
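That normalization can be illustrated with a small numpy sketch (a simplified illustration with made-up residuals, not scikit-learn's actual loss code): a weighted sum divided by the sum of the weights is invariant to scaling all weights by a constant.

```python
import numpy as np

# Toy residuals for a least-squares (LS) loss computation.
residuals = np.array([0.5, -1.2, 2.0, 0.3])

def ls_loss(res, sample_weight):
    # Weighted mean of squared residuals, normalized by the sum of the
    # weights -- scaling every weight by the same constant cancels out.
    return np.sum(sample_weight * res ** 2) / np.sum(sample_weight)

ones = np.ones_like(residuals)
assert np.isclose(ls_loss(residuals, ones), ls_loss(residuals, 2 * ones))
```

With unit weights this reduces to the plain mean of squared residuals, which is why uniform weights of 1 and 2 give the same LS loss.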

@glemaitre glemaitre self-assigned this May 28, 2020
sklearn/ensemble/tests/test_gradient_boosting.py (outdated review thread, resolved)
@glemaitre (Member)

Ah, actually the issue is with the Huber and LAD losses.

@glemaitre (Member)

Digging in, the initial raw predictions given by the DummyRegressor are different with 1s and 2s sample weights. Basically, we use the weighted median, and it seems that the median was not the same. This looks weird; I need to go more in depth.

FYI, LAD and Huber divide by the sum of the sample weights as well, so they should work as expected.
We did not have the problem with LS because we predict the mean.
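One plausible source of such a discrepancy (a simplified sketch, not scikit-learn's actual `_weighted_percentile` implementation) is that the unweighted median interpolates between the two middle values of an even-length array, while a cumulative-weight percentile picks an actual data point:

```python
import numpy as np

def weighted_percentile(data, weights, percentile=50):
    # Simplified cumulative-weight percentile: take the first sorted
    # element whose cumulative weight reaches the target fraction of
    # the total weight.
    order = np.argsort(data)
    sorted_data = data[order]
    cum_w = np.cumsum(weights[order])
    target = percentile / 100.0 * cum_w[-1]
    return sorted_data[np.searchsorted(cum_w, target)]

y = np.array([1.0, 2.0, 3.0, 10.0])
ones = np.ones_like(y)

print(np.median(y))                      # 2.5 (interpolates)
print(weighted_percentile(y, ones))      # 2.0 (picks a data point)
print(weighted_percentile(y, 2 * ones))  # 2.0 (scale-invariant here)
```

So even with all-ones weights, a weighted-percentile code path can disagree with np.median, which matches the observation that the DummyRegressor initialization differs depending on whether sample weights are passed.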

@glemaitre (Member)

I made a small push by merging master.

@ogrisel (Member) left a comment

LGTM as it is.


    if last_y_pred is not None:
-       assert_array_almost_equal(last_y_pred, y_pred)
+       assert_array_almost_equal(last_y_pred, y_pred, decimal=0)
Review comment (Member):

Before merging, we need to add a FIXME to explain the issue and link to the relevant PR/issue, just so we do not lose track.

Suggested change:
-   assert_array_almost_equal(last_y_pred, y_pred, decimal=0)
+   # FIXME: `decimal=0` is very permissive. This is due to the fact that
+   # GBRT with and without `sample_weight` do not use the same implementation
+   # of the median during the initialization with the `DummyRegressor`.
+   # In the future, we should make sure that both implementations are the same.
+   # Refer to https://github.com/scikit-learn/scikit-learn/pull/17377 for some
+   # detailed explanations.
+   assert_array_almost_equal(last_y_pred, y_pred, decimal=0)

@lucyleeow (Member, Author)
  • Amended to use make_regression instead of California, as we would otherwise need to allow network access (the California dataset is not local).
  • Amended to use assert_allclose instead of assert_array_almost_equal, as the doc says:
     .. note:: It is recommended to use one of `assert_allclose`,
              `assert_array_almost_equal_nulp` or `assert_array_max_ulp`
              instead of this function for more consistent floating point
              comparisons.

It also allows setting rtol or atol, which is more informative than decimal. For check_regression_dataset, this makes the difference in y_pred between the different sample weights very obvious.
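For illustration (toy arrays, not the values from this test), rtol expresses the allowed relative difference directly, whereas decimal only specifies an absolute number of decimal places:

```python
import numpy as np
from numpy.testing import assert_allclose

actual = np.array([100.0, 200.0])
desired = np.array([100.4, 200.8])

# Passes: each element differs from `desired` by at most 0.5%,
# since the check is |actual - desired| <= atol + rtol * |desired|.
assert_allclose(actual, desired, rtol=5e-3)
```

With assert_array_almost_equal, the same comparison would need a decimal value chosen around the absolute magnitude of the data, which is harder to interpret.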

@lucyleeow (Member, Author) commented Jun 17, 2020

For check_regression_dataset, the difference between y_pred for the various sample weights seems to be larger for 32bit. Is this related to #17377 ?

@ogrisel (Member) commented Jun 17, 2020

> For check_regression_dataset, the difference between y_pred for the various sample weights seems to be larger for 32bit. Is this related to #17377 ?

This is weird. I am not sure why the bitness of Python would influence the results of this test. But maybe we should focus on fixing #17377 first and then see what scale of discrepancy remains when fitting those models with sample_weight on 32- vs 64-bit Python.

@lucyleeow (Member, Author) commented Jun 17, 2020

The 4 failing tests with rtol=65:

  • Linux py36_ubuntu_atlas
  • Windows py36_pip_openblas_32bit
  • Windows py37_conda_mkl
  • scikit-learn.scikit-learn

AFAICT the 4 tests are failing at test_regression_dataset -> assert_allclose(last_y_pred, y_pred, rtol=65) due to 1 mismatch in the 500 predictions ('mismatch 0.20000000000000284%', which is 1/500).

  • Both 64- and 32-bit builds appear among the 4 failing and the passing builds.
  • The 4 failing tests include both Python 3.6 and 3.7.
  • The 4 failing tests include both 'openblas' and 'mkl'.

Not sure why the results differ.

Trying rtol=100: 3 failing tests with this setting...

@glemaitre (Member)

If it is only on 32 bits, I would not be surprised if it is linked to the underlying trees: #8853

@glemaitre (Member)

Sorry, I misread. This is happening on both architectures; discard my last comment.

    # and least absolute deviation.
-   ones = np.ones(len(boston.target))
+   ones = np.ones(len(y_reg))
    last_y_pred = None
    for sample_weight in None, ones, 2 * ones:

Review comment (Member):

Suggested change:
-   for sample_weight in None, ones, 2 * ones:
+   for sample_weight in [None, ones, 2 * ones]:

@glemaitre (Member)

Is it always failing for the huber loss?

    X_reg, y_reg = make_regression(
        n_samples=500, n_features=10, n_informative=8, noise=10, random_state=7
    )
    y_reg = StandardScaler().fit_transform(y_reg.reshape((-1, 1)))

Review comment (Member):

You can use the scale function: from sklearn.preprocessing import scale.
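For reference (a numpy sketch, assuming the default behavior of centering to zero mean and unit variance is meant), scale applied to a 1-D array is equivalent to the manual standardization below, without the reshape dance that StandardScaler requires:

```python
import numpy as np

y = np.array([3.0, 7.0, 11.0, 19.0])

# Manual standardization: zero mean, unit variance using the biased
# (population) standard deviation, matching numpy's np.std default.
y_std = (y - y.mean()) / y.std()

assert np.isclose(y_std.mean(), 0.0)
assert np.isclose(y_std.std(), 1.0)
```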

@lucyleeow (Member, Author)

> Is it always failing for the huber loss?

Yes, the failure is always with huber and subsample=0.5.

    last_y_pred = None
-   for sample_weight in None, ones, 2 * ones:
+   for sample_weight in [None, ones, 2 * ones]:
    clf = GradientBoostingRegressor(n_estimators=100,

Review comment (Member):

Could you rename clf to reg?

    sample_weight=sample_weight)
    leaves = clf.apply(boston.data)
    assert leaves.shape == (506, 100)
    assert_raises(ValueError, clf.predict, X_reg)

Review comment (Member):

Remove this check since it is covered by the common test.

Comment on lines 237 to 242:

    # FIXME: `rtol=65` is very permissive. This is due to the fact that
    # GBRT with and without `sample_weight` do not use the same
    # implementation of the median during the initialization with the
    # `DummyRegressor`. In the future, we should make sure that both
    # implementations are the same. See PR #17377 for more.
    assert_allclose(last_y_pred, y_pred, rtol=100)

Review comment (Member):

Suggested change:
-   # FIXME: `rtol=65` is very permissive. This is due to the fact that
-   # GBRT with and without `sample_weight` do not use the same
-   # implementation of the median during the initialization with the
-   # `DummyRegressor`. In the future, we should make sure that both
-   # implementations are the same. See PR #17377 for more.
-   assert_allclose(last_y_pred, y_pred, rtol=100)
+   # FIXME: We temporarily bypass this test. This is due to the fact that
+   # GBRT with and without `sample_weight` do not use the same
+   # implementation of the median during the initialization with the
+   # `DummyRegressor`. In the future, we should make sure that both
+   # implementations are the same. See PR #17377 for more.
+   # assert_allclose(last_y_pred, y_pred)
+   pass

Review comment (Member):

Let's bypass the test. I think we cannot test this part until we fix the percentile computation.
There is no difference between checking with a tolerance so large that we don't test anything and not testing it :)
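The point about an over-permissive tolerance can be made concrete with a toy example (illustrative values, not from this test): with rtol=65, assert_allclose accepts wildly different arrays, so the assertion no longer tests anything meaningful.

```python
import numpy as np
from numpy.testing import assert_allclose

a = np.array([1.0, 2.0])
b = np.array([50.0, 90.0])

# Passes despite a ~50x difference: the check is
# |a - b| <= atol + rtol * |b|, and rtol=65 makes the
# right-hand side enormous.
assert_allclose(a, b, rtol=65)
```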

@glemaitre glemaitre changed the title Replace Boston in test_gradient_boosting.py TST replace Boston in test_gradient_boosting.py Jun 23, 2020
@lucyleeow (Member, Author)

Thanks @glemaitre, changes made

@glemaitre glemaitre merged commit 5a33360 into scikit-learn:master Jun 23, 2020
@glemaitre (Member)

Thanks @lucyleeow

rubywerman pushed a commit to MLH-Fellowship/scikit-learn that referenced this pull request Jun 24, 2020
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@lucyleeow lucyleeow deleted the test_grad_boost branch October 21, 2023 03:27