Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

MAINT Use check_scalar in BaseDecisionTree #21990

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 43 commits into from
Jan 24, 2022

Conversation

genvalen
Copy link
Contributor

@genvalen genvalen commented Dec 15, 2021

Reference Issues/PRs

Addresses #20724
#DataUmbrella

What does this implement/fix? Explain your changes.

Summary of changes to BaseDecisionTree:

  • Add tests to ensure estimator raises proper errors when invalid arguments are passed in.
  • Use the helper function check_scalar from sklearn.utils to validate the scalar parameters.

Test and validation progress:

  • max_depth
  • min_samples_split
  • min_samples_leaf
  • min_weight_fraction_leaf
  • max_features
  • max_leaf_nodes
  • min_impurity_decrease
  • ccp_alpha

References

  1. check_scalar docs
  2. PR #20723

Any other comments?

@genvalen
Copy link
Contributor Author

genvalen commented Dec 16, 2021

Notes on param range in trees
(param: type, range)
max_depth: int, if not none, then [1, )
min_samples_split: int -> [2, number samples], float-> (0, 1] ---> int updated to [2, inf)
min_samples_leaf: int -> [1, number of samples], float-> (0, 0.5] ---> updated to [1, inf) and (0, inf) respectively
min_weight_fraction_leaf: float-> [0, 0.5]
max_features: int -> [1, number of features], float (0, 1], or string (not checked with check_scalar)
max_leaf_nodes: int, if not none, then [2, )
min_impurity_decrease: float-> [0, )
ccp_alpha: float-> [0, )

Edit: updated ranges based on the ranges checked in the code
Edit2: note the upper bound for min_samples_split and min_samples_leaf was not explicitly specified in the code, but I included it in the new checks.

@genvalen
Copy link
Contributor Author

genvalen commented Dec 20, 2021

Hi @glemaitre, the range for these parameters is different in the existing code and in my notes:
min_samples_leaf: float-> (0, 0.5] vs (0, 1)
min_weight_fraction_leaf: float-> [0, 0.5] vs [0, 1)
max_leaf_nodes: [2, ) vs [1, )

Could you help to clear this up? I am already writing new checks based on the range in the notes, but if it makes more sense to keep the ranges consistent with the existing code, I will update them. Thank you!

@glemaitre
Copy link
Member

It makes sense for max_leaf_nodes because if you have 1 split then you automatically have at least 2 leaves.
For the other one, it is less straightforward but if the code makes these checks so these bounds are correct.

@genvalen genvalen marked this pull request as ready for review December 30, 2021 02:26
@genvalen genvalen changed the title [WIP] MNT Use check_scalar in BaseDecisionTree [MRG] MNT Use check_scalar in BaseDecisionTree Dec 30, 2021
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR we do need to update the messages in test_gbdt_parameter_checks since we are changing the error messages:

def test_gbdt_parameter_checks(GradientBoosting, X, y, params, err_msg):

sklearn/tree/_classes.py Outdated Show resolved Hide resolved
@genvalen
Copy link
Contributor Author

For this PR we do need to update the messages in test_gbdt_parameter_checks since we are changing the error messages:

def test_gbdt_parameter_checks(GradientBoosting, X, y, params, err_msg):

Thanks @thomasjpfan, I've added a commit the updates the messages causing the issues.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

sklearn/ensemble/tests/test_gradient_boosting.py Outdated Show resolved Hide resolved
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, otherwise this looks good to go.

sklearn/tree/tests/test_tree.py Outdated Show resolved Hide resolved
@thomasjpfan thomasjpfan changed the title [MRG] MAINT Use check_scalar in BaseDecisionTree MAINT Use check_scalar in BaseDecisionTree Jan 12, 2022
sklearn/tree/_classes.py Outdated Show resolved Hide resolved
sklearn/tree/_classes.py Outdated Show resolved Hide resolved
genvalen and others added 4 commits January 12, 2022 19:01
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well!

sklearn/tree/_classes.py Outdated Show resolved Hide resolved
genvalen and others added 2 commits January 24, 2022 13:37
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@ogrisel
Copy link
Member

ogrisel commented Jan 24, 2022

The new release of numpydoc-1.2 is making test_check_docstring_parameters fail but this is unrelated to this PR.

@ogrisel ogrisel merged commit 7879eb1 into scikit-learn:main Jan 24, 2022
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 2, 2022
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 7, 2022
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 13, 2022
sebp added a commit to sebp/scikit-survival that referenced this pull request Aug 13, 2022
@genvalen genvalen deleted the BaseDecisionTree_add_check_scalar branch March 10, 2023 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.