-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG] Refactor feature_selection.f_regression
and introduce feature_selection.r_regression
#17169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I made this change to have the test ran as I previously got: ``` sklearn/feature_selection/tests/test_feature_select.py:None (sklearn/feature_selection/tests/test_feature_select.py) /home/jsquared/.virtualenvs/sk/lib64/python3.8/site-packages/py/_path/local.py:701: in pyimport __import__(modname) ../__init__.py:27: in <module> from ._mutual_info import mutual_info_regression, mutual_info_classif ../_mutual_info.py:9: in <module> from ..neighbors import NearestNeighbors ../../neighbors/__init__.py:17: in <module> from ._nca import NeighborhoodComponentsAnalysis ../../neighbors/_nca.py:22: in <module> from ..decomposition import PCA ../../decomposition/__init__.py:17: in <module> from .dict_learning import dict_learning E ModuleNotFoundError: No module named 'sklearn.decomposition.dict_learning' ``` It seems that there are reason for this special handling to exist according to the comment above. This might need to be reverted.
This reverts commit 63d6fd7.
What do you think, @jnothman? :) |
Hi @jnothman, this is a friendly reminder for this stalled PR follow-up. |
Hi @jjerphan thanks for your pull request. There are linting issues and a conflict that should be solved before review. Also if you think that this is ready for review, do you mind changing the title from [WIP] to [MRG]? Thanks a lot for your patience! |
f_regression
and introduce r_regression
and abs_r_regression
@cmarmo: are you interested in reviewing this PR? 🙂 |
Hi @jjerphan, thanks for your patience! The documentation for the two new functions is not generated: this is because a reference to the function should be added in the doc/modules/classes.rst file. |
Use accurate descriptions for the parameters. Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Revert changes for unrelated tests.
With this PR, from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
import numpy as np
X, y = make_regression(random_state=0)
f_abs_regression = lambda *args: np.abs(f_regression(*args)[0])
X_new = SelectKBest(f_abs_regression, k=10).fit_transform(X, y) If we want to make X_new = SelectKBest(f_regression, k=10, transform_func=np.abs).fit_transform(X, y) I am wondering if adding 2 new functions worth the maintenance cost for making |
Also check against numpy's implementation of the correlation coefficient. Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Thanks for the comprehensive review, @glemaitre. I should have adressed all your requests. IMO, the value of this PR is the introduction of I would go for @thomasjpfan's first suggestion, hence assuming that it is up to users to define the |
There is also the option of adding a |
Still, it would make sense allowing a way to specify the computation of the p-values as you propose. |
I think that we could drop |
So we can limit the scope for this PR to introduce |
See discussions: scikit-learn#17169 (comment)
f_regression
and introduce r_regression
and abs_r_regression
feature_selection.f_regression
and introduce feature_selection.r_regression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few small improvement suggestions:
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Merged! Thanks @jjerphan. |
'welcome! Thanks to @DSLituiev for the original PR! |
Reference Issues/PRs
Follow-up of #8353
Fixes #7195
What does this implement/fix? Explain your changes.
Refactors
feature_selection.f_regression
for convenience:feature_selection.f_regression
now relies onfeature_selection.r_regression
, a public function which pre-computes Pearson's ROther remarks
feature_selection.abs_r_regression
, introduced in #8353, which returned the absolute values offeature_selection.r_regression
was initially considered.