MNT n_features_in_ consistency in decomposition #18557


Merged

Conversation

ogrisel (Member) commented on Oct 7, 2020:

Early PR to let @thomasjpfan (and others) know that I started working on this module.

Builds on #18514.

@@ -324,4 +323,5 @@ def test_strict_mode_parametrize_with_checks(estimator, check):
@pytest.mark.parametrize("estimator", N_FEATURES_IN_AFTER_FIT_ESTIMATORS,
                         ids=_get_check_estimator_ids)
def test_check_n_features_in_after_fitting(estimator):
    _set_checking_parameters(estimator)
ogrisel (Member, author) commented on the diff:

@thomasjpfan if you work on another module you might need this, as long as this check is not part of the list of standard checks.
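For context, a minimal sketch of how a check that is not yet in the standard suite can be run for a single estimator, assuming the check function is importable from sklearn.utils.estimator_checks under the name used in the test above:

from sklearn.decomposition import PCA
from sklearn.utils.estimator_checks import check_n_features_in_after_fitting

# Run the n_features_in_ consistency check for one estimator while the
# check is not yet part of the standard check suite.
check_n_features_in_after_fitting("PCA", PCA())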

@thomasjpfan changed the title from "feature_names_in_ consistency with _validate_data in sklearn.decomposition module" to "n_features_in_ consistency with _validate_data in sklearn.decomposition module" on Oct 7, 2020.
ogrisel (Member, author) commented on Oct 7, 2020:

Thanks for fixing the title of this PR :)

@ogrisel force-pushed the features_in_consistency_decomposition branch from 108c7a4 to 2fb837d on October 7, 2020 at 16:39.
@@ -1347,6 +1347,9 @@ def transform(self, X):
             Transformed data.
         """
         check_is_fitted(self)
+        X = self._validate_data(X, accept_sparse=('csr', 'csc'),
+                                dtype=[np.float64, np.float32],
+                                reset=False)

         W, _, n_iter_ = non_negative_factorization(
ogrisel (Member, author) commented:

non_negative_factorization also calls check_array internally, so calling _validate_data here adds some performance overhead. For simplicity's sake I don't want to optimize this as part of this PR, but we might want to add a kwarg to non_negative_factorization to skip input validation.
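A hypothetical sketch of that kwarg idea (validate_input is a made-up name, not part of this PR or of scikit-learn): callers that have already gone through _validate_data could opt out of the second validation pass.

import numpy as np
from sklearn.utils import check_array

def non_negative_factorization(X, W=None, H=None, n_components=None, *,
                               validate_input=True, **params):
    # Hypothetical kwarg: estimators that already validated X via
    # _validate_data would pass validate_input=False to avoid a second
    # check_array pass over the data.
    if validate_input:
        X = check_array(X, accept_sparse=('csr', 'csc'),
                        dtype=[np.float64, np.float32])
    ...  # factorization itself, unchanged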

NicolasHug (Member) commented:

maybe write that as a TODO comment?

Another member commented:

This is most likely going to appear in other places. Another solution would be to have a private _non_negative_factorization that does not call check_array but still checks for non-negative values.
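A hypothetical sketch of that alternative (names and layout are illustrative, not the actual scikit-learn code): the public function keeps full validation and delegates to a private variant that only enforces non-negativity.

import numpy as np
from sklearn.utils import check_array
from sklearn.utils.validation import check_non_negative

def _non_negative_factorization(X, **params):
    # Assumes X is already a validated dense array or sparse matrix, but
    # still guards the NMF-specific non-negativity requirement.
    check_non_negative(X, "NMF (input X)")
    ...  # the factorization itself

def non_negative_factorization(X, **params):
    # Public entry point: full input validation, then delegate.
    X = check_array(X, accept_sparse=('csr', 'csc'),
                    dtype=[np.float64, np.float32])
    return _non_negative_factorization(X, **params)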

ogrisel (Member, author) replied:

I added a comment.
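The comment itself is not shown in this excerpt; a plausible wording (hypothetical reconstruction, not the actual committed text) would be something like:

# (hypothetical wording)
# TODO: non_negative_factorization calls check_array internally, so X is
# validated twice here; add a way to skip the inner validation.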

@ogrisel marked this pull request as ready for review on October 7, 2020 at 16:46.
NicolasHug (Member) left a review:

Thanks @ogrisel, a few questions but looks good

@@ -124,7 +123,7 @@ def transform(self, X):
         """
         check_is_fitted(self)

-        X = check_array(X)
+        X = self._validate_data(X, dtype=[np.float64, np.float32], reset=False)
NicolasHug (Member) commented:

do we need to pass the dtype here?

ogrisel (Member, author) replied:

I see no reason why this code would not work properly in float32, so this is a slight performance improvement.

ogrisel (Member, author) added:

Note that the fit of PCA accepts float32 without upcasting, so this change makes transform consistent with fit.
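A quick illustration of the dtype argument using check_array, which _validate_data wraps: dtypes in the list are preserved, and anything else is converted to the first entry.

import numpy as np
from sklearn.utils import check_array

X32 = np.ones((4, 2), dtype=np.float32)
Xint = np.ones((4, 2), dtype=np.int64)

# float32 is in the allowed list, so it is kept (no upcast to float64):
print(check_array(X32, dtype=[np.float64, np.float32]).dtype)   # float32
# dtypes outside the list are converted to the first entry, float64:
print(check_array(Xint, dtype=[np.float64, np.float32]).dtype)  # float64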

Resolved review thread (outdated): sklearn/decomposition/_dict_learning.py

        Returns
        -------
        X_new : ndarray of shape (n_samples, n_components)
        """
        check_is_fitted(self)

-        X = check_array(X, copy=copy, dtype=FLOAT_DTYPES)
+        X = self._validate_data(X, copy=(copy and self.whiten),
NicolasHug (Member) commented:

Isn't this a change of behavior?

ogrisel (Member, author) replied:

This is a small memory-efficiency optimization :)
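An abridged sketch of the method around this diff (based on FastICA.transform, body shortened; not the full implementation): X is only mutated in place on the whitening path, which is why the defensive copy is only needed when self.whiten is True.

def transform(self, X, copy=True):
    check_is_fitted(self)
    # Copy only when whitening: that is the only branch that writes to X.
    X = self._validate_data(X, copy=(copy and self.whiten), reset=False)
    if self.whiten:
        X -= self.mean_  # in-place subtraction mutates X
    return np.dot(X, self.components_.T)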

Resolved review thread (outdated): sklearn/decomposition/_incremental_pca.py
Resolved review threads (outdated): sklearn/decomposition/_lda.py, sklearn/decomposition/tests/test_online_lda.py
NicolasHug (Member) left a review:

thanks @ogrisel, LGTM when green!

Resolved review thread: sklearn/decomposition/_lda.py
thomasjpfan (Member) left a review:

LGTM

@thomasjpfan changed the title from "n_features_in_ consistency with _validate_data in sklearn.decomposition module" to "MNT n_features_in_ consistency in decomposition" on Oct 8, 2020.
@thomasjpfan merged commit 548a452 into scikit-learn:master on Oct 8, 2020.
@ogrisel deleted the features_in_consistency_decomposition branch on October 8, 2020 at 14:53.
amrcode pushed a commit to amrcode/scikit-learn that referenced this pull request on Oct 19, 2020 (Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>).
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request on Oct 22, 2020 (Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>).