Commit 346ddd9

ageron, jeremiedbb, ogrisel authored and committed
ENH Add inverse_transform to random projection transformers (scikit-learn#21701)
Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
1 parent 86d9cbc commit 346ddd9

4 files changed, +187 -8 lines changed
‎doc/modules/random_projection.rst

Lines changed: 39 additions & 0 deletions
@@ -160,3 +160,42 @@ projection transformer::
 In Proceedings of the 12th ACM SIGKDD international conference on
 Knowledge discovery and data mining (KDD '06). ACM, New York, NY, USA,
 287-296.
+
+
+.. _random_projection_inverse_transform:
+
+Inverse Transform
+=================
+The random projection transformers have a ``compute_inverse_components`` parameter. When
+set to True, after creating the random ``components_`` matrix during fitting,
+the transformer computes the pseudo-inverse of this matrix and stores it as
+``inverse_components_``. The ``inverse_components_`` matrix has shape
+:math:`n_{features} \times n_{components}`, and it is always a dense matrix,
+regardless of whether the components matrix is sparse or dense. So depending on
+the number of features and components, it may use a lot of memory.
+
+When the ``inverse_transform`` method is called, it computes the product of the
+input ``X`` and the transpose of the inverse components. If the inverse components have
+been computed during fit, they are reused at each call to ``inverse_transform``.
+Otherwise they are recomputed each time, which can be costly. The result is always
+dense, even if ``X`` is sparse.
+
+Here is a small code example that illustrates how to use the inverse transform
+feature::
+
+  >>> import numpy as np
+  >>> from sklearn.random_projection import SparseRandomProjection
+  >>> X = np.random.rand(100, 10000)
+  >>> transformer = SparseRandomProjection(
+  ...     compute_inverse_components=True
+  ... )
+  ...
+  >>> X_new = transformer.fit_transform(X)
+  >>> X_new.shape
+  (100, 3947)
+  >>> X_new_inversed = transformer.inverse_transform(X_new)
+  >>> X_new_inversed.shape
+  (100, 10000)
+  >>> X_new_again = transformer.transform(X_new_inversed)
+  >>> np.allclose(X_new, X_new_again)
+  True
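
As a rough illustration of the computation described in the documentation hunk above, the following NumPy-only sketch mimics the transform and inverse transform by hand. It does not call scikit-learn: `components` is a hypothetical Gaussian stand-in for a fitted `components_` matrix, and `np.linalg.pinv` is swapped in for the `scipy.linalg.pinv` call used in the commit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 5, 50, 10

# Hypothetical stand-in for a fitted ``components_`` matrix (Gaussian case).
components = rng.normal(
    scale=1.0 / np.sqrt(n_components), size=(n_components, n_features)
)

# Pseudo-inverse of the components: dense, shape (n_features, n_components).
inverse_components = np.linalg.pinv(components)

X = rng.random((n_samples, n_features))
X_new = X @ components.T                 # what transform computes
X_back = X_new @ inverse_components.T    # what inverse_transform computes
X_new_again = X_back @ components.T      # transform applied once more

print(inverse_components.shape)          # (50, 10)
print(X_back.shape)                      # (5, 50)
print(np.allclose(X_new, X_new_again))   # True
print(np.allclose(X, X_back))            # False
```

The last line shows that the reconstruction is not lossless when `n_components < n_features`: `X_back` is only guaranteed to project to the same `X_new`, not to equal the original `X`.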

‎doc/whats_new/v1.1.rst

Lines changed: 8 additions & 0 deletions
@@ -800,6 +800,14 @@ Changelog
   :class:`random_projection.GaussianRandomProjection` preserves dtype for
   `numpy.float32`. :pr:`22114` by :user:`Takeshi Oura <takoika>`.

+- |Enhancement| Adds an :meth:`inverse_transform` method and a
+  `compute_inverse_components` parameter to all transformers in the
+  :mod:`~sklearn.random_projection` module:
+  :class:`~sklearn.random_projection.GaussianRandomProjection` and
+  :class:`~sklearn.random_projection.SparseRandomProjection`. When the parameter is set
+  to True, the pseudo-inverse of the components is computed during `fit` and stored as
+  `inverse_components_`. :pr:`21701` by :user:`Aurélien Geron <ageron>`.
+
 - |API| Adds :term:`get_feature_names_out` to all transformers in the
   :mod:`~sklearn.random_projection` module:
   :class:`~sklearn.random_projection.GaussianRandomProjection` and
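
Because the stored `inverse_components_` is always dense, it can help to estimate its size before enabling the parameter. The sketch below is only back-of-the-envelope arithmetic: `dense_nbytes` is a hypothetical helper, the shape is taken from the documentation example in this commit (10000 features, 3947 components), and float64 storage is an assumption.

```python
import numpy as np


def dense_nbytes(n_rows, n_cols, dtype=np.float64):
    """Bytes needed for a dense (n_rows, n_cols) array of the given dtype."""
    return n_rows * n_cols * np.dtype(dtype).itemsize


# Shape of inverse_components_ is (n_features, n_components).
n_features, n_components = 10_000, 3_947
inverse_nbytes = dense_nbytes(n_features, n_components)

print(f"{inverse_nbytes / 1e6:.1f} MB")  # 315.8 MB
```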

‎sklearn/random_projection.py

Lines changed: 86 additions & 6 deletions
@@ -31,6 +31,7 @@
 from abc import ABCMeta, abstractmethod

 import numpy as np
+from scipy import linalg
 import scipy.sparse as sp

 from .base import BaseEstimator, TransformerMixin
@@ -39,10 +40,9 @@
 from .utils import check_random_state
 from .utils.extmath import safe_sparse_dot
 from .utils.random import sample_without_replacement
-from .utils.validation import check_is_fitted
+from .utils.validation import check_array, check_is_fitted
 from .exceptions import DataDimensionalityWarning

-
 __all__ = [
     "SparseRandomProjection",
     "GaussianRandomProjection",
@@ -302,11 +302,18 @@ class BaseRandomProjection(

     @abstractmethod
     def __init__(
-        self, n_components="auto", *, eps=0.1, dense_output=False, random_state=None
+        self,
+        n_components="auto",
+        *,
+        eps=0.1,
+        dense_output=False,
+        compute_inverse_components=False,
+        random_state=None,
     ):
         self.n_components = n_components
         self.eps = eps
         self.dense_output = dense_output
+        self.compute_inverse_components = compute_inverse_components
         self.random_state = random_state

     @abstractmethod
@@ -323,12 +330,18 @@ def _make_random_matrix(self, n_components, n_features):

         Returns
         -------
-        components : {ndarray, sparse matrix} of shape \
-                (n_components, n_features)
+        components : {ndarray, sparse matrix} of shape (n_components, n_features)
             The generated random matrix. Sparse matrix will be of CSR format.

         """

+    def _compute_inverse_components(self):
+        """Compute the pseudo-inverse of the (densified) components."""
+        components = self.components_
+        if sp.issparse(components):
+            components = components.toarray()
+        return linalg.pinv(components, check_finite=False)
+
     def fit(self, X, y=None):
         """Generate a sparse random projection matrix.
@@ -399,6 +412,9 @@ def fit(self, X, y=None):
             " not the proper shape."
         )

+        if self.compute_inverse_components:
+            self.inverse_components_ = self._compute_inverse_components()
+
         return self

     def transform(self, X):
@@ -437,6 +453,35 @@ def _n_features_out(self):
         """
         return self.n_components

+    def inverse_transform(self, X):
+        """Project data back to its original space.
+
+        Returns an array X_original whose transform would be X. Note that even
+        if X is sparse, X_original is dense: this may use a lot of RAM.
+
+        If `compute_inverse_components` is False, the inverse of the components is
+        computed during each call to `inverse_transform` which can be costly.
+
+        Parameters
+        ----------
+        X : {array-like, sparse matrix} of shape (n_samples, n_components)
+            Data to be transformed back.
+
+        Returns
+        -------
+        X_original : ndarray of shape (n_samples, n_features)
+            Reconstructed data.
+        """
+        check_is_fitted(self)
+
+        X = check_array(X, dtype=[np.float64, np.float32], accept_sparse=("csr", "csc"))
+
+        if self.compute_inverse_components:
+            return X @ self.inverse_components_.T
+
+        inverse_components = self._compute_inverse_components()
+        return X @ inverse_components.T
+
     def _more_tags(self):
         return {
             "preserves_dtype": [np.float64, np.float32],
@@ -474,6 +519,11 @@ class GaussianRandomProjection(BaseRandomProjection):
         Smaller values lead to better embedding and higher number of
         dimensions (n_components) in the target projection space.

+    compute_inverse_components : bool, default=False
+        Learn the inverse transform by computing the pseudo-inverse of the
+        components during fit. Note that computing the pseudo-inverse does not
+        scale well to large matrices.
+
     random_state : int, RandomState instance or None, default=None
         Controls the pseudo random number generator used to generate the
         projection matrix at fit time.
@@ -488,6 +538,12 @@ class GaussianRandomProjection(BaseRandomProjection):
     components_ : ndarray of shape (n_components, n_features)
         Random matrix used for the projection.

+    inverse_components_ : ndarray of shape (n_features, n_components)
+        Pseudo-inverse of the components, only computed if
+        `compute_inverse_components` is True.
+
+        .. versionadded:: 1.1
+
     n_features_in_ : int
         Number of features seen during :term:`fit`.
@@ -516,11 +572,19 @@ class GaussianRandomProjection(BaseRandomProjection):
     (25, 2759)
     """

-    def __init__(self, n_components="auto", *, eps=0.1, random_state=None):
+    def __init__(
+        self,
+        n_components="auto",
+        *,
+        eps=0.1,
+        compute_inverse_components=False,
+        random_state=None,
+    ):
         super().__init__(
             n_components=n_components,
             eps=eps,
             dense_output=True,
+            compute_inverse_components=compute_inverse_components,
             random_state=random_state,
         )
@@ -610,6 +674,14 @@ class SparseRandomProjection(BaseRandomProjection):
         If False, the projected data uses a sparse representation if
         the input is sparse.

+    compute_inverse_components : bool, default=False
+        Learn the inverse transform by computing the pseudo-inverse of the
+        components during fit. Note that the pseudo-inverse is always a dense
+        array, even if the training data was sparse. This means that it might be
+        necessary to call `inverse_transform` on a small batch of samples at a
+        time to avoid exhausting the available memory on the host. Moreover,
+        computing the pseudo-inverse does not scale well to large matrices.
+
     random_state : int, RandomState instance or None, default=None
         Controls the pseudo random number generator used to generate the
         projection matrix at fit time.
@@ -625,6 +697,12 @@ class SparseRandomProjection(BaseRandomProjection):
         Random matrix used for the projection. Sparse matrix will be of CSR
         format.

+    inverse_components_ : ndarray of shape (n_features, n_components)
+        Pseudo-inverse of the components, only computed if
+        `compute_inverse_components` is True.
+
+        .. versionadded:: 1.1
+
     density_ : float in range 0.0 - 1.0
         Concrete density computed from when density = "auto".
@@ -676,12 +754,14 @@ def __init__(
         density="auto",
         eps=0.1,
         dense_output=False,
+        compute_inverse_components=False,
         random_state=None,
     ):
         super().__init__(
             n_components=n_components,
             eps=eps,
             dense_output=dense_output,
+            compute_inverse_components=compute_inverse_components,
             random_state=random_state,
         )
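
The `SparseRandomProjection` docstring in this diff suggests calling `inverse_transform` on a small batch of samples at a time to limit memory use. One way to sketch that idea, using only NumPy, is shown below. The generator `iter_inverse_batches`, the `batch_size` value, and the random stand-in matrices are all hypothetical; with a real fitted estimator the body of the loop would presumably call `transformer.inverse_transform(batch)` instead of the explicit matrix product.

```python
import numpy as np


def iter_inverse_batches(X_new, inverse_components, batch_size=256):
    """Yield dense reconstructions of X_new one batch of rows at a time."""
    for start in range(0, X_new.shape[0], batch_size):
        batch = X_new[start:start + batch_size]
        # Same product as inverse_transform: batch @ pinv(components).T
        yield batch @ inverse_components.T


rng = np.random.default_rng(0)
components = rng.normal(size=(10, 200))          # stand-in for components_
inverse_components = np.linalg.pinv(components)  # shape (200, 10)
X_new = rng.normal(size=(1000, 10))              # already projected data

# Reduce each reconstructed batch right away so that the full
# (n_samples, n_features) dense array never has to be held in memory.
row_norms = np.concatenate(
    [
        np.linalg.norm(batch, axis=1)
        for batch in iter_inverse_batches(X_new, inverse_components)
    ]
)
```

Batching only pays off when each reconstructed batch is consumed or written out immediately; concatenating the batches back together would rebuild the full dense array.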

‎sklearn/tests/test_random_projection.py

Lines changed: 54 additions & 2 deletions
@@ -1,5 +1,6 @@
 import functools
 from typing import List, Any
+import warnings

 import numpy as np
 import scipy.sparse as sp
@@ -31,8 +32,8 @@

 # Make some random data with uniformly located non zero entries with
 # Gaussian distributed values
-def make_sparse_random_data(n_samples, n_features, n_nonzeros):
-    rng = np.random.RandomState(0)
+def make_sparse_random_data(n_samples, n_features, n_nonzeros, random_state=0):
+    rng = np.random.RandomState(random_state)
     data_coo = sp.coo_matrix(
         (
             rng.randn(n_nonzeros),
@@ -377,6 +378,57 @@ def test_random_projection_feature_names_out(random_projection_cls):
     assert_array_equal(names_out, expected_names_out)


+@pytest.mark.parametrize("n_samples", (2, 9, 10, 11, 1000))
+@pytest.mark.parametrize("n_features", (2, 9, 10, 11, 1000))
+@pytest.mark.parametrize("random_projection_cls", all_RandomProjection)
+@pytest.mark.parametrize("compute_inverse_components", [True, False])
+def test_inverse_transform(
+    n_samples,
+    n_features,
+    random_projection_cls,
+    compute_inverse_components,
+    global_random_seed,
+):
+    n_components = 10
+
+    random_projection = random_projection_cls(
+        n_components=n_components,
+        compute_inverse_components=compute_inverse_components,
+        random_state=global_random_seed,
+    )
+
+    X_dense, X_csr = make_sparse_random_data(
+        n_samples,
+        n_features,
+        n_samples * n_features // 100 + 1,
+        random_state=global_random_seed,
+    )
+
+    for X in [X_dense, X_csr]:
+        with warnings.catch_warnings():
+            warnings.filterwarnings(
+                "ignore",
+                message=(
+                    "The number of components is higher than the number of features"
+                ),
+                category=DataDimensionalityWarning,
+            )
+            projected = random_projection.fit_transform(X)
+
+        if compute_inverse_components:
+            assert hasattr(random_projection, "inverse_components_")
+            inv_components = random_projection.inverse_components_
+            assert inv_components.shape == (n_features, n_components)
+
+        projected_back = random_projection.inverse_transform(projected)
+        assert projected_back.shape == X.shape
+
+        projected_again = random_projection.transform(projected_back)
+        if hasattr(projected, "toarray"):
+            projected = projected.toarray()
+        assert_allclose(projected, projected_again, rtol=1e-7, atol=1e-10)
+
+
 @pytest.mark.parametrize("random_projection_cls", all_RandomProjection)
 @pytest.mark.parametrize(
     "input_dtype, expected_dtype",

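The round-trip assertion in `test_inverse_transform` can be understood through the Moore-Penrose identity A A^+ A = A: projecting, inverting, and projecting again gives X A^T (A^+)^T A^T = X (A A^+ A)^T = X A^T, whether `n_features` is smaller or larger than `n_components`. The sketch below checks this with plain NumPy over the same `n_features` values as the test. The Gaussian matrix `A` is a hypothetical stand-in for `components_`, the sample count of 7 is arbitrary, and `np.linalg.pinv` replaces the `scipy.linalg.pinv` call used in the commit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_components = 10
round_trip_ok = {}

for n_features in (2, 9, 10, 11, 1000):
    A = rng.normal(size=(n_components, n_features))  # stand-in for components_
    A_pinv = np.linalg.pinv(A)                       # (n_features, n_components)

    X = rng.normal(size=(7, n_features))
    projected = X @ A.T                       # transform
    projected_back = projected @ A_pinv.T     # inverse_transform
    projected_again = projected_back @ A.T    # transform again

    round_trip_ok[n_features] = bool(np.allclose(projected, projected_again))

print(round_trip_ok)
```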