kernel pca doc additions #19201

Open · wants to merge 1 commit into main
Binary file added: +170 KB doc/images/pca_vs_manifold.png
70 changes: 67 additions & 3 deletions in doc/modules/decomposition.rst
@@ -177,9 +177,71 @@ Note: the implementation of ``inverse_transform`` in :class:`PCA` with
Kernel PCA
----------

:class:`KernelPCA` is an extension of PCA which achieves non-linear dimensionality
reduction by first projecting the data :math:`X` ``(n_samples, n_features)`` into a
higher-dimensional space :math:`\phi(X)` ``(n_samples, m_features)`` before finding the projection axes.

Although this might sound computationally expensive (especially when ``m_features >> n_features``),
the technique remains feasible thanks to kernel functions (see :ref:`metrics`) and the so-called kernel
trick, which allows us to work in that higher-dimensional space without ever computing the coordinates
of each point.
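
To make the kernel trick concrete, here is a small sketch using a degree-2 polynomial kernel,
whose feature map :math:`\phi` happens to be easy to write down explicitly; the ``phi`` helper
below is purely illustrative and not part of any scikit-learn API:

.. code-block:: python

    import numpy as np
    from sklearn.metrics.pairwise import polynomial_kernel

    rng = np.random.RandomState(0)
    X = rng.randn(5, 2)                 # 5 samples, 2 features

    # Kernel trick: k(x, y) = (x . y + 1) ** 2, computed directly on the inputs.
    K = polynomial_kernel(X, degree=2, gamma=1, coef0=1)

    # Explicit feature map of the same kernel (6 features instead of 2),
    # written out only for illustration.
    def phi(X):
        x1, x2 = X[:, 0], X[:, 1]
        return np.column_stack([
            x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2,
            np.sqrt(2) * x1, np.sqrt(2) * x2, np.ones_like(x1),
        ])

    # Same inner products, but the kernel never builds the 6-dimensional phi(X).
    assert np.allclose(K, phi(X) @ phi(X).T)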

Using :class:`KernelPCA` is particularly relevant for non-linear datasets:
if the dataset has some non-linear structure, a linear subspace might not adequately represent the
underlying manifold.

In the example below, the upper-left panel shows a 2-d surface embedded in a 3-d space.


.. image:: images/pca_vs_manifold.png
:align: center
:scale: 75%

Applying linear dimensionality reduction (i.e. projecting all data points onto a hyperplane) induces a high
loss of information (top-right panel).
In contrast, non-linear approaches are capable of modeling and unfolding the manifold to represent the data adequately.

Kernel approaches rely on the intuition that many datasets which are not linearly separable in their original
attributes can be made so by projecting them into a higher-dimensional space.
This idea is famously used with Support Vector Machines.
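
As an illustrative sketch (the dataset size and ``gamma`` value below are arbitrary choices, not
prescriptions), the two concentric circles of :func:`~sklearn.datasets.make_circles` are not linearly
separable in the input space, while an RBF kernel PCA projection can typically unfold them:

.. code-block:: python

    from sklearn.datasets import make_circles
    from sklearn.decomposition import PCA, KernelPCA

    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    # Linear PCA can only rotate the data: the two rings stay entangled.
    X_pca = PCA(n_components=2).fit_transform(X)

    # Kernel PCA with an RBF kernel can unfold the rings; with a suitable
    # gamma the two classes typically become linearly separable in the
    # projected space.
    X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)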

Recall that in PCA, the projection axes are defined as the eigenvectors of the centered covariance
matrix associated with the largest eigenvalues. Equivalently, the SVD of the centered data
:math:`X = U \Sigma V^T` (see :class:`TruncatedSVD`) can be used, where the projection axes are the right
singular vectors :math:`V` associated with the largest singular values in :math:`\Sigma`. The latter approach
proves to be more efficient when dealing with large datasets.
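
This equivalence can be checked with a few lines of NumPy (a sketch on synthetic data; the principal
axes are only defined up to a sign, hence the comparison of absolute values):

.. code-block:: python

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(100, 3)
    Xc = X - X.mean(axis=0)                            # center the data

    # Eigendecomposition of the covariance matrix (n_features, n_features).
    cov = Xc.T @ Xc / Xc.shape[0]
    eigenvalues, eigenvectors = np.linalg.eigh(cov)    # ascending order
    axes_eig = eigenvectors[:, ::-1]                   # largest eigenvalues first

    # SVD of the centered data: the right singular vectors span the same axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes_svd = Vt.T

    # Identical up to the sign of each axis.
    assert np.allclose(np.abs(axes_eig), np.abs(axes_svd))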

Here, instead of using :math:`X` ``(n_samples, n_features)``, we use :math:`\phi(X)`, an embedding
of the data into a higher-dimensional space ``(n_samples, m_features)`` where ``m_features >> n_features``.
We want to avoid computing :math:`\phi(X)` explicitly, as this becomes infeasible for large values of
``m_features``: this is where the kernel trick comes in.
We define a kernel function :math:`k(\cdot, \cdot)` (see :ref:`metrics`) which enables us to operate in the
higher-dimensional space without ever computing the coordinates :math:`\phi(X)` of the data.
Instead, we simply compute the inner products :math:`\phi(x_i)^T \phi(x_j)` between all pairs of data points projected into the feature space.
This means that we can implicitly model any non-linear transformation :math:`\phi` as long as we can
compute the dot products between the mapped data points.

Therefore, instead of computing the covariance matrix :math:`\phi(X)^T \phi(X)` ``(m_features, m_features)``, we use the kernel matrix
:math:`K` ``(n_samples, n_samples)`` associated with the kernel function :math:`k(\cdot, \cdot)`, which contains all pairwise evaluations
of the dot product. Just like for PCA, the matrix :math:`K` has to be centered before solving the eigenproblem:

.. math::

    K \alpha_{i} = \lambda_{i} \alpha_{i}

whose eigenvectors :math:`\alpha_{i}` ``(n_samples,)`` give us the coefficients defining
our new projection axes. All that is left is to pick a number of components ``n_components``, just like in PCA.
Each data point :math:`x` can then be projected by computing its coordinate along each new axis :math:`j`:

.. math::

    y_{j} = \sum\limits_{i=1}^{n_{samples}} \alpha_{ij} k(x_{i}, x) \qquad \text{for } j = 1, \dots, n_{components}
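
The whole procedure can be sketched directly in NumPy (an illustration of the equations above on
arbitrary data and with an arbitrary ``gamma``; :class:`KernelPCA` additionally handles zero
eigenvalues and out-of-sample points):

.. code-block:: python

    import numpy as np
    from sklearn.decomposition import KernelPCA
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.preprocessing import KernelCenterer

    rng = np.random.RandomState(0)
    X = rng.randn(30, 2)
    n_components = 2

    # Kernel matrix (n_samples, n_samples), centered in feature space.
    K = KernelCenterer().fit_transform(rbf_kernel(X, gamma=1.0))

    # Solve K alpha_i = lambda_i alpha_i and keep the largest eigenpairs.
    eigenvalues, eigenvectors = np.linalg.eigh(K)          # ascending order
    eigenvalues = eigenvalues[::-1][:n_components]
    alphas = eigenvectors[:, ::-1][:, :n_components]

    # Normalise the coefficients so the feature-space eigenvectors have unit norm.
    alphas = alphas / np.sqrt(eigenvalues)

    # Project the training points: y_j(x) = sum_i alpha_ij k(x_i, x).
    X_projected = K @ alphas

    # For reference: up to the sign of each component, this is expected to
    # match scikit-learn's output.
    X_ref = KernelPCA(n_components=n_components, kernel="rbf",
                      gamma=1.0).fit_transform(X)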

.. note::

    Because we are using the kernel matrix :math:`K` ``(n_samples, n_samples)`` instead of the covariance
    matrix :math:`C` ``(n_features, n_features)``, the cost of kernel PCA scales with the number of data
    points ``n_samples``, whereas that of PCA scales with the number of input features ``n_features``.

It has many applications including denoising, compression and structured
prediction (kernel dependency estimation). :class:`KernelPCA` supports both
``transform`` and ``inverse_transform``.
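
A minimal usage sketch (parameter values are illustrative; ``fit_inverse_transform=True`` is required
so that ``inverse_transform`` can learn an approximate pre-image mapping):

.. code-block:: python

    from sklearn.datasets import make_circles
    from sklearn.decomposition import KernelPCA

    X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10,
                     fit_inverse_transform=True)
    X_kpca = kpca.fit_transform(X)            # project into the kernel PCA space
    X_back = kpca.inverse_transform(X_kpca)   # approximate pre-images in input space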

@@ -188,6 +250,8 @@ prediction (kernel dependency estimation). :class:`KernelPCA` supports both
:align: center
:scale: 75%



.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_decomposition_plot_kernel_pca.py`