KernelDensity incorrect handling of bandwidth #25623

Open
@TaTKSM

Description


Describe the bug

I was using the kernel density estimator
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html
with 'silverman' or 'scott' as the bandwidth argument. I found that the automatically selected bandwidth is independent of the scale of the dataset. Indeed, the bandwidth calculation for 'silverman' and 'scott' in https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/neighbors/_kde.py does not look at the scale of the data at all.

Suppose I fit the model kde to some 2D data X and read the bandwidth from kde.bandwidth_.
Next, I fit kde to the same data with every element multiplied by, say, 20, and read kde.bandwidth_ again.
The two values of kde.bandwidth_ are equal, because the bandwidth is calculated only from the shape of X (see the source code). If the bandwidth really adapted to the data, they should differ by a factor of 20.
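To make the expected factor of 20 concrete, here is a minimal sketch of one possible scale-aware rule. The helper names `scott_factor` and `scaled_scott_bandwidth` are hypothetical and not part of scikit-learn, and using the mean per-feature standard deviation as the scale is an assumption made for illustration only.

```python
import numpy as np

def scott_factor(n_samples, n_features):
    # Shape-only factor n ** (-1 / (d + 4)); it never looks at the data values.
    return n_samples ** (-1.0 / (n_features + 4))

def scaled_scott_bandwidth(X):
    # Hypothetical helper: multiply the shape-only factor by a scale estimate
    # (here the mean per-feature sample standard deviation, an assumption).
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    scale = X.std(axis=0, ddof=1).mean()
    return scale * scott_factor(n_samples, n_features)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))

h_small = scaled_scott_bandwidth(X)
h_large = scaled_scott_bandwidth(X * 20)

# The ratio is 20 (up to floating-point rounding) because the standard
# deviation scales linearly with the data.
print(h_small, h_large, h_large / h_small)
```

With a rule of this kind, rescaling the data rescales the bandwidth by the same factor, which is the behaviour described above as "truly adaptive".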

For reference, scipy's KDE https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html uses the covariance of the data to capture its scale. I think this is the right thing to do.
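The scipy behaviour can be checked with a short sketch. It assumes scipy is installed and relies on the `factor` and `covariance` attributes of `gaussian_kde`; the random data and the seed are arbitrary.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))

# gaussian_kde expects data of shape (n_features, n_samples), hence the transpose.
kde_small = gaussian_kde(X.T, bw_method="scott")
kde_large = gaussian_kde((X * 20).T, bw_method="scott")

# The Scott factor depends only on the shape, so it is the same for both fits.
print(kde_small.factor, kde_large.factor)

# The kernel covariance is the data covariance times factor ** 2, so scaling
# the data by 20 scales it by 20 ** 2 = 400.
print(kde_large.covariance / kde_small.covariance)
```

Here the shape-only factor is identical in both fits, but the kernel covariance that is actually used grows with the data, so the effective bandwidth follows the scale.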

Note that if the bandwidth is incorrect, everything else is incorrect too, including probabilities of samples, etc.

Steps/Code to Reproduce

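import numpy as np
from sklearn.neighbors import KernelDensity

# Fit on standard-normal 2D data and print the selected bandwidth.
X = np.random.randn(1000, 2)
kde = KernelDensity(bandwidth='scott')
kde.fit(X)
print(kde.bandwidth_)

# Refit on the same data scaled by 20; the bandwidth should change.
kde.fit(X * 20)
print(kde.bandwidth_)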

Expected Results

Different bandwidths for data sets with different scales.

Actual Results

0.31622776601683794
0.31622776601683794
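The printed value matches what a shape-only rule would give. A quick check, assuming the 'scott' option reduces to n ** (-1 / (d + 4)) with n samples and d features (the same expression scipy uses for its Scott factor):

```python
import math

n_samples, n_features = 1000, 2

# Shape-only Scott factor: for n = 1000 and d = 2 this is 1000 ** (-1 / 6).
h = n_samples ** (-1.0 / (n_features + 4))

# Compare against the value printed by both fits in the report.
print(h, math.isclose(h, 0.31622776601683794, rel_tol=1e-9))
```

Since neither n nor d changes when X is multiplied by 20, this expression cannot produce a different bandwidth for the rescaled data.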

Versions

1.2.1
