Description
Describe the bug
I was using the kernel density estimator
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html
with 'silverman' or 'scott' as the bandwidth argument. I found that the bandwidth selected by these rules is independent of the actual scale of the dataset. In fact, I was shocked to find that the bandwidth calculation in https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/neighbors/_kde.py for 'silverman' and 'scott' does not take the scale of the data into account at all.
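For illustration, here is my paraphrase of the 'scott'/'silverman' branches as I read the current source (not an exact copy): only the shape of X enters the computation, which is why the value below matches the reproducer output no matter how the data is scaled.

# My reading of sklearn/neighbors/_kde.py: only X.shape is used, never the values of X.
n_samples, n_features = 1000, 2  # X.shape in the reproducer below

bw_scott = n_samples ** (-1.0 / (n_features + 4))
bw_silverman = (n_samples * (n_features + 2) / 4.0) ** (-1.0 / (n_features + 4))

print(bw_scott)      # 0.31622776601683794
print(bw_silverman)  # 0.31622776601683794 (coincides with Scott when n_features == 2)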
Suppose I fit the model kde to some 2D data X and read the bandwidth as kde.bandwidth_. Next, I fit the same model to the same data X but with all elements multiplied by, say, 20, and read kde.bandwidth_ again. These two values of kde.bandwidth_ are equal (the bandwidth is calculated from the shape of X alone, see the source code). But they should obviously differ by a factor of 20 if the bandwidth were computed in a truly scale-adaptive manner.
For reference, scipy's KDE https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html computes the covariance of the data to capture its scale. I think this is the right thing to do.
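As a comparison (my own snippet, not part of the reproducer), here is a quick check of scipy's behaviour: the dimensionless Scott factor is the same for both fits, but the kernel covariance tracks the data covariance, so the effective bandwidth scales with the data.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))

# gaussian_kde expects an array of shape (n_features, n_samples)
kde_small = gaussian_kde(X.T, bw_method="scott")
kde_large = gaussian_kde(X.T * 20, bw_method="scott")

print(kde_small.factor == kde_large.factor)  # True: the dimensionless factor is scale-free
print(np.diag(kde_large.covariance) / np.diag(kde_small.covariance))  # ~400, i.e. 20**2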
Note that if the bandwidth is incorrect, everything derived from it is incorrect too, including the estimated probability densities of samples, etc.
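As a stopgap, one can compute a scale-aware bandwidth outside the estimator and pass it as a number. The helper below is hypothetical (not part of scikit-learn): it multiplies Scott's dimensionless factor by an average per-feature standard deviation, collapsed to one scalar because KernelDensity only supports a single isotropic bandwidth.

import numpy as np
from sklearn.neighbors import KernelDensity

def scott_bandwidth(X):
    # Hypothetical helper: Scott's dimensionless factor times the mean
    # per-feature standard deviation, so the bandwidth follows the data scale.
    n_samples, n_features = X.shape
    factor = n_samples ** (-1.0 / (n_features + 4))
    return factor * X.std(axis=0).mean()

X = np.random.randn(1000, 2)
kde = KernelDensity(bandwidth=scott_bandwidth(X)).fit(X)
kde_scaled = KernelDensity(bandwidth=scott_bandwidth(X * 20)).fit(X * 20)
print(kde.bandwidth, kde_scaled.bandwidth)  # now differ by roughly a factor of 20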
Steps/Code to Reproduce
import numpy as np
from sklearn.neighbors import KernelDensity

X = np.random.randn(1000, 2)
kde = KernelDensity(bandwidth='scott')
kde.fit(X)
print(kde.bandwidth_)

kde.fit(X * 20)  # same data, scaled by 20
print(kde.bandwidth_)  # identical bandwidth despite the change of scale
Expected Results
Different bandwidths for data sets with different scales.
Actual Results
0.31622776601683794
0.31622776601683794
Versions
1.2.1