FEA Add array API support for GaussianMixture #30777
Quick benchmark on a VM with an NVIDIA GeForce RTX 3070:
import torch
import numpy as np
from time import perf_counter
from sklearn import set_config
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs
set_config(array_api_dispatch=True)
n_samples, n_features = int(5e4), int(1e3)
n_components = 10
print(f"Generating data with shape {(n_samples, n_features)}...")
X_np, _ = make_blobs(
    n_samples=n_samples, n_features=n_features, centers=n_components, random_state=0
)
print(f"Data size: {X_np.nbytes / 1e6:.1f} MB")
gmm = GaussianMixture(
    n_components=n_components,
    covariance_type="diag",
    init_params="random",
    random_state=0,
)
X_torch_cpu = torch.asarray(X_np)
print("PyTorch CPU GMM")
%timeit gmm.fit(X_torch_cpu)
print("PyTorch GPU GMM")
X_torch_cuda = torch.asarray(X_np, device="cuda")
# .means_[0, 0].item() is to make sure to block to measure CUDA
# computation faithfully, following guideline from
# https://github.com/scikit-learn/scikit-learn/pull/27961#issuecomment-2506259528
%timeit gmm.fit(X_torch_cuda).means_[0, 0].item()
print("NumPy GMM")
%timeit gmm.fit(X_np)
Output:
Note that in this case the PyTorch GPU vs. NumPy speedup is 7x; I have seen other cases where it is closer to 3-4x (e.g. …)
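The `%timeit` lines in the benchmark are IPython magics, so the snippet only runs in IPython or Jupyter. As a rough sketch of the same measurement in a plain Python script, the helper below times a callable with `perf_counter` (which the snippet imports but does not use). The name `time_call` and its `repeats` and `warmup` parameters are hypothetical, not part of scikit-learn or PyTorch. It assumes the callable itself blocks until the work is finished, for example by ending with `.means_[0, 0].item()` as in the CUDA case above.

```python
from statistics import median
from time import perf_counter


def time_call(fn, repeats=5, warmup=1):
    """Return the median wall-clock time, in seconds, of calling fn().

    Hypothetical helper: fn must block until its work is done (e.g. by
    ending with .item() on a CUDA tensor), otherwise asynchronous GPU
    work may not be included in the measurement.
    """
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = perf_counter()
        fn()
        durations.append(perf_counter() - start)
    return float(median(durations))


# Intended usage with the objects from the benchmark (not executed here):
# time_call(lambda: gmm.fit(X_torch_cpu))
# time_call(lambda: gmm.fit(X_torch_cuda).means_[0, 0].item())
# time_call(lambda: gmm.fit(X_np))
```

Unlike `%timeit`, this sketch does not pick the number of loops automatically, so `repeats` should be chosen by hand for slow fits.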
I think the confusing error about From build log the error was:
GaussianMixture is ready for a first round of review 🎉!
Working on it with @StefanieSenger.
Link to TODO