eliminate performance regression when normalize is False #19606

maikia · Mar 3, 2021

closes #19600

There was a major performance regression in the linear models, caused by the new use of _incremental_mean_and_var().
When the normalize parameter is not set in the linear model, computing the variance is unnecessary. This PR replaces that call with np.average() when normalize is set to False.
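The idea can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the code in sklearn/linear_model/_base.py: the helper name center_columns is hypothetical, and scaling by the standard deviation stands in for whatever scaling the real preprocessing applies.

```python
import numpy as np


def center_columns(X, normalize=False, sample_weight=None):
    """Hypothetical helper: center X column-wise and optionally scale it.

    Illustrative only; it does not reproduce scikit-learn's preprocessing.
    """
    # The (weighted) column mean is needed in both branches.
    mean = np.average(X, axis=0, weights=sample_weight)

    if not normalize:
        # Centering needs only the mean, so the variance pass is skipped.
        return X - mean, mean, np.ones(X.shape[1])

    # Scaling additionally needs the variance, which costs a second pass.
    var = np.average((X - mean) ** 2, axis=0, weights=sample_weight)
    scale = np.sqrt(var)  # assumption: standard deviation as the scale
    scale[scale == 0] = 1.0  # leave constant columns unscaled
    return (X - mean) / scale, mean, scale
```

With normalize=False the function touches X once for the mean and once for the subtraction; the variance computation only runs in the branch that actually uses it.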

Performance (current main):

zoomed into _preprocess_data at _base.py:

Performance (this PR):

zoomed into _preprocess_data at _base.py:

Performance was measured using @jeremiedbb's code:

from sklearn.linear_model import ElasticNet
from asv_benchmarks.benchmarks.datasets import _synth_regression_dataset
data = _synth_regression_dataset(n_samples=5000, n_features=10000)
X, _, y, _ = data
estimator = ElasticNet(precompute=False, alpha=100, random_state=0)
# The two lines below are IPython magics: run them in an IPython/Jupyter session with snakeviz installed.
%load_ext snakeviz
%snakeviz estimator.fit(X, y)

cc @ogrisel @jeremiedbb @agramfort

jeremiedbb

lgtm. Just 1 small thing

sklearn/linear_model/_base.py

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

jeremiedbb · Mar 3, 2021

Thanks @maikia!

ogrisel · Mar 4, 2021

Thanks @maikia, the nightly benchmark report confirms that the perf regression has been fixed:

https://scikit-learn.org/scikit-learn-benchmarks/#linear_model.ElasticNetBenchmark.time_fit?p-representation='dense'&p-precompute=False

exchange _incremental_mean_and_var for np.averate when not normalize

594b0ea

github-actions bot added the module:linear_model label Mar 3, 2021

maikia changed the title ~~exchange _incremental_mean_and_var for np.averate when not normalize~~ eliminate performance regression when normalize is False Mar 3, 2021

agramfort approved these changes Mar 3, 2021


jeremiedbb approved these changes Mar 3, 2021


sklearn/linear_model/_base.py (outdated review comment, resolved)

Update sklearn/linear_model/_base.py

205dd49

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

jeremiedbb merged commit 1045d16 into scikit-learn:main Mar 3, 2021

glemaitre mentioned this pull request Apr 22, 2021

Release 0.24.2 #19954

Merged

12 tasks
