Description
Describe the workflow you want to enable
I'd like to have a feature importance method native to linear models (without L1 penalty) that is calculated on the training set:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(with_importance=True)  # with_importance is the proposed, not yet existing, option
clf.fit(X, y)
clf.feature_importances_  # or some nice plot thereof
Describe your proposed solution
New proposal
Evaluate whether LMG (Lindeman, Merenda and Gold, see [1, 2]) is applicable and feasible for L2-penalized regression and for GLMs. Otherwise, consider the other measures from [1, 2].
In short, LMG is a Shapley value decomposition of R2 by the features (a brute-force sketch is given after the references below).
References:
- [1] The R package relaimpo and its accompanying JSS paper: U. Grömping (2006). Relative Importance for Linear Regression in R: The Package relaimpo
- [2] U. Grömping (2016). Variable importance in regression models
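To make this concrete, here is a brute-force sketch of LMG for plain OLS (my own illustration, not an existing scikit-learn API; the function name lmg_importance is made up). It averages the marginal gain in R2 from adding each feature over all feature orderings; an actual implementation for penalized models or GLMs would need a suitable goodness-of-fit measure and a far cheaper computation.

# Brute-force LMG: Shapley decomposition of R^2 over feature orderings.
# Exponential cost, so only meant for a handful of features.
from itertools import permutations
import numpy as np
from sklearn.linear_model import LinearRegression

def lmg_importance(X, y):
    n_features = X.shape[1]

    def r2(subset):
        # R^2 of an OLS fit restricted to the given feature subset (0 for the empty set).
        if not subset:
            return 0.0
        Xs = X[:, subset]
        return LinearRegression().fit(Xs, y).score(Xs, y)

    importances = np.zeros(n_features)
    perms = list(permutations(range(n_features)))
    for perm in perms:
        seen = []
        for j in perm:
            # Marginal gain in R^2 from adding feature j after the features preceding it.
            importances[j] += r2(seen + [j]) - r2(seen)
            seen.append(j)
    return importances / len(perms)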
Original proposal
Compute the t-statistic of the coefficients,
t[j] = coef[j] / std(coef[j]),
and use its absolute value |t[j]| as a measure of (in-sample) importance. For GLMs like logistic regression, see Section 5.3 of https://arxiv.org/pdf/1509.09169.pdf for a formula for Var[coef].
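For the unpenalized case, a minimal sketch of this t-statistic importance could look as follows (assuming an OLS fit with intercept and the classical covariance formula Var[coef] = sigma^2 (X'X)^{-1} on centered features; the penalized/GLM variance formula from the reference above is not implemented here):

# t-statistic importance sketch for plain OLS (in-sample, on the training set).
import numpy as np
from sklearn.linear_model import LinearRegression

def t_statistic_importance(X, y):
    n_samples, n_features = X.shape
    model = LinearRegression().fit(X, y)
    residuals = y - model.predict(X)
    # Unbiased noise variance estimate; the extra -1 accounts for the intercept.
    sigma2 = residuals @ residuals / (n_samples - n_features - 1)
    Xc = X - X.mean(axis=0)  # center features to match the fitted intercept
    cov_coef = sigma2 * np.linalg.pinv(Xc.T @ Xc)
    t = model.coef_ / np.sqrt(np.diag(cov_coef))
    return np.abs(t)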
Describe alternatives you've considered, if relevant
Any general importance measure (permutation importance, SHAP values, ...) also works.
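For reference, permutation importance is already available in scikit-learn and can be computed on the training set for a fitted linear model (the dataset here is only for illustration):

# Model-agnostic alternative that works today: permutation importance.
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)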
Additional context
Given the great and legitimate need for interpretability, I would favor having a native importance measure for linear models. Random Forests have their own native feature_importances_ with the warning:
impurity-based feature importances can be misleading for high cardinality features (many unique values).
We could add a similar warning for collinear features like
feature importances can be misleading for collinear or high-dimensional features.
I guess, in the end, this is true for all feature importance measures, even for SHAP (see also our multicollinear example).
Prior discussions like #16802, #6773 and #13048 focused on p-values, which seem out of scope for scikit-learn for different reasons. I hope we can circumvent those reasons by focusing on feature importance only and not considering p-values.