scikit-learn
diff --git a/doc/modules/permutation_importance.rst b/doc/modules/permutation_importance.rst
52 additions & 8 deletions
diff --git a/doc/whats_new/v1.0.rst b/doc/whats_new/v1.0.rst
7 additions & 0 deletions
diff --git a/sklearn/inspection/_permutation_importance.py b/sklearn/inspection/_permutation_importance.py
96 additions & 24 deletions
diff --git a/sklearn/inspection/tests/test_permutation_importance.py b/sklearn/inspection/tests/test_permutation_importance.py
48 additions & 1 deletion
@@ -16,6 +16,16 @@ indicative of how much the model depends on the feature. This technique
 benefits from being model agnostic and can be calculated many times with
 different permutations of the feature.
 
+.. warning::
+
+  Features that are deemed of **low importance for a bad model** (low
+  cross-validation score) could be **very important for a good model**.
+  Therefore it is always important to evaluate the predictive power of a model
+  using a held-out set (or better with cross-validation) prior to computing
+  importances. Permutation importance does not reflect the intrinsic
+  predictive value of a feature by itself but **how important this feature is
+  for a particular model**.
+
 The :func:`permutation_importance` function calculates the feature importance
 of :term:`estimators` for a given dataset. The ``n_repeats`` parameter sets the
 number of times a feature is randomly shuffled and returns a sample of feature
@@ -64,15 +74,49 @@ highlight which features contribute the most to the generalization power of the
 inspected model. Features that are important on the training set but not on the
 held-out set might cause the model to overfit.
 
-.. warning::
+The permutation feature importance is the decrease in a model score when a single
+feature value is randomly shuffled. The score function to be used for the
+computation of importances can be specified with the `scoring` argument,
+which also accepts multiple scorers. Using multiple scorers is more computationally
+efficient than sequentially calling :func:`permutation_importance` several times
+with a different scorer, as it reuses model predictions.
 
-  Features that are deemed of **low importance for a bad model** (low
-  cross-validation score) could be **very important for a good model**.
-  Therefore it is always important to evaluate the predictive power of a model
-  using a held-out set (or better with cross-validation) prior to computing
-  importances. Permutation importance does not reflect to the intrinsic
-  predictive value of a feature by itself but **how important this feature is
-  for a particular model**.
+An example of using multiple scorers is shown below, employing a list of metrics,
+but more input formats are possible, as documented in :ref:`multimetric_scoring`.
+
+  >>> scoring = ['r2', 'neg_mean_absolute_percentage_error', 'neg_mean_squared_error']
+  >>> r_multi = permutation_importance(
+  ...     model, X_val, y_val, n_repeats=30, random_state=0, scoring=scoring)
+  ...
+  >>> for metric in r_multi:
+  ...     print(f"{metric}")
+  ...     r = r_multi[metric]
+  ...     for i in r.importances_mean.argsort()[::-1]:
+  ...         if r.importances_mean[i] - 2 * r.importances_std[i] > 0:
+  ...             print(f"    {diabetes.feature_names[i]:<8}"
+  ...                   f"{r.importances_mean[i]:.3f}"
+  ...                   f" +/- {r.importances_std[i]:.3f}")
+  ...
+  r2
+    s5      0.204 +/- 0.050
+    bmi     0.176 +/- 0.048
+    bp      0.088 +/- 0.033
+    sex     0.056 +/- 0.023
+  neg_mean_absolute_percentage_error
+    s5      0.081 +/- 0.020
+    bmi     0.064 +/- 0.015
+    bp      0.029 +/- 0.010
+  neg_mean_squared_error
+    s5      1013.903 +/- 246.460
+    bmi     872.694 +/- 240.296
+    bp      438.681 +/- 163.025
+    sex     277.382 +/- 115.126
+
+The ranking of the features is approximately the same for different metrics even
+if the scales of the importance values are very different. However, this is not
+guaranteed and different metrics might lead to significantly different feature
+importances, in particular for models trained for imbalanced classification problems,
+for which the choice of the classification metric can be critical.
 
 Outline of the permutation importance algorithm
 -----------------------------------------------
 
@@ -102,6 +102,13 @@ Changelog
   input strings would result in negative indices in the transformed data.
   :pr:`19035` by :user:`Liu Yu <ly648499246>`.
 
+:mod:`sklearn.inspection`
+.........................
+
+- |Fix| Allow multiple scorers input to
+  :func:`~sklearn.inspection.permutation_importance`.
+  :pr:`19411` by :user:`Simona Maggio <simonamaggio>`.
+
 :mod:`sklearn.linear_model`
 ...........................
 
 
@@ -3,6 +3,8 @@
 from joblib import Parallel
 
 from ..metrics import check_scoring
+from ..metrics._scorer import _check_multimetric_scoring, _MultimetricScorer
+from ..model_selection._validation import _aggregate_score_dicts
 from ..utils import Bunch
 from ..utils import check_random_state
 from ..utils import check_array
@@ -28,24 +30,56 @@ def _calculate_permutation_scores(estimator, X, y, sample_weight, col_idx,
     # (memmap). X.copy() on the other hand is always guaranteed to return a
     # writable data-structure whose columns can be shuffled inplace.
     X_permuted = X.copy()
-    scores = np.zeros(n_repeats)
+
+    scores = []
     shuffling_idx = np.arange(X.shape[0])
-    for n_round in range(n_repeats):
+    for _ in range(n_repeats):
         random_state.shuffle(shuffling_idx)
         if hasattr(X_permuted, "iloc"):
             col = X_permuted.iloc[shuffling_idx, col_idx]
             col.index = X_permuted.index
             X_permuted.iloc[:, col_idx] = col
         else:
             X_permuted[:, col_idx] = X_permuted[shuffling_idx, col_idx]
-        feature_score = _weights_scorer(
-            scorer, estimator, X_permuted, y, sample_weight
+        scores.append(
+            _weights_scorer(scorer, estimator, X_permuted, y, sample_weight)
         )
-        scores[n_round] = feature_score
+
+    if isinstance(scores[0], dict):
+        scores = _aggregate_score_dicts(scores)
+    else:
+        scores = np.array(scores)
 
     return scores
 
 
+def _create_importances_bunch(baseline_score, permuted_score):
+    """Compute the importances as the decrease in score.
+
+    Parameters
+    ----------
+    baseline_score : float
+        The baseline score without permutation.
+    permuted_score : ndarray of shape (n_features, n_repeats)
+        The permuted scores for the `n_repeats` repetitions.
+
+    Returns
+    -------
+    importances : :class:`~sklearn.utils.Bunch`
+        Dictionary-like object, with the following attributes.
+        importances_mean : ndarray, shape (n_features, )
+            Mean of feature importance over `n_repeats`.
+        importances_std : ndarray, shape (n_features, )
+            Standard deviation over `n_repeats`.
+        importances : ndarray, shape (n_features, n_repeats)
+            Raw permutation importance scores.
+    """
+    importances = baseline_score - permuted_score
+    return Bunch(importances_mean=np.mean(importances, axis=1),
+                 importances_std=np.std(importances, axis=1),
+                 importances=importances)
+
+
 @_deprecate_positional_args
 def permutation_importance(estimator, X, y, *, scoring=None, n_repeats=5,
                            n_jobs=None, random_state=None, sample_weight=None):
@@ -74,10 +108,25 @@ def permutation_importance(estimator, X, y, *, scoring=None, n_repeats=5,
     y : array-like or None, shape (n_samples, ) or (n_samples, n_classes)
         Targets for supervised or `None` for unsupervised.
 
-    scoring : string, callable or None, default=None
-        Scorer to use. It can be a single
-        string (see :ref:`scoring_parameter`) or a callable (see
-        :ref:`scoring`). If None, the estimator's default scorer is used.
+    scoring : str, callable, list, tuple, or dict, default=None
+        Scorer to use.
+        If `scoring` represents a single score, one can use:
+
+        - a single string (see :ref:`scoring_parameter`);
+        - a callable (see :ref:`scoring`) that returns a single value.
+
+        If `scoring` represents multiple scores, one can use:
+
+        - a list or tuple of unique strings;
+        - a callable returning a dictionary where the keys are the metric
+          names and the values are the metric scores;
+        - a dictionary with metric names as keys and callables as values.
+
+        Passing multiple scores to `scoring` is more efficient than calling
+        `permutation_importance` for each of the scores as it reuses
+        predictions to avoid redundant computation.
+
+        If None, the estimator's default scorer is used.
 
     n_repeats : int, default=5
         Number of times to permute a feature.
@@ -102,16 +151,20 @@ def permutation_importance(estimator, X, y, *, scoring=None, n_repeats=5,
 
     Returns
     -------
-    result : :class:`~sklearn.utils.Bunch`
+    result : :class:`~sklearn.utils.Bunch` or dict of such instances
         Dictionary-like object, with the following attributes.
 
-        importances_mean : ndarray, shape (n_features, )
+        importances_mean : ndarray of shape (n_features, )
             Mean of feature importance over `n_repeats`.
-        importances_std : ndarray, shape (n_features, )
+        importances_std : ndarray of shape (n_features, )
             Standard deviation over `n_repeats`.
-        importances : ndarray, shape (n_features, n_repeats)
+        importances : ndarray of shape (n_features, n_repeats)
             Raw permutation importance scores.
 
+        If there are multiple scoring metrics in the scoring parameter,
+        `result` is a dict with scorer names as keys (e.g. 'roc_auc') and
+        `Bunch` objects like above as values.
+
     References
     ----------
     .. [BRE] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
@@ -143,14 +196,33 @@ def permutation_importance(estimator, X, y, *, scoring=None, n_repeats=5,
     random_state = check_random_state(random_state)
     random_seed = random_state.randint(np.iinfo(np.int32).max + 1)
 
-    scorer = check_scoring(estimator, scoring=scoring)
-    baseline_score = _weights_scorer(scorer, estimator, X, y, sample_weight)
-
-    scores = Parallel(n_jobs=n_jobs)(delayed(_calculate_permutation_scores)(
-        estimator, X, y, sample_weight, col_idx, random_seed, n_repeats, scorer
-    ) for col_idx in range(X.shape[1]))
-
-    importances = baseline_score - np.array(scores)
-    return Bunch(importances_mean=np.mean(importances, axis=1),
-                 importances_std=np.std(importances, axis=1),
-                 importances=importances)
+    if callable(scoring):
+        scorer = scoring
+    elif scoring is None or isinstance(scoring, str):
+        scorer = check_scoring(estimator, scoring=scoring)
+    else:
+        scorers_dict = _check_multimetric_scoring(estimator, scoring)
+        scorer = _MultimetricScorer(**scorers_dict)
+
+    baseline_score = _weights_scorer(scorer, estimator, X, y,
+                                     sample_weight)
+
+    scores = Parallel(n_jobs=n_jobs)(
+        delayed(_calculate_permutation_scores)(
+            estimator, X, y, sample_weight, col_idx, random_seed,
+            n_repeats, scorer
+        ) for col_idx in range(X.shape[1]))
+
+    if isinstance(baseline_score, dict):
+        return {
+            name: _create_importances_bunch(
+                baseline_score[name],
+                # unpack the permuted scores
+                np.array([
+                    scores[col_idx][name] for col_idx in range(X.shape[1])
+                ])
+            )
+            for name in baseline_score
+        }
+    else:
+        return _create_importances_bunch(baseline_score, np.array(scores))
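The unpacking step in the multi-metric branch can be hard to follow from the diff alone. Below is a minimal, NumPy-only sketch of the same idea, not the library code: `unpack_multimetric_scores` is a hypothetical helper, the metric names and numbers are made up for illustration, and plain dicts stand in for `Bunch`. It assumes each feature contributes one dict mapping a metric name to an array of `n_repeats` permuted scores, which is the shape the hunk above appears to produce.

```python
import numpy as np


def unpack_multimetric_scores(baseline, per_feature_scores):
    """Hypothetical helper mirroring the multi-metric return branch.

    baseline: dict mapping metric name -> float baseline score.
    per_feature_scores: list with one dict per feature, each mapping
        metric name -> array of shape (n_repeats,).
    """
    result = {}
    for name, base in baseline.items():
        # Stack the per-feature arrays into shape (n_features, n_repeats).
        permuted = np.array(
            [feature_scores[name] for feature_scores in per_feature_scores]
        )
        # Importance is the drop in score caused by the permutation.
        importances = base - permuted
        result[name] = {
            "importances_mean": importances.mean(axis=1),
            "importances_std": importances.std(axis=1),
            "importances": importances,
        }
    return result


# Illustrative values only: 3 features, 2 repeats, 2 metrics.
baseline = {"r2": 0.9, "neg_mse": -1.0}
per_feature_scores = [
    {"r2": np.array([0.5, 0.7]), "neg_mse": np.array([-3.0, -2.0])},
    {"r2": np.array([0.9, 0.9]), "neg_mse": np.array([-1.0, -1.0])},
    {"r2": np.array([0.8, 0.6]), "neg_mse": np.array([-1.5, -2.5])},
]
unpacked = unpack_multimetric_scores(baseline, per_feature_scores)
print(unpacked["r2"]["importances"].shape)
```

The result is keyed by metric first and by feature second, which is why the per-feature list returned by the parallel loop has to be transposed before the importances are computed.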
@@ -16,6 +16,11 @@
 from sklearn.impute import SimpleImputer
 from sklearn.inspection import permutation_importance
 from sklearn.model_selection import train_test_split
+from sklearn.metrics import (
+    get_scorer,
+    mean_squared_error,
+    r2_score,
+)
 from sklearn.pipeline import make_pipeline
 from sklearn.preprocessing import KBinsDiscretizer
 from sklearn.preprocessing import OneHotEncoder
@@ -25,7 +30,6 @@
 from sklearn.utils._testing import _convert_container
 
 
-
 @pytest.mark.parametrize("n_jobs", [1, 2])
 def test_permutation_importance_correlated_feature_regression(n_jobs):
     # Make sure that feature highly correlated to the target have a higher
@@ -435,3 +439,46 @@ def my_scorer(estimator, X, y):
                                scoring=my_scorer,
                                n_repeats=1,
                                sample_weight=w)
+
+
+@pytest.mark.parametrize(
+    "list_single_scorer, multi_scorer",
+    [
+        (["r2", "neg_mean_squared_error"], ["r2", "neg_mean_squared_error"]),
+        (
+            ["r2", "neg_mean_squared_error"],
+            {
+                "r2": get_scorer("r2"),
+                "neg_mean_squared_error": get_scorer("neg_mean_squared_error"),
+            },
+        ),
+        (
+            ["r2", "neg_mean_squared_error"],
+            lambda estimator, X, y: {
+                "r2": r2_score(y, estimator.predict(X)),
+                "neg_mean_squared_error": -mean_squared_error(
+                    y, estimator.predict(X)
+                ),
+            },
+        ),
+    ],
+)
+def test_permutation_importance_multi_metric(list_single_scorer, multi_scorer):
+    # Test permutation importance when scoring contains multiple scorers
+
+    # Creating some data and estimator for the permutation test
+    x, y = make_regression(n_samples=500, n_features=10, random_state=0)
+    lr = LinearRegression().fit(x, y)
+
+    multi_importance = permutation_importance(
+        lr, x, y, random_state=1, scoring=multi_scorer, n_repeats=2
+    )
+    assert set(multi_importance.keys()) == set(list_single_scorer)
+
+    for scorer in list_single_scorer:
+        multi_result = multi_importance[scorer]
+        single_result = permutation_importance(
+            lr, x, y, random_state=1, scoring=scorer, n_repeats=2
+        )
+
+        assert_allclose(multi_result.importances, single_result.importances)