Description
Describe the workflow you want to enable
Recursive Feature Elimination (RFE) has an optional initialisation parameter importance_getter that can be set to a callable. According to the documentation:
> If callable, overrides the default feature importance getter. The callable is passed with the fitted estimator and it should return importance for each feature.
It would be very nice to use SHAP values as a proxy for feature importance. This would enable recursive feature elimination for models that do not support feature importance out of the box. One could write an importance_getter that computes SHAP feature importances, but the current implementation does not allow this: computing SHAP feature importances also requires the training input samples and the target values.
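For reference, the current callable contract receives only the fitted estimator. A minimal working example with today's scikit-learn API (synthetic data via make_classification, chosen here just for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Today's contract: the callable is given only the fitted estimator
# and must return one importance value per remaining feature.
rfe = RFE(
    DecisionTreeClassifier(max_depth=3, random_state=0),
    n_features_to_select=3,
    importance_getter=lambda est: est.feature_importances_,
)
rfe.fit(X, y)
print(rfe.n_features_)  # 3
```

Nothing in this signature gives the callable access to X or y, which is exactly what a SHAP-based getter would need.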
Describe your proposed solution
I propose to also pass the training input samples and the target values to the importance_getter callable. This way, RFE and SHAP values could be combined as follows:
```python
import numpy as np
import shap
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import RFE

def shap_importance_getter(clf, X, y):
    explainer = shap.Explainer(clf)
    shap_values = explainer.shap_values(X, y)[0]
    importance = np.abs(shap_values).mean(0)
    return importance

rfe = RFE(
    DecisionTreeClassifier(max_depth=3),
    n_features_to_select=3,
    importance_getter=shap_importance_getter,
)
rfe.fit_transform(X, y)  # X, y: training samples and targets
```
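To stay backward compatible, RFE could dispatch on the callable's arity, passing (X, y) only to getters that accept them. A minimal stdlib-only sketch of one possible dispatch helper (call_importance_getter is a hypothetical name, not part of scikit-learn):

```python
import inspect

def call_importance_getter(getter, estimator, X, y):
    # Hypothetical helper: pass (X, y) only when the callable's
    # signature asks for them, so existing one-argument getters
    # keep working unchanged.
    n_params = len(inspect.signature(getter).parameters)
    if n_params >= 3:
        return getter(estimator, X, y)
    return getter(estimator)

# Toy getters standing in for real implementations.
old_style = lambda est: "estimator-only"
new_style = lambda est, X, y: ("with-data", len(X))

print(call_importance_getter(old_style, None, [1, 2], [0, 1]))  # estimator-only
print(call_importance_getter(new_style, None, [1, 2], [0, 1]))  # ('with-data', 2)
```

This keeps the change opt-in: no existing importance_getter callable would break.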
Describe alternatives you've considered, if relevant
There is already a package called probatus that provides a more ad-hoc implementation of RFE using SHAP values.
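Another workaround that needs no API change is to fold the data-dependent computation into a wrapper estimator that exposes feature_importances_ itself, so plain RFE can consume it. A sketch, with permutation importance standing in for SHAP (both need X and y at importance time; DataAwareImportance is a hypothetical name for illustration):

```python
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

class DataAwareImportance(BaseEstimator):
    """Hypothetical wrapper: computes a data-dependent importance at
    fit time and exposes it as feature_importances_, which RFE's
    default importance_getter='auto' already knows how to read."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator).fit(X, y)
        result = permutation_importance(
            self.estimator_, X, y, n_repeats=5, random_state=0
        )
        self.feature_importances_ = result.importances_mean
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rfe = RFE(
    DataAwareImportance(DecisionTreeClassifier(max_depth=3, random_state=0)),
    n_features_to_select=3,
)
X_sel = rfe.fit_transform(X, y)
print(X_sel.shape)
```

The downside is that every importance method needs its own wrapper, whereas the proposed importance_getter signature would make this a one-liner per method.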
Additional context
No response