Feature Selectors fail to route metadata when inside a Pipeline #30527

Open
@kschluns

Description


Describe the bug

According to the metadata routing docs, only four Feature Selector classes support metadata routing (as of v1.6), including SelectFromModel, which is used in the reproduction below.

Each of these classes fails to route metadata when used inside a Pipeline object. When sample_weight is passed through the Pipeline's **fit_params, it never reaches the feature selector's estimator, which may result in incorrect feature selection (e.g., when the relationship between the features and the response is materially affected by sample_weight).
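To illustrate why dropped weights matter, here is a minimal sketch of a one-feature weighted least-squares slope. It is not scikit-learn's implementation. The helper name `weighted_slope` and the toy data are hypothetical, chosen only to show that a fit that silently ignores its weights can produce a different coefficient.

```python
def weighted_slope(x, y, w=None):
    """Slope of the (weighted) least-squares line through the points (x, y)."""
    if w is None:
        w = [1.0] * len(x)  # an unweighted fit is the special case of equal weights
    total = sum(w)
    # Weighted means of x and y.
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variance (the common normalizer cancels in the ratio).
    cov = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    return cov / var


# Hypothetical toy data: the last point departs from the y = x trend.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 2.0, 6.0]
w = [1.0, 1.0, 1.0, 10.0]  # heavily up-weight the last point

slope_unweighted = weighted_slope(x, y)
slope_weighted = weighted_slope(x, y, w)
```

A selector that ranks features by coefficient magnitude could therefore pick different features depending on whether the weights reach its estimator.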

Steps/Code to Reproduce

import numpy as np
import sklearn
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

sklearn.set_config(enable_metadata_routing=True)

X, y = load_iris(return_X_y=True, as_frame=True)
w = np.arange(len(X)) + 1

reg = LinearRegression().set_fit_request(sample_weight=True)
pipeline_reg = LinearRegression().set_fit_request(sample_weight=True)

pipeline_fs = SelectFromModel(
    reg,
    threshold=-np.inf,
    prefit=False,
    max_features=len(X.columns),
)

pipeline = Pipeline(
    [
        ("feature_selector", pipeline_fs),
        ("regressor", pipeline_reg),
    ]
)

pipeline.fit(X, y, sample_weight=w)
reg.fit(X, y, sample_weight=w)

test_passed = (
    pipeline["feature_selector"].estimator_.coef_.tolist()
    == reg.coef_.tolist()
)

Expected Results

The expected result is test_passed = True.

That is, the coef_ of the internal estimator of the pipeline's feature_selector should exactly match the coef_ of a copy of that estimator fit directly on the same input (X, y, sample_weight).

Actual Results

The coefficients of the pipeline's feature_selector.estimator_ do not match those of the copied estimator fit on the same input (X, y, sample_weight).

>>> pipeline["feature_selector"].estimator_.coef_.tolist() == reg.coef_.tolist()
False
>>> pipeline["feature_selector"].estimator_.coef_
array([-0.11190585, -0.04007949,  0.22864503,  0.60925205])
>>> reg.coef_
array([-0.14681895, -0.07652903,  0.28196639,  0.5732906 ])

Rather, the coefficients of the pipeline's feature_selector.estimator_ match those of a copied estimator fit only on (X, y), without sample_weight.

>>> reg.fit(X,y).coef_
array([-0.11190585, -0.04007949,  0.22864503,  0.60925205])
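The comparison above can also be made with a tolerance instead of exact list equality, which is usually more robust for floating-point coefficients. The sketch below applies `np.allclose` to the arrays printed above. The helper name `coefs_match` and the chosen tolerances are assumptions for illustration, not part of the original report.

```python
import numpy as np


def coefs_match(a, b, rtol=1e-7, atol=0.0):
    """Return True if two coefficient arrays agree within the given tolerances."""
    return bool(np.allclose(a, b, rtol=rtol, atol=atol))


# Values copied from the output reported above.
pipeline_coef = np.array([-0.11190585, -0.04007949, 0.22864503, 0.60925205])
weighted_coef = np.array([-0.14681895, -0.07652903, 0.28196639, 0.5732906])
unweighted_coef = np.array([-0.11190585, -0.04007949, 0.22864503, 0.60925205])

# Compare the pipeline's selector against the weighted and the unweighted fit.
matches_weighted = coefs_match(pipeline_coef, weighted_coef)
matches_unweighted = coefs_match(pipeline_coef, unweighted_coef)
```

With the reported values, the pipeline's coefficients agree with the unweighted fit and not with the weighted one, which is consistent with sample_weight not reaching the selector's estimator.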

Versions

System:
    python: 3.11.9 (main, Apr  2 2024, 08:25:04) [Clang 15.0.0 (clang-1500.3.9.4)]
executable: /Users/kschluns/Library/Caches/pypoetry/virtualenvs/ds-sbraf-edgrceiw-py3.11/bin/python
   machine: macOS-15.1.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.6.0
          pip: 23.1.2
   setuptools: 75.6.0
        numpy: 1.26.4
        scipy: 1.14.1
       Cython: None
       pandas: 2.2.3
   matplotlib: 3.10.0
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/kschluns/Library/Caches/pypoetry/virtualenvs/ds-sbraf-edgrceiw-py3.11/lib/python3.11/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/kschluns/Library/Caches/pypoetry/virtualenvs/ds-sbraf-edgrceiw-py3.11/lib/python3.11/site-packages/sklearn/.dylibs/libomp.dylib
        version: None
