changing ARDRegression feature selection method #30951
Conversation
Overall I'm suspicious that this is something we'd like to do. The diff doesn't look straightforward, it introduces new constructor args, and it can all be avoided with a scaling step in the pipeline.
So I'm more of a -0.5 on this, but I defer to others to see how they feel.
if self.standardize:
    # Standardize X
    self.scaler_ = StandardScaler()
    X = self.scaler_.fit_transform(X)
I tend to prefer having scaling done in a pipeline before this step rather than here. I'm not sure if we've changed our opinion on that, though (cc @lorentzenchr @ogrisel).
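For reference, a minimal sketch of that pipeline-based alternative using the released estimator and its existing threshold_lambda parameter (names and values are illustrative):

```python
# Sketch of the pipeline alternative: scale features before ARDRegression
# instead of adding a `standardize` argument to the estimator itself.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import ARDRegression

model = make_pipeline(StandardScaler(), ARDRegression(threshold_lambda=10000.0))
# model.fit(X_train, y_train) then behaves like ARDRegression on standardized X.
```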
threshold_lambda : float, default=10 000
    Threshold for removing (pruning) weights with high precision from
    the computation.

min_significance : float, default=0.5
    Minimum statistical significance (|beta|/sigma) required to keep a
    feature. The default of 0.5 provides a reasonable balance between
    feature selection and model accuracy. This replaces the
    threshold_lambda parameter for more interpretable feature pruning.

standardize : bool, default=True
    Whether to standardize features before fitting. Recommended for
    consistent feature selection behavior regardless of feature scales.
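For illustration only, a sketch of how the proposed arguments would be used; min_significance and standardize exist only in this PR, not in any released scikit-learn:

```python
# Hypothetical usage of the arguments proposed in this PR (not part of any
# released scikit-learn): keep only features whose significance
# |beta| / sigma = |beta| * sqrt(lambda) reaches 0.5, after internal scaling.
from sklearn.linear_model import ARDRegression

ard = ARDRegression(min_significance=0.5, standardize=True)
# ard.fit(X, y)
# Coefficients of features that fail the significance test are set to zero,
# as with the current threshold_lambda-based pruning.
```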
This needs a lot more careful consideration. We don't simply remove constructor arguments.
Yes, I realized since then that the new solution wouldn't be backward compatible this way.
Issues with the current implementation:

When we scale a feature by a factor k, under the lambda-threshold approach:
- the coefficient β becomes β/k
- lambda becomes k²·λ
- if λ < threshold, then k²·λ may exceed the threshold purely due to scaling

Conclusion: pruning depends on the feature scale.

Proposed approach based on significance:
- significance = |β|·√λ = |β|/σ
- when scaled: (|β|/k)·√(k²·λ) = |β|·√λ

Conclusion: invariant to feature scaling (a numeric sketch follows below).
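A rough numeric sketch of this argument (illustrative data and values, not from the PR), using the released estimator with pruning effectively disabled so the raw lambda_ values can be compared:

```python
# Rescaling one feature by k shrinks its coefficient by ~k and inflates its
# lambda_ by ~k**2, while |coef_| * sqrt(lambda_) stays roughly unchanged.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X @ np.array([1.0, 0.5, 0.0]) + 0.1 * rng.randn(200)

k = 1000.0
X_rescaled = X.copy()
X_rescaled[:, 0] *= k  # feature 0 now lives on a 1000x larger scale

for name, data in [("original", X), ("rescaled", X_rescaled)]:
    # threshold_lambda is set very high here only to disable pruning; with the
    # default of 10 000 the rescaled feature 0 would be pruned purely because
    # of its scale.
    ard = ARDRegression(threshold_lambda=1e12).fit(data, y)
    significance = np.abs(ard.coef_) * np.sqrt(ard.lambda_)
    print(name, "lambda_ =", ard.lambda_, "significance =", significance)
```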
Advantages of significance-based pruning:
- Statistical interpretation: a significance of 2.0 corresponds to ~95% confidence that the coefficient is non-zero, a standard statistical threshold.
- Consistency with theory: in statistics we typically don't discard variables just because they have high precision; we discard them when we can't distinguish their effect from zero.
- Better feature selection (see the sketch after this list):
  - keeps small but certain effects (|β| small, λ large, |β|·√λ > threshold)
  - removes large uncertain effects (|β| large, λ small, |β|·√λ < threshold)
- Preservation of the Bayesian framework: we still use the same Bayesian update equations for λ and α; only the pruning criterion changes.
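To make the two criteria concrete, a sketch comparing the pruning masks on made-up posterior means and precisions (array values and names are illustrative, not the PR's actual code):

```python
# Feature 0: small but certain effect; feature 1: large but uncertain effect;
# feature 2: ordinary effect kept by both rules.
import numpy as np

coef = np.array([0.02, 5.0, 1.0])      # posterior means |beta|
lambda_ = np.array([1e5, 1e-3, 1.0])   # weight precisions

# Current rule: keep a feature while its precision stays below the threshold.
keep_lambda = lambda_ < 1e4
# -> [False, True, True]: drops the small-but-certain feature 0.

# Proposed rule: keep a feature while |beta| / sigma = |beta| * sqrt(lambda)
# stays above a minimum significance.
keep_significant = np.abs(coef) * np.sqrt(lambda_) >= 0.5
# -> [True, False, True]: keeps feature 0, drops the uncertain feature 1.
```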
Potential concerns:
- Original purpose of threshold_lambda: the original ARD papers describe pruning based on lambda to promote sparsity, but they don't explicitly consider scale dependence.
- Computation: both approaches have identical computational cost.
- Sparse solutions: significance-based pruning might retain more features than lambda thresholding in some cases, but those are features we can be confident are relevant.