RFC Sample weight invariance properties #15657

Open

@rth

Description

This can wait until after the release.

A discussion happened in the GLM PR #14300 about what properties we would like sample_weight to have.

Current Versions

First, a short side comment about the 3 ways sample weights ($s_i$) are currently used in loss functions of regularized generalized linear models in scikit-learn (as far as I understand),

  • Version 1a: $L_{1a}(\omega) = \sum_i s_i \cdot l(x_i, \omega) + \alpha \lVert \omega\rVert$

    For instance: Ridge (also LogisticRegression where C=1/α)

  • Version 2a: $L_{2a}(\omega) = \frac{1}{n_{\text{samples}}}\sum_i s_i \cdot l(x_i, \omega) + \alpha \lVert \omega\rVert$

    For instance: SGDClassifier? (maybe Lasso, ElasticNet once sample_weight support is added there?)

  • Version 2b: $L_{2b}(\omega) = \frac{1}{\sum_i s_i}\sum_i s_i \cdot l(x_i, \omega) + \alpha \lVert \omega\rVert$

    For instance, currently proposed in the GLM PR for PoissonRegressor etc (edit: meanwhile implemented this way)
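To make the difference concrete, here is a minimal numerical sketch (a toy squared-error loss with a squared L2 penalty, not scikit-learn internals) evaluating the three objectives; the only difference is the prefactor of the data term:

```python
import numpy as np

def data_term(w, X, y, s):
    # weighted sum of per-sample losses, here squared errors
    return np.sum(s * (y - X @ w) ** 2)

def loss_1a(w, X, y, s, alpha):
    # L_1a: sum_i s_i * l_i + alpha * ||w||^2
    return data_term(w, X, y, s) + alpha * np.sum(w ** 2)

def loss_2a(w, X, y, s, alpha):
    # L_2a: (1 / n_samples) * sum_i s_i * l_i + alpha * ||w||^2
    return data_term(w, X, y, s) / X.shape[0] + alpha * np.sum(w ** 2)

def loss_2b(w, X, y, s, alpha):
    # L_2b: (1 / sum_i s_i) * sum_i s_i * l_i + alpha * ||w||^2
    return data_term(w, X, y, s) / np.sum(s) + alpha * np.sum(w ** 2)

rng = np.random.RandomState(0)
X, y = rng.randn(10, 3), rng.randn(10)
w, s, alpha = rng.randn(3), np.ones(10), 1.0

# Scaling all weights by 2 leaves L_2b unchanged, but doubles the data term
# of L_1a and L_2a (i.e. it acts like halving alpha there).
print(loss_2b(w, X, y, s, alpha), loss_2b(w, X, y, 2 * s, alpha))
print(loss_1a(w, X, y, s, alpha), loss_1a(w, X, y, 2 * s, alpha))
```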

Properties

For sample weights it's useful to think in terms of invariance properties, as they can be directly expressed in common tests. For instance,

  1. checking that zero sample weight is equivalent to ignoring the corresponding samples, added as a common test in #15015 "add common test that zero sample weight means samples are ignored" (replaced by #17176 "Common check for sample weight invariance with removed samples"), helped discover a number of issues.
    All of the above formulations should verify this; strictly, it is verified only by L_1a and L_2b (for L_2a the 1/n_samples prefactor changes when a sample is dropped).
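A minimal sketch of this kind of check (an illustration with Ridge on dense data, not the actual common test from #17176):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X, y = rng.randn(20, 3), rng.randn(20)
sample_weight = np.ones(20)
sample_weight[:5] = 0.0  # zero out the first 5 samples

# Fit with zero weights vs. fit with those samples removed entirely.
est_zero = Ridge(alpha=1.0).fit(X, y, sample_weight=sample_weight)
est_removed = Ridge(alpha=1.0).fit(X[5:], y[5:], sample_weight=sample_weight[5:])

# Property 1 holds if the two fits coincide.
print(np.allclose(est_zero.coef_, est_removed.coef_))
```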

Similarly, paraphrasing #14300 (comment), other properties we might want to enforce are,

  2. multiplying the sample weight of some samples by N is equivalent to repeating the corresponding samples N times.
    It is verified only by L_1a and L_2b (see the sketch after this list).
    Example: for L_2a, setting all weights to 2 is equivalent to having 2x more samples only if α is also replaced by α/2.

  3. Finally, that scaling all sample weights by a constant has no effect. This is only verified by L_2b. For both L_1a and L_2a, multiplying all sample weights by k is equivalent to replacing α with α/k.

    This one is more controversial. Against enforcing this,

    in favor,

    • that we don't want a coupling between using sample weights and regularization.
      Example: Say one has a model without sample weights, and one wants to see whether applying sample weights (imbalanced dataset, sample uncertainty, etc.) improves it. Without this property it is difficult to conclude: is the evaluation metric better with sample weights because of the weights themselves, or simply because the model is now better regularized? One has to consider these two factors simultaneously.
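As a rough illustration of properties 2 and 3 for the L_1a formulation, here is a sketch with Ridge (which, as noted above, follows L_1a) on dense toy data: repeating a sample should match doubling its weight, while scaling all weights by k should only reproduce the original fit once α is rescaled accordingly:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X, y = rng.randn(30, 4), rng.randn(30)
alpha = 1.0

# Property 2: weight 2 on a sample vs. repeating that sample once more.
sw = np.ones(30)
sw[0] = 2.0
est_weighted = Ridge(alpha=alpha).fit(X, y, sample_weight=sw)
est_repeated = Ridge(alpha=alpha).fit(np.vstack([X, X[:1]]), np.concatenate([y, y[:1]]))
print(np.allclose(est_weighted.coef_, est_repeated.coef_))

# Property 3: with L_1a, scaling all weights by k is *not* a no-op;
# it should only match the original fit once alpha is scaled by k as well.
k = 10.0
est_ref = Ridge(alpha=alpha).fit(X, y, sample_weight=np.ones(30))
est_scaled = Ridge(alpha=alpha).fit(X, y, sample_weight=k * np.ones(30))
est_rescaled = Ridge(alpha=k * alpha).fit(X, y, sample_weight=k * np.ones(30))
print(np.allclose(est_ref.coef_, est_scaled.coef_))    # expected: False
print(np.allclose(est_ref.coef_, est_rescaled.coef_))  # expected: True if Ridge is L_1a
```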

Whether we want/need consistency between the use of sample weight in metrics and in estimators is another question. I'm not convinced we do, since in most cases estimators don't care about the global scaling of the loss function, and these formulations are equivalent up to a rescaling of the regularization parameter. So maybe using the L_1a-equivalent expression in metrics could be fine.
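A quick way to see what a given metric does with sample_weight is to compare it under a global rescaling of the weights; a sketch with mean_squared_error (if it normalizes by the sum of the weights, i.e. the L_2b-style expression, the rescaling has no effect):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
y_true, y_pred = rng.randn(50), rng.randn(50)
s = rng.uniform(0.5, 2.0, size=50)

m1 = mean_squared_error(y_true, y_pred, sample_weight=s)
m2 = mean_squared_error(y_true, y_pred, sample_weight=10 * s)
print(np.isclose(m1, m2))  # True if the metric is invariant to weight rescaling
```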

In any case, we need to decide the behavior we want. This is a blocker for,

Note: Ridge actually seems to have different sample weight behavior for dense and sparse input, as reported in #15438.
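For reference, a sketch that compares the two code paths on the same weighted problem (dense vs. sparse input); any systematic difference in the coefficients would be the discrepancy reported in #15438:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X, y = rng.randn(40, 5), rng.randn(40)
s = rng.uniform(0.5, 2.0, size=40)

est_dense = Ridge(alpha=1.0).fit(X, y, sample_weight=s)
est_sparse = Ridge(alpha=1.0).fit(sp.csr_matrix(X), y, sample_weight=s)
print(np.abs(est_dense.coef_ - est_sparse.coef_).max())
```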

@agramfort's opinion on this can be found in #15651 (comment) (if I understood correctly).

Please correct me if I missed something (this could also use a more in-depth review of how this is done in other libraries).
