API make naive Bayes API consistent in regards to "priors" #27135


Open: glemaitre wants to merge 11 commits into base branch main.
Conversation

@glemaitre (Member) commented Aug 22, 2023

This PR intends to make the API of the class priors consistent across the different naive Bayes classifiers.

It deprecates class_prior and fit_prior in favor of a single parameter priors that is consistent with GaussianNB (introduced later) and LinearDiscriminantAnalysis.

I also introduce class_log_prior_ in GaussianNB to be consistent with other estimators.

This change will also be helpful for the development of #22574. We can consistently set the priors attribute for all naive Bayes implementations and define a predictable API.
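For illustration, a minimal sketch of the difference from a user's perspective; the commented-out line shows the unified parameter this PR proposes (not merged or released, the exact name and accepted values may change):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

X = np.random.RandomState(0).randint(5, size=(6, 10))
y = np.array([0, 0, 0, 1, 1, 1])

# Today, the discrete variants take `class_prior` (and `fit_prior`) ...
clf_discrete = MultinomialNB(class_prior=[0.3, 0.7]).fit(X, y)

# ... while GaussianNB already takes `priors`.
clf_gaussian = GaussianNB(priors=[0.3, 0.7]).fit(X, y)

# With this PR (sketch only): every naive Bayes classifier would accept
# the same `priors` parameter.
# clf_unified = MultinomialNB(priors=[0.3, 0.7]).fit(X, y)
```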

@glemaitre glemaitre marked this pull request as draft August 22, 2023 16:45
github-actions bot commented Aug 22, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 84292de.

@avm19 (Contributor) commented Aug 22, 2023

On a related note, there are differences in both setting and getting priors to/from estimators, compare:

  • GaussianNB(priors=...).class_prior_ vs
  • BernoulliNB(class_prior=...).class_log_prior_.

The docstring says that class_log_prior_ is "smoothed":

class_log_prior_ : ndarray of shape (n_classes,)
Log probability of each class (smoothed).

I could not find in source code where the smoothing is applied to priors or counts of labels (as opposed to features). It is possible that the "smoothed" bit is included by mistake (or that I missed something).
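To make the asymmetry concrete, a small runnable snippet (not part of the PR; attribute names as in the current release):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, BernoulliNB

X = np.random.RandomState(0).randint(2, size=(6, 4))
y = np.array([0, 0, 0, 1, 1, 1])

# GaussianNB: the prior is set and read back as probabilities.
gnb = GaussianNB(priors=[0.25, 0.75]).fit(X, y)
print(gnb.class_prior_)               # [0.25 0.75]

# BernoulliNB: the prior is set as probabilities but read back as
# log-probabilities, under a different attribute name.
bnb = BernoulliNB(class_prior=[0.25, 0.75]).fit(X, y)
print(np.exp(bnb.class_log_prior_))   # [0.25 0.75]
```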

@glemaitre (Member, Author) commented Aug 22, 2023

Indeed, I added class_log_prior_ to GaussianNB. I refactored the code so that all classifiers call the same method to define the fitted attribute. I don't really intend to deprecate class_prior_.

> I could not find in source code where the smoothing is applied to priors or counts of labels (as opposed to features). It is possible that the "smoothed" bit is included by mistake (or that I missed something).

I have not looked into it yet. I would expect alpha to play a role in the smoothing, but it would most probably have to affect class_count_; otherwise there is no smoothing. I need to take a closer look.

Edit: I did not see any smoothing either. Not sure where this comes from; feature_log_prob_ is the attribute that includes smoothing due to the alpha parameter.
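A quick check (not part of the PR) that supports this reading: class_log_prior_ does not move with alpha, while feature_log_prob_ does.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.random.RandomState(0).randint(2, size=(8, 5))
y = np.array([0, 0, 0, 0, 0, 1, 1, 1])

for alpha in (0.5, 1.0, 100.0):
    clf = BernoulliNB(alpha=alpha).fit(X, y)
    # class_log_prior_ stays at log([5/8, 3/8]) for every alpha,
    # while the smoothed feature probabilities change.
    print(alpha, np.exp(clf.class_log_prior_), clf.feature_log_prob_[0, :2])
```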

@@ -577,30 +657,6 @@ def _check_X_y(self, X, y, reset=True):
"""Validate X and y in fit methods."""
return self._validate_data(X, y, accept_sparse="csr", reset=reset)

def _update_class_log_prior(self, class_prior=None):
glemaitre (Member Author) commented on this diff:

This function moved to _BaseNB since it can be used by all naive Bayes algorithms.
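For context, the moved method computes the class log-prior from an explicit prior, the empirical class frequencies, or a uniform distribution. Roughly (a paraphrased sketch, not the exact upstream code):

```python
import numpy as np

def _update_class_log_prior(self, class_prior=None):
    """Paraphrased sketch of the discrete naive Bayes helper."""
    n_classes = len(self.classes_)
    if class_prior is not None:
        if len(class_prior) != n_classes:
            raise ValueError("Number of priors must match number of classes.")
        self.class_log_prior_ = np.log(class_prior)
    elif self.fit_prior:
        # Empirical prior: log of the relative class frequencies seen in fit.
        self.class_log_prior_ = np.log(self.class_count_) - np.log(
            self.class_count_.sum()
        )
    else:
        # Uniform prior over the classes.
        self.class_log_prior_ = np.full(n_classes, -np.log(n_classes))
```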

@@ -439,24 +530,6 @@ def _partial_fit(self, X, y, classes=None, _refit=False, sample_weight=None):
self.var_ = np.zeros((n_classes, n_features))

self.class_count_ = np.zeros(n_classes, dtype=np.float64)

# Initialise the class prior
glemaitre (Member Author) commented on this diff:

The class prior initialization is now factorized across all the specialized naive Bayes classes.

else:
self._priors = "uniform"

def _update_class_log_prior(self):
glemaitre (Member Author) commented on this diff:

This comes from _BaseDiscreteNB and is slightly changed to use the _priors attribute.
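As an illustration of the change, the refactored method might read the prior policy from the private _priors attribute roughly as follows (a sketch based on this thread, not the exact diff):

```python
import numpy as np

def _update_class_log_prior(self):
    """Sketch: derive class_log_prior_ from the resolved _priors policy."""
    n_classes = len(self.classes_)
    if isinstance(self._priors, str):
        if self._priors == "empirical":
            # Empirical prior from the class counts accumulated during fit.
            self.class_log_prior_ = np.log(self.class_count_) - np.log(
                self.class_count_.sum()
            )
        else:  # "uniform"
            self.class_log_prior_ = np.full(n_classes, -np.log(n_classes))
    else:
        priors = np.asarray(self._priors)
        if len(priors) != n_classes:
            raise ValueError("Number of priors must match number of classes.")
        self.class_log_prior_ = np.log(priors)
```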

def __init__(self, *, priors=None):
self.priors = priors

def _validate_priors(self):
glemaitre (Member Author) commented on this diff:

This is the main change that smoothly handles the deprecation in the discrete naive Bayes classifiers.
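A possible shape for this deprecation handling, sketched from the discussion only; the "deprecated" sentinel and the exact control flow here are assumptions, not the PR's code:

```python
import warnings

def _validate_priors(self):
    """Sketch: map deprecated class_prior/fit_prior onto the new priors."""
    class_prior_set = self.class_prior != "deprecated"
    fit_prior_set = self.fit_prior != "deprecated"
    if class_prior_set or fit_prior_set:
        warnings.warn(
            "`class_prior` and `fit_prior` are deprecated; use `priors` instead.",
            FutureWarning,
        )
        if self.priors is not None:
            raise ValueError(
                "Set only `priors`; do not combine it with the deprecated parameters."
            )
        if class_prior_set and self.class_prior is not None:
            self._priors = self.class_prior
        elif fit_prior_set and not self.fit_prior:
            self._priors = "uniform"
        else:
            self._priors = "empirical"
    elif self.priors is None:
        self._priors = "empirical"
    else:
        self._priors = self.priors
```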

@glemaitre glemaitre marked this pull request as ready for review August 22, 2023 20:05
@adrinjalali (Member) left a comment

This also needs a doc build trigger to find all the places in the documentation where we need to make changes.

- if `"uniform"`: a uniform prior is used for each class;
- if `"empirical"`: the prior uses the class proportions from the
training data;
- if `None`: equivalent to `"empirical"`;
adrinjalali (Member):

Should we not change the default and deprecate None here?

glemaitre (Member Author):

Actually, None allows me to detect whether a user is setting both priors and class_prior/fit_prior. If I set the default to "empirical", I cannot tell whether the user explicitly passed "empirical" or is just using the default value.

It looks like we need to go with one of those long deprecations: introduce None while removing the old parameters, then deprecate None in 1.6 and remove it in 1.8.

adrinjalali (Member):

We could have an undocumented hidden value, None or "DEFAULT", so that you can know whether the user explicitly passed anything. We don't have to expose that value to users, do we?
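For reference, the hidden-sentinel pattern being suggested looks roughly like this (a generic illustration, not the code in this PR; a string sentinel is used so get_params and the repr stay readable):

```python
_DEFAULT = "__default__"  # hidden sentinel, never documented

class ExampleNB:
    """Toy illustration of detecting an explicitly set parameter."""

    def __init__(self, priors=_DEFAULT):
        self.priors = priors

    def _resolve_priors(self):
        if self.priors == _DEFAULT:
            # The user did not pass anything: fall back to the default policy.
            return "empirical"
        # The user explicitly set a value, even if it equals the default policy.
        return self.priors

print(ExampleNB()._resolve_priors())                    # empirical (implicit)
print(ExampleNB(priors="empirical")._resolve_priors())  # empirical (explicit)
```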

@avm19 (Contributor) commented Sep 13, 2023:

Just a side note: detecting a non-default (user-set) value is also relevant to glemaitre's suggestion of automatically propagating priors from the meta-estimator in #22574.

@avm19 (Contributor) commented Sep 11, 2023

> It deprecates class_prior and fit_prior in favor of a single parameter priors that is consistent with GaussianNB (introduced later) and LinearDiscriminantAnalysis.

I was wondering whether the name class_prior is better than just prior. Here are my thoughts:

  • As far as I can tell, the term "prior" is used by Bayesians mainly for a prior distribution over a parameter of another distribution, i.e. a hidden variable (e.g. $p(\alpha|\alpha_1, \alpha_2)$ is the prior for the parameter alpha in Bayesian ridge regression). Here, however, we have a prior distribution over the label, which is an observed variable. Although naive Bayes is not a Bayesian method, consistency with Bayesian usage won't hurt.
  • If P(Y) and P(Y|X) are referred to as the "prior" and the "posterior", then P(X) and P(X|Y) could be called that as well. The name class_prior disambiguates between the class label Y and the covariates X. We would normally call P(X) and P(X|Y) the marginal and the class-conditional distribution, respectively, but nevertheless...
