API make naive Bayes API consistent in regards to "priors" #27135
Conversation
On a related note, there are differences in both setting and getting priors to/from estimators, compare:
The docstring (scikit-learn/sklearn/naive_bayes.py, lines 1107 to 1108 at 7f9bad9) describes the class log-prior as "smoothed". I could not find in the source code where smoothing is applied to the priors or to the counts of labels (as opposed to features). It is possible that the "smoothed" bit is included by mistake (or that I missed something).
Indeed, I added
Edit: I did not see any smoothing either. Not sure where this comes from.
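For reference, the empirical class log-prior in the discrete naive Bayes estimators is derived directly from the class counts. A minimal sketch (where `class_count_` stands in for the fitted attribute) shows that no smoothing term enters the computation on the label side:

```python
import numpy as np

# Stand-in for the fitted attribute of a discrete naive Bayes estimator
# (counts of samples per class).
class_count_ = np.array([30.0, 70.0])

# Empirical class log-prior: log P(y=c) = log(count_c) - log(total count).
# Note: no smoothing term, unlike the feature counts.
class_log_prior_ = np.log(class_count_) - np.log(class_count_.sum())
print(np.exp(class_log_prior_))  # [0.3 0.7] -- plain class proportions
```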
@@ -577,30 +657,6 @@ def _check_X_y(self, X, y, reset=True):
        """Validate X and y in fit methods."""
        return self._validate_data(X, y, accept_sparse="csr", reset=reset)

    def _update_class_log_prior(self, class_prior=None):
This function moved into `_BaseNB` since it can be used by all naive Bayes algorithms.
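For readers following along, a rough sketch of what such a shared helper on `_BaseNB` could look like; the exact implementation in the PR may differ, and everything beyond the method name and the `class_log_prior_` attribute is an assumption:

```python
import numpy as np

def _update_class_log_prior(self):
    """Sketch of the shared helper on ``_BaseNB`` (details may differ).

    Assumes ``self._priors`` is ``"uniform"``, ``"empirical"``, or an
    array-like of shape (n_classes,), and that ``self.classes_`` and
    ``self.class_count_`` were set during ``fit``.
    """
    n_classes = len(self.classes_)
    if isinstance(self._priors, str) and self._priors == "uniform":
        # Flat prior: every class gets probability 1 / n_classes.
        self.class_log_prior_ = np.full(n_classes, -np.log(n_classes))
    elif isinstance(self._priors, str) and self._priors == "empirical":
        # Class proportions observed in the training data.
        self.class_log_prior_ = np.log(self.class_count_) - np.log(
            self.class_count_.sum()
        )
    else:
        # User-provided array of prior probabilities.
        priors = np.asarray(self._priors, dtype=np.float64)
        if priors.shape[0] != n_classes:
            raise ValueError("Number of priors must match number of classes.")
        self.class_log_prior_ = np.log(priors)
```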
@@ -439,24 +530,6 @@ def _partial_fit(self, X, y, classes=None, _refit=False, sample_weight=None):
            self.var_ = np.zeros((n_classes, n_features))

            self.class_count_ = np.zeros(n_classes, dtype=np.float64)

            # Initialise the class prior
This is now factorized across all specialized naive Bayes classes.
        else:
            self._priors = "uniform"

    def _update_class_log_prior(self):
Comes from `_BaseDiscreteNB`, slightly changed to use the `_priors` attribute.
    def __init__(self, *, priors=None):
        self.priors = priors

    def _validate_priors(self):
This is the main change, to smoothly handle the deprecation in the discrete naive Bayes classes.
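A simplified sketch of how such a `_validate_priors` helper could reconcile the new `priors` parameter with the deprecated `class_prior`/`fit_prior` ones; the `"warn"` sentinel and the exact messages are assumptions for illustration, not the PR's actual code:

```python
import warnings

def _validate_priors(self):
    """Sketch of the deprecation handling (sentinel and messages assumed).

    Assumes the deprecated ``class_prior`` and ``fit_prior`` parameters
    now default to a ``"warn"`` string sentinel, so an explicitly passed
    value can be told apart from the default.
    """
    class_prior_set = not (
        isinstance(self.class_prior, str) and self.class_prior == "warn"
    )
    fit_prior_set = self.fit_prior != "warn"
    if class_prior_set or fit_prior_set:
        if self.priors is not None:
            raise ValueError(
                "Set only `priors`; `class_prior` and `fit_prior` are "
                "deprecated."
            )
        warnings.warn(
            "`class_prior` and `fit_prior` are deprecated; use `priors` "
            "instead.",
            FutureWarning,
        )
        if class_prior_set and self.class_prior is not None:
            self._priors = self.class_prior
        elif fit_prior_set and self.fit_prior is False:
            self._priors = "uniform"
        else:
            self._priors = "empirical"
    elif self.priors is None:
        self._priors = "empirical"
    else:
        self._priors = self.priors
```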
This also needs a doc build trigger to find all the places in the documentation that we need to change.
- if `"uniform"`: a uniform prior is used for each class; | ||
- if `"empirical"`: the prior uses the class proportions from the | ||
training data; | ||
- if `None`: equivalent to `"empirical"`; |
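As a usage illustration of these semantics (assuming the `priors` parameter proposed by this PR is available on the discrete estimators):

```python
from sklearn.naive_bayes import MultinomialNB

MultinomialNB(priors="uniform")    # flat prior over the classes
MultinomialNB(priors="empirical")  # class proportions from the training data
MultinomialNB(priors=[0.2, 0.8])   # user-specified prior probabilities
MultinomialNB(priors=None)         # same as "empirical"
```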
Should we not change the default and deprecate `None` here?
Actually, `None` allows me to detect whether a user is setting both `priors` and `class_prior`/`fit_prior`. If I set the default to `"empirical"`, I cannot tell whether the user explicitly set `"empirical"` or is just using the default value. It looks like we need to go with one of those long deprecations that introduce `None` for removing the parameter: in 1.6 we deprecate `None`, and we remove it in 1.8.
We could have an undocumented hidden value, `None` or `"DEFAULT"`, for you to know whether the user is explicitly passing anything. We don't have to expose that value to users, do we?
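For illustration, the usual sentinel pattern for detecting whether a parameter was explicitly passed; the `_DEFAULT` name and the `Estimator` class are made up for this sketch:

```python
_DEFAULT = object()  # hidden sentinel; never shown in the documentation

class Estimator:
    def __init__(self, priors=_DEFAULT):
        self.priors = priors

    def fit(self, X, y):
        if self.priors is _DEFAULT:
            # The user did not touch `priors`, so honouring the deprecated
            # `class_prior`/`fit_prior` parameters is unambiguous here.
            effective_priors = "empirical"
        else:
            effective_priors = self.priors
        print(f"using priors: {effective_priors!r}")
        return self
```

In practice a string sentinel such as `"deprecated"` is often preferred over a bare `object()`, since the stored parameter then stays readable in `repr` and `get_params` output.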
Just a side note: detecting a non-default (user-set) value is also relevant to glemaitre's suggestion of automatic propagation of priors from the meta-estimator in #22574.
I was thinking if the name
This PR intends to make the API of the class priors consistent across the different naive Bayes classifiers.

It deprecates `class_prior` and `fit_prior` in favor of a single parameter `priors` that is consistent with `GaussianNB` (introduced later) and `LinearDiscriminantAnalysis`. I also introduce `class_log_prior_` in `GaussianNB` to be consistent with the other estimators.

This change will also be helpful for the development of #22574: we can consistently set the `priors` attribute for all naive Bayes implementations and define a predictable API.
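To make the intended migration concrete, a hedged before/after sketch (the exact deprecation mechanics may differ from the final implementation):

```python
from sklearn.naive_bayes import BernoulliNB, GaussianNB

# Before: three spellings for the same concept.
BernoulliNB(fit_prior=False)         # uniform prior
BernoulliNB(class_prior=[0.5, 0.5])  # explicit prior probabilities
GaussianNB(priors=[0.5, 0.5])        # explicit prior, different parameter name

# After this PR: a single `priors` parameter everywhere.
BernoulliNB(priors="uniform")
BernoulliNB(priors=[0.5, 0.5])
GaussianNB(priors=[0.5, 0.5])
```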