[MRG+1] FEA Add Categorical Naive Bayes #12569
Conversation
There might be a problem when the input array
@timbicker After having thought a bit about it, I think that setting the probability to 1 for an unseen category would make more sense than 0. Given a particular feature, an unseen category would then yield 1 for all classes, and thus the feature would basically be ignored, since other features could still determine the right class. If all categories of all features are unseen, the decision would be made automatically by the class probabilities.
Yes, I agree; this makes more sense in my opinion. For the other case, where the category is unseen for only a subset of the classes, we have the smoothing parameter.
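For reference, a sketch of the smoothing involved, following the additive-smoothing form documented for CategoricalNB (α is the smoothing parameter; the notation is illustrative):

$$P(x_i = t \mid y = c) = \frac{N_{tic} + \alpha}{N_c + \alpha\, n_i}$$

where $N_{tic}$ is the number of training samples of class $c$ whose feature $i$ takes category $t$, $N_c$ is the number of samples of class $c$, and $n_i$ is the number of categories of feature $i$. With $\alpha > 0$, a category seen in training but absent for some class still receives nonzero probability.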
If you uncomment, you will see that no error is raised. Instead,
Thanks, I'm unable to reproduce because I use a list.
@qinhanmin2014 Thanks for your review. If you have no further comments, should @timbicker then change this PR to
See the previous comment; please avoid calling check_array twice.
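A minimal sketch of the requested pattern (illustrative class, not the PR's actual code): validate the input once in _check_X and reuse the validated array, rather than calling check_array again in the public method:

```python
import numpy as np
from sklearn.utils.validation import check_array

class Sketch:
    def _check_X(self, X):
        # Single point of input validation.
        return check_array(X)

    def predict(self, X):
        X = self._check_X(X)  # no second check_array call here
        return np.zeros(X.shape[0], dtype=int)  # placeholder prediction
```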
I am waiting for the other PR #14872 to be merged, because otherwise the tests of this PR would fail. Or should I fix it already?
Which tests?
The output is as follows: We assume that
It would be good to test other invariances: invariance under sample permutation, invariance under class label permutation up to ties, and maybe a test of how tie-breaking is done, to avoid regressions. Still to review the main code.
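For instance, a minimal sketch of the sample-permutation test (data and names are illustrative, not the PR's actual test suite):

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

def test_sample_permutation_invariance():
    # Fitting on a permuted copy of the training set must give identical
    # predictions, since the estimator only accumulates category counts.
    rng = np.random.RandomState(42)
    X = rng.randint(4, size=(60, 2))
    y = rng.randint(3, size=60)

    perm = rng.permutation(len(y))
    pred_original = CategoricalNB().fit(X, y).predict(X)
    pred_permuted = CategoricalNB().fit(X[perm], y[perm]).predict(X)

    np.testing.assert_array_equal(pred_original, pred_permuted)
```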
Otherwise LGTM.
@jnothman and @qinhanmin2014, thanks for your remarks.
Merged commit 4e9f97d into scikit-learn:master.
Thanks and congratulations @timbicker and @FlorianWilhelm!
@timbicker, great job! Thanks to everyone involved.
Can you please clarify what you mean here, and under "feature distributions", by: "discrete features that are categorically distributed; the categories of each feature are drawn from a categorical distribution"? It seems you mean that the distributions are estimated from the data, i.e. you never use some known pdf to fit the data for each feature? Do you have an example specific to categorical data, with several important features and several unimportant ones?
Thanks a lot in advance.
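To illustrate the point in question, a hedged sketch (data and variable names are made up): the estimator fits no fixed pdf; it estimates one categorical distribution per feature and per class from smoothed category counts, so an uninformative feature ends up with near-identical per-class distributions:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.RandomState(0)
n = 1000
y = rng.randint(2, size=n)

# Feature 0 is informative: class 0 uses categories {0, 1}, class 1 uses {2, 3}.
informative = np.where(y == 0, rng.randint(2, size=n), 2 + rng.randint(2, size=n))
# Feature 1 is noise, independent of y.
noise = rng.randint(4, size=n)
X = np.column_stack([informative, noise])

clf = CategoricalNB().fit(X, y)
# Estimated P(category | class) per feature; rows are classes.
for i, log_proba in enumerate(clf.feature_log_prob_):
    print(f"feature {i}:\n{np.exp(log_proba)}")
```

For feature 0 the two class-conditional distributions differ sharply, while for feature 1 they are close to uniform, so feature 1 contributes little to the decision.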
@@ -49,6 +51,12 @@ def _joint_log_likelihood(self, X):
        predict_proba and predict_log_proba.
        """

    @abstractmethod
    def _check_X(self, X):
bsipocz (Contributor) commented on Dec 31, 2019:
Having this abstract method added is breaking downstream code.
Which is OK, but with the change hidden in this huge PR it was a bit more difficult than necessary to dig up whether there were any relevant comments about the change, etc.
jnothman (Member) replied on Dec 31, 2019:
Yes, we are under-aware of the impact on downstream projects. See #15992.
I also think we neglect to consider people inheriting from our objects. Can you submit a PR to remove the abstractmethod for 0.22.1, perhaps?
Reference Issues/PRs
Implements Categorical NB from #10856.
What does this implement/fix? Explain your changes.
This implementation adds a naive Bayes classifier (CategoricalNB) for categorical features.
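A minimal usage sketch (illustrative data; categories are assumed to be encoded as non-negative integers, e.g. with OrdinalEncoder):

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.RandomState(0)
X = rng.randint(5, size=(100, 3))  # 3 categorical features, 5 categories each
y = rng.randint(2, size=100)       # binary target

clf = CategoricalNB(alpha=1.0)  # alpha is the additive smoothing parameter
clf.fit(X, y)
print(clf.predict(X[:5]))
```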
Any other comments?