ENH Add new smoothing methods to MultinomialNB #12943


Open
wants to merge 5 commits into base: main

Conversation

@psendyk psendyk (Contributor) commented Jan 9, 2019

Reference Issues/PRs

Resolves #12862

What does this implement/fix? Explain your changes.

Implements the Good-Turing smoothing algorithm for the Naive Bayes classifier and adds two other possible smoothing options.

Any other comments?

If the maintainers decide to add Jelinek-Mercer or Absolute Discounting, we need to use different default smoothing parameters depending on the selected algorithm.

Since Good-Turing uses raw counts, the code won't work if the input is transformed using, for example, tf-idf. Would throwing an error be a reasonable solution when the input is not raw counts?
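
A minimal sketch of such a check, assuming a hypothetical helper name and using only numpy/scipy (not an existing scikit-learn validator), could look like:

import numpy as np
from scipy import sparse

def _check_raw_counts(X):
    # Hypothetical helper: reject inputs that do not look like raw term
    # counts, e.g. tf-idf-transformed matrices with fractional values.
    data = X.data if sparse.issparse(X) else np.asarray(X)
    if np.any(data < 0) or not np.allclose(data, np.round(data)):
        raise ValueError("Good-Turing smoothing requires raw (non-negative "
                         "integer) counts.")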

At first, I implemented Simple Good-Turing operating on the entire matrix self.feature_count_, but the current solution is more readable. For more information on the notation, see Good-Turing Frequency Estimation Without Tears.
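
For readers unfamiliar with the notation, here is a rough, self-contained sketch of the Linear Good-Turing variant applied to a single class's count vector, following Gale and Sampson's paper; it illustrates the idea only and is not the exact code proposed in this PR:

import numpy as np

def simple_good_turing(fc):
    # fc: 1-D array of raw feature counts for one class.
    # Sketch only; assumes at least two distinct nonzero count values and
    # does not handle degenerate cases (no unseen features, no hapaxes).
    N = fc.sum()
    seen = fc[fc > 0]
    r, n = np.unique(seen, return_counts=True)   # frequencies r and their counts N_r
    # Z_r averages N_r over the gap between neighbouring observed frequencies
    q = np.concatenate(([0], r[:-1]))
    t = np.concatenate((r[1:], [2 * r[-1] - r[-2]]))
    Z = 2 * n / (t - q)
    # Fit log Z = a + b * log r (the "linear Good-Turing" smooth)
    b, a = np.polyfit(np.log(r), np.log(Z), 1)
    # Smoothed counts r* = (r + 1) * S(r + 1) / S(r), with S(r) = exp(a) * r**b
    r_star = (r + 1) * ((r + 1) / r) ** b
    # Probability mass reserved for unseen features: P_0 = N_1 / N
    P_0 = n[0] / N if r[0] == 1 else 0.0
    N_prime = (n * r_star).sum()
    prob = np.empty(fc.shape, dtype=float)
    unseen = fc == 0
    prob[unseen] = P_0 / max(unseen.sum(), 1)
    idx = np.searchsorted(r, fc[~unseen])        # map each seen count back to its r*
    prob[~unseen] = (1 - P_0) * r_star[idx] / N_prime
    return np.log(prob)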

@jnothman jnothman (Member) left a comment

So far this is mostly issues of idiom. But you need to add tests, firstly to show that changing the parameter has the desired (or at least any) effect, secondly to test your implementation against known values from the literature or toy data.

I'm not persuaded that in the context of naive bayes classification we should expect substantial benefit from a wide array of smoothing choices. Ideally this would be supported by literature or examples to show that less naive smoothing can help substantially in multinomial naive bayes classification.
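
As a starting point for the tests mentioned above, a first test might look roughly like the following; note that the smoothing parameter is the one proposed in this PR (it does not exist in released scikit-learn), and the exact string for the Good-Turing option is an assumption:

import numpy as np
from sklearn.naive_bayes import MultinomialNB

def test_smoothing_changes_model():
    # Sketch: changing the smoothing method should change the fitted model.
    rng = np.random.RandomState(0)
    X = rng.randint(5, size=(30, 10))
    y = rng.randint(2, size=30)
    clf_additive = MultinomialNB(smoothing='additive').fit(X, y)
    clf_gt = MultinomialNB(smoothing='good-turing').fit(X, y)
    assert not np.allclose(clf_additive.feature_log_prob_,
                           clf_gt.feature_log_prob_)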

@@ -698,10 +704,16 @@ class MultinomialNB(BaseDiscreteNB):
C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to
Information Retrieval. Cambridge University Press, pp. 234-265.
https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

Gale, William A., and Geoffrey Sampson. 1996. Good-Turing Frequency

Please roughly follow a consistent citation style, i.e. first name first.

(0 for no smoothing).
Smoothing parameter (0 for no smoothing).

smoothing : string, optional (default='additive')

This will state the acceptable values

@@ -638,8 +640,11 @@ class MultinomialNB(BaseDiscreteNB):
Parameters
----------
alpha : float, optional (default=1.0)
Additive (Laplace/Lidstone) smoothing parameter
(0 for no smoothing).
Smoothing parameter (0 for no smoothing).

The meaning of this parameter in different smoothing methods should be noted here or under smoothing

@@ -728,6 +745,92 @@ def _joint_log_likelihood(self, X):
return (safe_sparse_dot(X, self.feature_log_prob_.T) +
self.class_log_prior_)

def _additive_smoothing(self, alpha):

To avoid obscurity and follow the convention that functions correspond to verbs, let's use the naming convention _smooth_additive etc

"""Compute log probabilities using Simple Good-Turing smoothing"""
def sgt(fc):
# Get the frequencies of frequencies
n = dict()

This is not using numpy idiom. This should be using numpy data structures and numpy.sort where possible.

Use np.unique and np.bincount? Roughly:

freq_values, freq_idx = np.unique(fc, return_inverse=True)
freq_freqs = np.bincount(freq_idx, minlength=len(freq_values))
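
On a toy count vector the suggested snippet gives, for example:

import numpy as np

fc = np.array([0, 3, 1, 1, 3, 5])
freq_values, freq_idx = np.unique(fc, return_inverse=True)
freq_freqs = np.bincount(freq_idx, minlength=len(freq_values))
# freq_values -> [0, 1, 3, 5], freq_freqs -> [1, 2, 2, 1]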

# Compute the Z values
r = np.array(sorted(n.items(), key=lambda keyval: keyval[0]),
dtype='int')[:, 0]
n = np.array(sorted(n.items(), key=lambda keyval: keyval[0]),

Please avoid duplicating the sort
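
For instance (a sketch that keeps the dict-based n built above), the sort could be done once and both columns taken from it:

items = np.array(sorted(n.items()), dtype=int)   # sort the (frequency, count) pairs once
r, n = items[:, 0], items[:, 1]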

Z[r[0]] = 2*n[0] / r[1]
Z[r[-1]] = n[-1] / (r[-1] - r[-2])
for (idx, j) in enumerate(r):
if idx == 0 or idx >= len(r) - 1:

These conditions can be specified in the for line by slicing r and using the second parameter of enumerate
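
Concretely, something along these lines (sketch, using the r, n and Z from the code above):

# iterate only over the interior frequencies, keeping the original index
for idx, j in enumerate(r[1:-1], 1):
    Z[j] = 2 * n[idx] / (r[idx + 1] - r[idx - 1])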

Z = dict()
Z[r[0]] = 2*n[0] / r[1]
Z[r[-1]] = n[-1] / (r[-1] - r[-2])
for (idx, j) in enumerate(r):

I suspect this loop can be written as a vectorised numpy operation, e.g. Z[1:-1] = 2 * freq_freqs[1:-1] / (freq_values[2:] - freq_values[:-2]) or something??
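
With the endpoints included, a possible fully vectorised version (assuming the freq_values/freq_freqs arrays from the earlier suggestion, and Z stored as a dense array rather than a dict) would be:

Z = np.empty(len(freq_values), dtype=float)
Z[1:-1] = 2 * freq_freqs[1:-1] / (freq_values[2:] - freq_values[:-2])
Z[0] = 2 * freq_freqs[0] / freq_values[1]
Z[-1] = freq_freqs[-1] / (freq_values[-1] - freq_values[-2])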

p_r = (1-P_0)*(r_star/N_prime)

# Calculate probabilities for each feature
total_unseen = np.count_nonzero(fc == 0)

use sum rather than count_nonzero. count_nonzero is implemented as sum(x != 0)
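
Applied to the line under review, that would be, e.g.:

total_unseen = (fc == 0).sum()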


# Calculate probabilities for each feature
total_unseen = np.count_nonzero(fc == 0)
unseen_prob = P_0/total_unseen

please use spaces around binary operators unless deleting spaces helps make the order of operations clearer

@oanise93

I'm not persuaded that in the context of naive bayes classification we should expect substantial benefit from a wide array of smoothing choices. Ideally this would be supported by literature or examples to show that less naive smoothing can help substantially in multinomial naive bayes classification

How would you define "substantial" @jnothman? Looking at Improving Naive Bayes Text Classifier Using Smoothing Methods, it seems like alternate smoothing methods can perform a good bit better depending on the feature size and the amount of training data.

@jnothman jnothman (Member) commented Feb 16, 2019 via email

@oanise93

Sorry, I misunderstood your comment. It seems sensible to provide justification for each individual smoothing method.

@amueller amueller (Member) commented Aug 6, 2019

@oanise93 are you still working on this?

@oanise93 oanise93 commented Aug 6, 2019

Sorry for the confusion, @amueller; I was merely commenting on the thread. I'm not working on this PR.

@HuStmpHrrr HuStmpHrrr commented Sep 22, 2019

I am a complete noob in this area. I suppose Good-Turing is a smoothing method independent of the learning algorithm, so should it be separated out so that other learning methods (like logistic regression) can use it?

Base automatically changed from master to main January 22, 2021 10:50
Development

Successfully merging this pull request may close these issues.

Can Naive bayes in sklearn support good-turing smoothing?
6 participants