Commit d97047f

rprkh authored and jeremiedbb committed

DOC improve stop_words description w.r.t. max_df range in CountVectorizer (#25489)

1 parent 2eebc5f

File tree

1 file changed: +6 −6 lines changed

sklearn/feature_extraction/text.py (+6 −6)
@@ -996,9 +996,9 @@ class CountVectorizer(_VectorizerMixin, BaseEstimator):
         will be removed from the resulting tokens.
         Only applies if ``analyzer == 'word'``.
 
-        If None, no stop words will be used. max_df can be set to a value
-        in the range [0.7, 1.0) to automatically detect and filter stop
-        words based on intra corpus document frequency of terms.
+        If None, no stop words will be used. In this case, setting `max_df`
+        to a higher value, such as in the range (0.7, 1.0), can automatically detect
+        and filter stop words based on intra corpus document frequency of terms.
 
     token_pattern : str or None, default=r"(?u)\\b\\w\\w+\\b"
         Regular expression denoting what constitutes a "token", only used
@@ -1833,9 +1833,9 @@ class TfidfVectorizer(CountVectorizer):
         will be removed from the resulting tokens.
         Only applies if ``analyzer == 'word'``.
 
-        If None, no stop words will be used. max_df can be set to a value
-        in the range [0.7, 1.0) to automatically detect and filter stop
-        words based on intra corpus document frequency of terms.
+        If None, no stop words will be used. In this case, setting `max_df`
+        to a higher value, such as in the range (0.7, 1.0), can automatically detect
+        and filter stop words based on intra corpus document frequency of terms.
 
     token_pattern : str, default=r"(?u)\\b\\w\\w+\\b"
         Regular expression denoting what constitutes a "token", only used
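The behavior documented above can be seen in a small sketch. The corpus below is an illustrative assumption (not part of the commit); it shows `stop_words=None` combined with `max_df=0.7` filtering a term ("the") that appears in every document, exactly the corpus-specific stop-word detection the revised docstring describes.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (an assumption for illustration): "the" occurs in all 3 docs,
# so its document frequency is 1.0, above the max_df=0.7 cutoff.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird flew over the house",
]

# stop_words=None: no built-in stop-word list. max_df=0.7 drops terms whose
# document frequency is strictly greater than 70% of the corpus.
vect = CountVectorizer(stop_words=None, max_df=0.7)
X = vect.fit_transform(docs)

print(sorted(vect.vocabulary_))  # retained terms; "the" is absent
print(sorted(vect.stop_words_))  # terms removed by the max_df cutoff
```

Note that "sat" and "on" (document frequency 2/3 ≈ 0.67) survive the cutoff, while the filtered terms are recorded in the fitted `stop_words_` attribute.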
