@@ -996,9 +996,9 @@ class CountVectorizer(_VectorizerMixin, BaseEstimator):
will be removed from the resulting tokens.
Only applies if ``analyzer == 'word'``.
- If None, no stop words will be used. max_df can be set to a value
- in the range [0.7, 1.0) to automatically detect and filter stop
- words based on intra corpus document frequency of terms.
+ If None, no stop words will be used. In this case, setting `max_df`
+ to a higher value, such as in the range (0.7, 1.0), can automatically detect
+ and filter stop words based on intra-corpus document frequency of terms.
token_pattern : str or None, default=r"(?u)\\b\\w\\w+\\b"
Regular expression denoting what constitutes a "token", only used
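
A minimal sketch of the behavior the new wording describes, using a toy corpus invented for illustration; the `max_df=0.8` value is an arbitrary choice within the suggested range, not part of the patch:

from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (invented for illustration): "the" occurs in every document.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird flew over the house",
]

# With stop_words=None, no predefined stop word list is applied; max_df=0.8
# instead drops any term whose document frequency exceeds 80% of documents.
vectorizer = CountVectorizer(stop_words=None, max_df=0.8)
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # vocabulary without "the"
print(vectorizer.stop_words_)              # {'the'}, filtered via max_df

Here "the" has a document frequency of 1.0 (above the 0.8 cutoff) and is dropped, while "cat" (2 of 3 documents, about 0.67) is kept.
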
@@ -1833,9 +1833,9 @@ class TfidfVectorizer(CountVectorizer):
will be removed from the resulting tokens.
Only applies if ``analyzer == 'word'``.
- If None, no stop words will be used. max_df can be set to a value
- in the range [0.7, 1.0) to automatically detect and filter stop
- words based on intra corpus document frequency of terms.
+ If None, no stop words will be used. In this case, setting `max_df`
+ to a higher value, such as in the range (0.7, 1.0), can automatically detect
+ and filter stop words based on intra-corpus document frequency of terms.
token_pattern : str, default=r"(?u)\\b\\w\\w+\\b"
Regular expression denoting what constitutes a "token", only used
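
The same behavior applies to TfidfVectorizer, which inherits its max_df handling from CountVectorizer (as the hunk header above shows). A hedged sketch on the same invented corpus:

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird flew over the house",
]

# max_df filtering happens while the vocabulary is built, before tf-idf
# weighting, so "the" never enters the vocabulary at all.
vectorizer = TfidfVectorizer(stop_words=None, max_df=0.8)
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())  # "the" filtered by document frequency
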