Remove sklearn logger default StreamHandler #16451
Conversation
To avoid duplicate log messages
|
It seems I had a review comment lying around for a long time... Maybe we can get this into the upcoming release. |
| @@ -20,7 +20,6 @@ | ||
| from ._config import get_config, set_config, config_context | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
| logger.addHandler(logging.StreamHandler()) | ||
| logger.setLevel(logging.INFO) |
jnothman
Apr 27, 2020
Member
Surely we shouldn't be doing this either?
Surely we shouldn't be doing this either?
adrinjalali
Apr 27, 2020
Member
agreed.
agreed.
jnothman
Apr 30, 2020
Member
I'm confused what effect this has on the user now. Does setting the level on a logger that has no handler do anything useful?
I'm confused what effect this has on the user now. Does setting the level on a logger that has no handler do anything useful?
amueller
Apr 30, 2020
Member
seems like consensus :)
seems like consensus :)
thomasjpfan
May 4, 2020
Member
Ah yes, we should remove the setLevel as well, updating PR.
Ah yes, we should remove the setLevel as well, updating PR.
|
Will removing the handler silence the logging output of the |
|
It should not on python >= 3.2, due to the last resort logger, but we can
check.
|
|
Since the Currently, they are all at |
|
Hmm so maybe we need docs to supplement this change.
|
|
The "quick" way to work through this is to add a It would break backwards compatibility with users that are actively using the logging API and capturing the logs from the |
|
we really should move the other way around and do more logger rather than going the |
|
Maybe leaving the INFO level default as in this PR is the best thing to do, to keep the info messages that you want users to see by default. I think removing the handler should be OK, i.e. messages should appear by default on the console without it. |
|
Honestly, I am okay with removing the handler and not having anything print out. We do not have these messages print out when downloading using |
|
Yes, let's hide them by default and at most make a what's new note that
users should set the logging level if they want these messages to continue
|
|
Updated PR with whats new entry stating that a handler should be added to continue to receive messages from the |
| - |API| The `StreamHandler` was removed from `sklearn.logger`. To continue to | ||
| receive logging messages from :mod:`sklearn.datasets`, a handler should | ||
| be added to `sklearn.logger`. :pr:`16451` by :user:`Christoph Deil <cdeil>`. |
cdeil
Apr 29, 2020
Author
Contributor
I think the statement as-is in the changelog isn't 100% correct. In many cases the last resort handler is in place and will log messages by default to the console. If the user wants to change logging, they can add handlers to the sklear.logger, but e.g. also the root logger.
Maybe not try to explain how logging handlers work via the what's new entry, and to just explain why the handler was removed?
Suggested change
- |API| The `StreamHandler` was removed from `sklearn.logger`. To continue to
receive logging messages from :mod:`sklearn.datasets`, a handler should
be added to `sklearn.logger`. :pr:`16451` by :user:`Christoph Deil <cdeil>`.
- |API| The `StreamHandler` was removed from `sklearn.logger` to avoid
double logging of messages in common cases where a hander is attached
to the root logger, and to follow the Python logging documentation
recommendation for libraries to leave the log message handling to
users and application code. :pr:`16451` by :user:`Christoph Deil <cdeil>`.
I think the statement as-is in the changelog isn't 100% correct. In many cases the last resort handler is in place and will log messages by default to the console. If the user wants to change logging, they can add handlers to the sklear.logger, but e.g. also the root logger.
Maybe not try to explain how logging handlers work via the what's new entry, and to just explain why the handler was removed?
| - |API| The `StreamHandler` was removed from `sklearn.logger`. To continue to | |
| receive logging messages from :mod:`sklearn.datasets`, a handler should | |
| be added to `sklearn.logger`. :pr:`16451` by :user:`Christoph Deil <cdeil>`. | |
| - |API| The `StreamHandler` was removed from `sklearn.logger` to avoid | |
| double logging of messages in common cases where a hander is attached | |
| to the root logger, and to follow the Python logging documentation | |
| recommendation for libraries to leave the log message handling to | |
| users and application code. :pr:`16451` by :user:`Christoph Deil <cdeil>`. |
thomasjpfan
Apr 29, 2020
Member
I think the statement as-is in the changelog isn't 100% correct. In many cases the last resort handler is in place and will log messages by default to the console.
The last resort handler will log warnings to stderr. The datasets module logs with info and debug levels, and nothing is shown on the console.
Although, I am okay with updating the whats new to your suggestion.
I think the statement as-is in the changelog isn't 100% correct. In many cases the last resort handler is in place and will log messages by default to the console.
The last resort handler will log warnings to stderr. The datasets module logs with info and debug levels, and nothing is shown on the console.
Although, I am okay with updating the whats new to your suggestion.
thomasjpfan
Apr 29, 2020
Member
For example:
from sklearn.datasets import fetch_kddcup99
fetch_kddcup99(data_home="data")
will print Downloading https://ndownloader.figshare.com/files/5976042 on master, and nothing with this PR.
For example:
from sklearn.datasets import fetch_kddcup99
fetch_kddcup99(data_home="data")will print Downloading https://ndownloader.figshare.com/files/5976042 on master, and nothing with this PR.
cdeil
Apr 29, 2020
Author
Contributor
I see. So here the sklearn.datasets._kddcup99 logger is created, it has the default level of warning and thus the info message isn't shown.
You could add a logger.setLevel(logging.INFO) there if you want it to appear by default for users.
My understanding is that messing with handlers in libraries is bad (causes confusion and double log messages for users), but messing with log levels is OK in libraries, to get the default behaviour you want.
Another option could be to use the sklearn.logger there to emit the message, but then I think you have to be very careful to avoid circular imports, probably is tricky.
I see. So here the sklearn.datasets._kddcup99 logger is created, it has the default level of warning and thus the info message isn't shown.
You could add a logger.setLevel(logging.INFO) there if you want it to appear by default for users.
My understanding is that messing with handlers in libraries is bad (causes confusion and double log messages for users), but messing with log levels is OK in libraries, to get the default behaviour you want.
Another option could be to use the sklearn.logger there to emit the message, but then I think you have to be very careful to avoid circular imports, probably is tricky.
thomasjpfan
Apr 29, 2020
Member
The lastresort handler will still filter everything out.
The only want to get the logging messages to appear is to change the logger to logger.warning.
Anyways, I am okay with removing the messages so we can be a better citizen of using logging.
The lastresort handler will still filter everything out.
The only want to get the logging messages to appear is to change the logger to logger.warning.
Anyways, I am okay with removing the messages so we can be a better citizen of using logging.
amueller
Apr 30, 2020
Member
Hm that seems fine as info and debug seem to be the correct logging levels. We could set the level to 'warning' if we think that downloading the data is unexpect, but 'info' seems right and so it shouldn't be printed by default.
Hm that seems fine as info and debug seem to be the correct logging levels. We could set the level to 'warning' if we think that downloading the data is unexpect, but 'info' seems right and so it shouldn't be printed by default.
a0c76ce
into
scikit-learn:master
* Remove sklearn logger default StreamHandler To avoid duplicate log messages * DOC Adds whats_new * CLN Address comments * MNT Remove setLevel Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
* Remove sklearn logger default StreamHandler To avoid duplicate log messages * DOC Adds whats_new * CLN Address comments * MNT Remove setLevel Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
* Remove sklearn logger default StreamHandler To avoid duplicate log messages * DOC Adds whats_new * CLN Address comments * MNT Remove setLevel Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
* Remove sklearn logger default StreamHandler To avoid duplicate log messages * DOC Adds whats_new * CLN Address comments * MNT Remove setLevel Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>


This PR removes the StreamHander which was attached to the sklearn logger on import.
Relevant: #15122 and #15122 (comment) which suggests that this was added for Python 2.
Please note that since Python 3.2, a logging.lastResort default handler was added to the logging standard library, which means that by default your messages will show up for users, even if you don't add this streamhandler.
Note that the standard advice for Python libraries is to not attach handlers:
https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library
Why is it bad to have the handler?
Because it causes log messages to appear twice, if people add a root logger, which is added by calling e.g.
logging.info, like this:Note that the problem of duplicate log messages is an old one, here's a blog post on the issue, and nice tool to figure how how logging works, and which libraries do what:
https://rhodesmill.org/brandon/2012/logging_tree/
Thoughts?