The Wayback Machine - https://web.archive.org/web/20210825220458/https://github.com/scikit-learn/scikit-learn/pull/16451
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove sklearn logger default StreamHandler #16451

Merged
merged 5 commits into from May 4, 2020
Merged

Conversation

@cdeil
Copy link
Contributor

@cdeil cdeil commented Feb 16, 2020

This PR removes the StreamHander which was attached to the sklearn logger on import.

Relevant: #15122 and #15122 (comment) which suggests that this was added for Python 2.
Please note that since Python 3.2, a logging.lastResort default handler was added to the logging standard library, which means that by default your messages will show up for users, even if you don't add this streamhandler.

Note that the standard advice for Python libraries is to not attach handlers:
https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library

Why is it bad to have the handler?

Because it causes log messages to appear twice, if people add a root logger, which is added by calling e.g. logging.info, like this:

>>> import sklearn
>>> import logging
>>> logging.info("123")
>>> sklearn.logger.info("123")
123
INFO:sklearn:123

Note that the problem of duplicate log messages is an old one, here's a blog post on the issue, and nice tool to figure how how logging works, and which libraries do what:
https://rhodesmill.org/brandon/2012/logging_tree/

Thoughts?

To avoid duplicate log messages
Copy link
Member

@jnothman jnothman left a comment

It seems I had a review comment lying around for a long time... Maybe we can get this into the upcoming release.

@@ -20,7 +20,6 @@
from ._config import get_config, set_config, config_context

logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)

This comment has been minimized.

@jnothman

jnothman Apr 27, 2020
Member

Surely we shouldn't be doing this either?

This comment has been minimized.

@adrinjalali

adrinjalali Apr 27, 2020
Member

agreed.

This comment has been minimized.

@jnothman

jnothman Apr 30, 2020
Member

I'm confused what effect this has on the user now. Does setting the level on a logger that has no handler do anything useful?

This comment has been minimized.

@amueller

amueller Apr 30, 2020
Member

seems like consensus :)

This comment has been minimized.

@adrinjalali

adrinjalali Apr 30, 2020
Member

Should we then remove this one @thomasjpfan ?

This comment has been minimized.

@thomasjpfan

thomasjpfan May 4, 2020
Member

Ah yes, we should remove the setLevel as well, updating PR.

@jnothman jnothman added this to the 0.23 milestone Apr 27, 2020
@jnothman jnothman removed this from the 0.23 milestone Apr 27, 2020
@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Apr 27, 2020

Will removing the handler silence the logging output of the dataset module?

@jnothman
Copy link
Member

@jnothman jnothman commented Apr 27, 2020

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Apr 27, 2020

Since the logging.lastResort logging is set to output to sys.stderr with level WARNING, the logger used in datasets module will not output anything unless they are at the WARNING level.

Currently, they are all at debug or info.

@jnothman
Copy link
Member

@jnothman jnothman commented Apr 27, 2020

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Apr 28, 2020

The "quick" way to work through this is to add a verbose option to the fetch_* datasets and change the logging to use print.

It would break backwards compatibility with users that are actively using the logging API and capturing the logs from the fetch_* calls. (I think this is kind of rare.)

@adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Apr 28, 2020

we really should move the other way around and do more logger rather than going the print way I'd say.

@cdeil
Copy link
Contributor Author

@cdeil cdeil commented Apr 28, 2020

Maybe leaving the INFO level default as in this PR is the best thing to do, to keep the info messages that you want users to see by default. I think removing the handler should be OK, i.e. messages should appear by default on the console without it.

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Apr 28, 2020

Honestly, I am okay with removing the handler and not having anything print out. We do not have these messages print out when downloading using fetch_openml and I do not think we have any complains.

@jnothman
Copy link
Member

@jnothman jnothman commented Apr 29, 2020

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Apr 29, 2020

Updated PR with whats new entry stating that a handler should be added to continue to receive messages from the datasets module.

- |API| The `StreamHandler` was removed from `sklearn.logger`. To continue to
receive logging messages from :mod:`sklearn.datasets`, a handler should
be added to `sklearn.logger`. :pr:`16451` by :user:`Christoph Deil <cdeil>`.
Comment on lines 190 to 192

This comment has been minimized.

@cdeil

cdeil Apr 29, 2020
Author Contributor

I think the statement as-is in the changelog isn't 100% correct. In many cases the last resort handler is in place and will log messages by default to the console. If the user wants to change logging, they can add handlers to the sklear.logger, but e.g. also the root logger.

Maybe not try to explain how logging handlers work via the what's new entry, and to just explain why the handler was removed?

Suggested change
- |API| The `StreamHandler` was removed from `sklearn.logger`. To continue to
receive logging messages from :mod:`sklearn.datasets`, a handler should
be added to `sklearn.logger`. :pr:`16451` by :user:`Christoph Deil <cdeil>`.
- |API| The `StreamHandler` was removed from `sklearn.logger` to avoid
double logging of messages in common cases where a hander is attached
to the root logger, and to follow the Python logging documentation
recommendation for libraries to leave the log message handling to
users and application code. :pr:`16451` by :user:`Christoph Deil <cdeil>`.

This comment has been minimized.

@thomasjpfan

thomasjpfan Apr 29, 2020
Member

I think the statement as-is in the changelog isn't 100% correct. In many cases the last resort handler is in place and will log messages by default to the console.

The last resort handler will log warnings to stderr. The datasets module logs with info and debug levels, and nothing is shown on the console.

Although, I am okay with updating the whats new to your suggestion.

This comment has been minimized.

@thomasjpfan

thomasjpfan Apr 29, 2020
Member

For example:

from sklearn.datasets import fetch_kddcup99
fetch_kddcup99(data_home="data")

will print Downloading https://ndownloader.figshare.com/files/5976042 on master, and nothing with this PR.

This comment has been minimized.

@cdeil

cdeil Apr 29, 2020
Author Contributor

I see. So here the sklearn.datasets._kddcup99 logger is created, it has the default level of warning and thus the info message isn't shown.

You could add a logger.setLevel(logging.INFO) there if you want it to appear by default for users.

My understanding is that messing with handlers in libraries is bad (causes confusion and double log messages for users), but messing with log levels is OK in libraries, to get the default behaviour you want.

Another option could be to use the sklearn.logger there to emit the message, but then I think you have to be very careful to avoid circular imports, probably is tricky.

This comment has been minimized.

@thomasjpfan

thomasjpfan Apr 29, 2020
Member

The lastresort handler will still filter everything out.

The only want to get the logging messages to appear is to change the logger to logger.warning.

Anyways, I am okay with removing the messages so we can be a better citizen of using logging.

This comment has been minimized.

@amueller

amueller Apr 30, 2020
Member

Hm that seems fine as info and debug seem to be the correct logging levels. We could set the level to 'warning' if we think that downloading the data is unexpect, but 'info' seems right and so it shouldn't be printed by default.

@adrinjalali adrinjalali merged commit a0c76ce into scikit-learn:master May 4, 2020
18 checks passed
18 checks passed
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
@circleci-artifacts-redirector
ci/circleci: doc artifact Link to 0/doc/_changed.html
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
@codecov
codecov/patch Coverage not affected when comparing 9f04837...ff2ee13
Details
@codecov
codecov/project 98.04% (+0.00%) compared to 9f04837
Details
@azure-pipelines
scikit-learn.scikit-learn Build #20200504.20 succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Linting) Linting succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Linux py36_conda_openblas) Linux py36_conda_openblas succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Linux py36_ubuntu_atlas) Linux py36_ubuntu_atlas succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Linux32 py36_ubuntu_atlas_32bit) Linux32 py36_ubuntu_atlas_32bit succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Linux_Runs pylatest_conda_mkl) Linux_Runs pylatest_conda_mkl succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Windows py36_pip_openblas_32bit) Windows py36_pip_openblas_32bit succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Details
@azure-pipelines
scikit-learn.scikit-learn (macOS pylatest_conda_mkl_no_openmp) macOS pylatest_conda_mkl_no_openmp succeeded
Details
adrinjalali added a commit to adrinjalali/scikit-learn that referenced this pull request May 4, 2020
* Remove sklearn logger default StreamHandler

To avoid duplicate log messages

* DOC Adds whats_new

* CLN Address comments

* MNT Remove setLevel

Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
adrinjalali added a commit that referenced this pull request May 5, 2020
* Remove sklearn logger default StreamHandler

To avoid duplicate log messages

* DOC Adds whats_new

* CLN Address comments

* MNT Remove setLevel

Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
gio8tisu added a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020
* Remove sklearn logger default StreamHandler

To avoid duplicate log messages

* DOC Adds whats_new

* CLN Address comments

* MNT Remove setLevel

Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
viclafargue added a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
* Remove sklearn logger default StreamHandler

To avoid duplicate log messages

* DOC Adds whats_new

* CLN Address comments

* MNT Remove setLevel

Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.