Use OS-specific cache directories instead of home directory #31295

Open: wants to merge 11 commits into main

@norgera (Contributor) commented May 2, 2025

Resolves #31267

The get_data_home function now uses standard OS cache directories:

  • Linux/Unix: $XDG_CACHE_HOME/scikit-learn (~/.cache/scikit-learn)
  • macOS: ~/Library/Caches/scikit-learn
  • Windows: %LOCALAPPDATA%/scikit-learn (~/AppData/Local/scikit-learn)
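A minimal sketch of how the per-OS defaults listed above could be resolved with the standard library only. The helper `default_cache_dir` and its `platform`, `environ` and `home` parameters are hypothetical (they are injected so the logic can be checked on any machine) and this is not the PR's code. The fallback to `~/.cache` when `$XDG_CACHE_HOME` is unset, empty or relative follows the XDG Base Directory specification.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(platform=None, environ=None, home=None):
    """Return an OS-conventional cache directory for scikit-learn data.

    Hypothetical helper following the per-OS layout listed above;
    it is not the implementation from this PR.
    """
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    if platform == "win32":
        # %LOCALAPPDATA% normally points at ~/AppData/Local; fall back to that.
        base = Path(environ.get("LOCALAPPDATA") or home / "AppData" / "Local")
    elif platform == "darwin":
        base = home / "Library" / "Caches"
    else:
        # XDG spec: ignore XDG_CACHE_HOME when it is unset, empty or relative.
        xdg = environ.get("XDG_CACHE_HOME", "")
        base = Path(xdg) if xdg and Path(xdg).is_absolute() else home / ".cache"
    return base / "scikit-learn"
```

Calling `default_cache_dir()` with no arguments uses the running interpreter's platform and environment.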

Previously, data was stored in ~/scikit_learn_data by default.
This change follows OS conventions for cache storage and improves
maintainability.

Implemented the deprecation protocol and added tests in test_base.py.


github-actions bot commented May 2, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 31720d0. Link to the linter CI: here

@lucyleeow (Member)

I think this warrants a new changelog entry; see https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md for instructions.

There are quite a few other places that need to be updated; I would suggest searching for "scikit_learn_data" in the codebase.
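A minimal stdlib-only sketch of that search. The helper name `find_mentions` and the set of file suffixes it scans are illustrative assumptions, not part of the scikit-learn repository; running `git grep scikit_learn_data` from the repository root does the same job from the command line.

```python
from pathlib import Path


def find_mentions(root, needle="scikit_learn_data",
                  suffixes=(".py", ".pyx", ".rst", ".md", ".txt", ".cfg", ".toml")):
    """Yield (path, line_number, stripped_line) for every line containing `needle`.

    Hypothetical helper: walks `root` recursively and only reads files whose
    suffix is in `suffixes`, decoding them as UTF-8 and skipping bad bytes.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, lineno, line.strip()
```

From the repository root, `for path, lineno, line in find_mentions("."): print(f"{path}:{lineno}: {line}")` prints one hit per line in the familiar `file:line: text` layout.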

@norgera (Contributor Author) commented May 3, 2025

I believe I have covered all the files now.

@jeremiedbb (Member)

Thanks for the PR @norgera. This is a breaking change so I think it needs a deprecation cycle.

@glemaitre (Member)

Indeed. Here is a link describing how we handle deprecations: https://scikit-learn.org/dev/developers/contributing.html#deprecation

@norgera (Contributor Author) commented May 5, 2025

Thank you all for the information.

I've added a deprecation path: if the original home directory (~/scikit_learn_data) is detected or set manually, it is still used, and a warning is raised with instructions for moving the files to the new cache folder. Otherwise, the new cache folder is used.

Let me know if this approach is appropriate.
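The fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `resolve_data_home` and its `environ`, `legacy_dir` and `new_dir` parameters are hypothetical, `~/.cache/scikit-learn` is only a stand-in for the OS-specific default, and the sketch assumes an explicit `data_home` or the existing `SCIKIT_LEARN_DATA` variable is honored silently, while an implicitly detected `~/scikit_learn_data` triggers a `FutureWarning` (the warning category scikit-learn uses for deprecations).

```python
import os
import warnings
from pathlib import Path


def resolve_data_home(data_home=None, *, environ=None, legacy_dir=None, new_dir=None):
    """Pick the dataset directory during a (hypothetical) deprecation period."""
    environ = os.environ if environ is None else environ

    # 1. An explicit argument or SCIKIT_LEARN_DATA is always used as given.
    if data_home is None:
        data_home = environ.get("SCIKIT_LEARN_DATA")
    if data_home is not None:
        return Path(data_home).expanduser()

    # Placeholder defaults; a real version would resolve the OS cache dir.
    if legacy_dir is None:
        legacy_dir = Path.home() / "scikit_learn_data"
    if new_dir is None:
        new_dir = Path.home() / ".cache" / "scikit-learn"
    legacy_dir, new_dir = Path(legacy_dir), Path(new_dir)

    # 2. The old default still exists: keep using it, but explain how to move.
    if legacy_dir.is_dir():
        warnings.warn(
            f"Found datasets in the legacy directory {legacy_dir}. A future "
            f"release will default to {new_dir}; move the files there, or set "
            "SCIKIT_LEARN_DATA to keep using the current location.",
            FutureWarning,
            stacklevel=2,
        )
        return legacy_dir

    # 3. No legacy data on disk: use the new location directly.
    return new_dir
```

Unlike `sklearn.datasets.get_data_home`, this sketch does not create the directory it returns; that step is left out to keep the decision logic easy to test.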

@lucyleeow (Member) left a comment

Just a flyby note: the current release is 1.7, so this should be 'deprecated in 1.7, removed in 1.9'.

@betatim (Member) commented May 12, 2025

Thanks a lot for tackling this and also adding all the machinery for deprecation. Do we really need a deprecation (and a new parameter)? To me, the location of a cache directory feels like an internal detail of the data downloader. If the user had specified a path explicitly and we changed it, then I'd agree we need a deprecation. But for a cache directory that is set implicitly, I am less sure. Then again, people might be using this to download datasets and then access the files they downloaded. In that case, a better way to think of the path is as a default download location, not a cache directory (and then we should do a deprecation).
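The explicit-versus-implicit distinction maps onto the resolution order that `sklearn.datasets.get_data_home` documents before this PR: an explicit `data_home` argument first, then the `SCIKIT_LEARN_DATA` environment variable, and only then the implicit `~/scikit_learn_data` default. The sketch below exercises the first two cases, assuming scikit-learn is installed; per the PR description, only the implicit default is what would move.

```python
import os
import tempfile

from sklearn.datasets import get_data_home

with tempfile.TemporaryDirectory() as tmp:
    # 1. A folder passed explicitly is returned as given (and created if missing).
    explicit_dir = os.path.join(tmp, "my_datasets")
    explicit_result = get_data_home(data_home=explicit_dir)
    created = os.path.isdir(explicit_dir)

    # 2. Without an argument, SCIKIT_LEARN_DATA takes precedence over the default.
    env_dir = os.path.join(tmp, "from_env")
    previous = os.environ.get("SCIKIT_LEARN_DATA")
    os.environ["SCIKIT_LEARN_DATA"] = env_dir
    try:
        env_result = get_data_home()
    finally:
        # Restore the caller's environment exactly as it was.
        if previous is None:
            del os.environ["SCIKIT_LEARN_DATA"]
        else:
            os.environ["SCIKIT_LEARN_DATA"] = previous

print(explicit_result)
print(env_result)
```

The `fetch_*` loaders in `sklearn.datasets` accept the same `data_home` argument, so users who treat the folder as a download location rather than a cache can already pin it explicitly.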

But if we think of this more as a data downloading tool where people can (and should) access the files, should we be using a directory that is explicitly called cache (which, to me, implies that it can be deleted at random or its structure changed without notice)? For example, if my OS asks me to "clean up caches and temporary files?", I think it should always be safe to click yes. But that would no longer hold for the use case where someone uses the downloading machinery to place a file in a specific directory, would it?

Or am I making this too complicated? What do you all think about this?

Successfully merging this pull request may close these issues:

  • Change the default data directory

5 participants