
Use OS-specific cache directories for get_data_home and add tests #31438


Open: Namit24 wants to merge 5 commits into main.


Namit24 commented May 27, 2025

Fixes #31267
This PR updates the get_data_home function to use platform-specific cache directories for storing scikit-learn datasets:

- On Linux, it uses $XDG_CACHE_HOME/scikit_learn_data if XDG_CACHE_HOME is set, and otherwise falls back to ~/.cache/scikit_learn_data.
- On macOS, it uses ~/Library/Caches/scikit_learn_data.
- On Windows, it uses %LOCALAPPDATA%\scikit_learn_data.

It also:

- Keeps the SCIKIT_LEARN_DATA environment variable working as an override of the default location.
- Adds new tests that verify the platform-specific behavior.
- Fixes linting and import-ordering issues in the related files.
This follows each operating system's conventions for cache storage and avoids cluttering the user's home directory with a top-level scikit_learn_data folder.
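The resolution order described above can be sketched as a small pure function. This is a minimal illustration, not the code in this PR: the name `resolve_data_home` and its injectable `environ`, `platform` and `home` parameters are hypothetical, added so each OS branch can be tested on any machine. It assumes the existing precedence of scikit-learn's `get_data_home` (an explicit `data_home` argument first, then `SCIKIT_LEARN_DATA`), and the Windows fallback used when `LOCALAPPDATA` is unset is also an assumption.

```python
import os
import sys
from pathlib import Path


def resolve_data_home(data_home=None, environ=None, platform=None, home=None):
    """Return the dataset cache directory (hypothetical sketch; creates nothing).

    `environ`, `platform` and `home` are hypothetical injectable parameters
    so the per-OS branches can be exercised in tests on any machine.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    # 1. An explicit argument wins, then the SCIKIT_LEARN_DATA override.
    explicit = data_home
    if explicit is None:
        explicit = environ.get("SCIKIT_LEARN_DATA")
    if explicit:
        return Path(explicit).expanduser()

    # 2. Otherwise pick the platform's conventional cache root.
    if platform.startswith("win"):
        # Assumption: use the usual AppData\Local location if LOCALAPPDATA is unset.
        base = environ.get("LOCALAPPDATA") or home / "AppData" / "Local"
    elif platform == "darwin":
        base = home / "Library" / "Caches"
    else:
        # Linux and other POSIX systems: XDG Base Directory convention,
        # treating an unset or empty XDG_CACHE_HOME as ~/.cache.
        base = environ.get("XDG_CACHE_HOME") or home / ".cache"

    return Path(base) / "scikit_learn_data"
```

Keeping the resolution free of side effects makes it easy to unit-test; the existing get_data_home additionally creates the directory before returning it.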

Ready for review. Happy to address feedback or make adjustments as needed.


github-actions bot commented May 27, 2025

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here.


ruff check

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


benchmarks/bench_tsne_mnist.py:10:1: I001 [*] Import block is un-sorted or un-formatted
   |
 8 |   # SPDX-License-Identifier: BSD-3-Clause
 9 |
10 | / import argparse
11 | | import json
12 | | import os
13 | | import os.path as op
14 | | from time import time
15 | |
16 | | import numpy as np
17 | | from joblib import Memory
18 | | from sklearn.utils._openmp_helpers import _openmp_effective_n_threads
19 | |
20 | | from sklearn.datasets import fetch_openml
21 | | from sklearn.decomposition import PCA
22 | | from sklearn.manifold import TSNE
23 | | from sklearn.neighbors import NearestNeighbors
24 | | from sklearn.utils import check_array
25 | | from sklearn.utils import shuffle as _shuffle
   | |_____________________________________________^ I001
26 |
27 |   LOG_DIR = "mnist_tsne_output"
   |
   = help: Organize imports

Found 1 error.
[*] 1 fixable with the `--fix` option.

Generated for commit: f28dfd0. Link to the linter CI: here

@Namit24 Namit24 closed this May 27, 2025
@Namit24 Namit24 reopened this May 27, 2025
Successfully merging this pull request may close these issues: Change the default data directory

1 participant