Cache dataset features and qualities as pickle #979

Merged
PGijsbers merged 6 commits into openml/openml-python:develop from openml/openml-python:add_908 on Nov 3, 2020
@mfeurer mfeurer commented Oct 28, 2020

Reference Issue

#908

What does this PR implement/fix? Explain your changes.

This PR changes how dataset qualities and dataset features are loaded: both are now loaded inside the OpenMLDataset class. In addition, the first time they are loaded, the parsed XML is saved as a pickle, and that pickle is loaded on subsequent invocations.
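
The caching pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function names `parse_features_xml` and `load_features_cached`, the simplified XML layout without namespaces, and the `.pkl` file naming are all assumptions made for the example.

```python
import pickle
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_features_xml(xml_path):
    """Parse a features XML file into a list of dicts.

    Hypothetical helper: assumes a simplified layout of
    <features><feature><name>..</name>..</feature></features>.
    """
    root = ET.parse(xml_path).getroot()
    return [
        {child.tag: child.text for child in feature}
        for feature in root.iter("feature")
    ]


def load_features_cached(xml_path):
    """Return parsed features, using a pickle next to the XML as a cache."""
    xml_path = Path(xml_path)
    # Assumed naming scheme: features.xml -> features.xml.pkl
    pickle_path = xml_path.with_suffix(xml_path.suffix + ".pkl")

    # Fast path: a previous call already stored the parsed result.
    if pickle_path.exists():
        with pickle_path.open("rb") as fh:
            return pickle.load(fh)

    # Slow path: parse the XML once, then store the result for next time.
    features = parse_features_xml(xml_path)
    with pickle_path.open("wb") as fh:
        pickle.dump(features, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return features
```

Note that this sketch never invalidates the cache, so a changed XML file would be ignored until the pickle is deleted. Pickle files should also only be loaded from a trusted cache directory.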

How should this PR be tested?

New unit tests in tests/test_datasets/test_dataset.py

Any other comments?

This PR improves the loading speed of datasets. The pickled cache files are only large for datasets with many features. The most drastic improvement I observed was for the dataset dorothea, where the loading time dropped from ~13s to ~0.2s.

@mfeurer mfeurer requested a review from PGijsbers October 28, 2020 20:24

@PGijsbers PGijsbers left a comment


Some minor changes requested. The unit tests also fail and need to be fixed.

openml/datasets/dataset.py (resolved)
openml/datasets/dataset.py (outdated, resolved)
openml/datasets/dataset.py (outdated, resolved)
openml/datasets/dataset.py (outdated, resolved)
openml/datasets/functions.py (resolved)
openml/datasets/functions.py (outdated, resolved)
openml/datasets/functions.py (outdated, resolved)
tests/test_datasets/test_dataset.py (outdated, resolved)
tests/test_datasets/test_dataset.py (outdated, resolved)
@PGijsbers

Can merge if CI passes (at least some of the jobs, to make sure the merge conflict was resolved correctly 😓)

@PGijsbers PGijsbers merged commit 560e952 into develop Nov 3, 2020
@PGijsbers PGijsbers deleted the add_908 branch November 3, 2020 18:38