Fix dataset parsing for categories #676

Merged
janvanrijn merged 2 commits into openml/openml-python:develop from openml/openml-python:fix_dataset_handling on Apr 16, 2019

Conversation

mfeurer (Collaborator) commented on Apr 16, 2019

The recent PR #548 changed how the categories of a categorical variable are retrieved: they are now induced from the data. If the data does not contain a category, we lose the information that it could exist. This PR uses the category information provided by the ARFF file to determine all legal categories of a categorical variable.
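The difference described above can be illustrated with a minimal sketch. This is not the code from the PR: `arff_header`, `rows`, and both helper functions are hypothetical stand-ins for a parsed ARFF file whose header declares every legal value of a nominal attribute while the data section only uses some of them.

```python
# Hypothetical stand-in for a parsed ARFF file (not the openml-python API):
# the header declares all legal values of the nominal attribute "color",
# but the data rows never use "blue".
arff_header = {"color": ["red", "green", "blue"]}
rows = [{"color": "red"}, {"color": "green"}, {"color": "red"}]


def categories_from_data(rows, name):
    """Induce categories from the observed values only."""
    return sorted({row[name] for row in rows})


def categories_from_header(header, name):
    """Take categories from the attribute declaration in the header."""
    return list(header[name])


induced = categories_from_data(rows, "color")
declared = categories_from_header(arff_header, "color")

# Categories that are legal according to the header but absent from the data.
missing = [c for c in declared if c not in induced]
print(induced, declared, missing)
```

Inducing categories from the data silently drops `"blue"`, whereas reading the header keeps it, which is the information loss this PR is meant to avoid.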

mfeurer requested review from PGijsbers and janvanrijn on April 16, 2019 10:48

janvanrijn (Member) left a comment:

Looks good to me, the unit test drives the point home.

Can you please add a single-line comment to specify why this is a 2-liner, so we know in the future? Can be merged afterwards.

codecov-io commented on Apr 16, 2019

Codecov Report

Merging #676 into develop will increase coverage by 0.05%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           develop     #676      +/-   ##
===========================================
+ Coverage    90.89%   90.95%   +0.05%     
===========================================
  Files           36       36              
  Lines         3558     3736     +178     
===========================================
+ Hits          3234     3398     +164     
- Misses         324      338      +14

| Impacted Files | Coverage Δ |
| --- | --- |
| openml/datasets/dataset.py | 88.55% <100%> (+0.03%) ⬆️ |
| openml/runs/run.py | 90.15% <100%> (ø) ⬆️ |
| openml/testing.py | 95.61% <0%> (+0.28%) ⬆️ |
| openml/extensions/sklearn/extension.py | 90.28% <0%> (+0.36%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab8a966...8726b6c.

janvanrijn (Member) commented:

I can attest that all tests are green btw

mfeurer (Collaborator, Author) commented on Apr 16, 2019

Okay, I did so.

mfeurer requested a review from janvanrijn on April 16, 2019 12:49
janvanrijn merged commit 4152f91 into develop on Apr 16, 2019
janvanrijn deleted the fix_dataset_handling branch on April 16, 2019 14:32