DOC Fix the description of some features in load_diabetes #19366

Merged
merged 6 commits into from
Apr 16, 2021
sklearn/datasets/_base.py (6 changes: 6 additions & 0 deletions)
@@ -765,6 +765,12 @@ def load_diabetes(*, return_X_y=False, as_frame=False):
 Features real, -.2 < x < .2
 Targets integer 25 - 346
 ============== ==================
+
+.. note::
+    The meaning of each feature (i.e. `feature_names`) might be unclear
+    (especially for `ltg`) as the documentation of the original dataset is
+    not explicit. We provide information that seems correct in regard with
+    the scientific literature in this field of research.

 Read more in the :ref:`User Guide <diabetes_dataset>`.

sklearn/datasets/descr/diabetes.rst (6 changes: 3 additions & 3 deletions)
@@ -21,11 +21,11 @@ quantitative measure of disease progression one year after baseline.
 - sex
 - bmi body mass index
 - bp average blood pressure
-- s1 tc, T-Cells (a type of white blood cells)
+- s1 tc, total serum cholesterol
 - s2 ldl, low-density lipoproteins
 - s3 hdl, high-density lipoproteins
-- s4 tch, thyroid stimulating hormone
-- s5 ltg, lamotrigine
+- s4 tch, total cholesterol / HDL
+- s5 ltg, possibly log of serum triglycerides level
Member

Suggested change
-- s5 ltg, possibly log of serum triglycerides level
+- s5 ltg, log of serum triglycerides level

Member

Maybe we should add a warning to state that the reliability of the meaning of each feature is not as good as we would have liked because the documentation of the source dataset is not very explicit.

Member

I think a note in the API docs would be more appropriate

Contributor Author

@hongshaoyang Apr 16, 2021

I added a note to sklearn.datasets.load_diabetes on how the meaning of each feature might not be clear.

- s6 glu, blood sugar level

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).
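
The parenthetical in the note above (each column's sum of squares totals 1) can be checked with a small sketch. This is not the code used to build the dataset. It uses synthetic data, and `center_and_scale` is a hypothetical helper introduced only for illustration. One point worth noticing: for the sum of squares to come out as exactly 1, the centered column has to be divided by the population standard deviation times the square root of `n_samples`, which is what the sketch assumes.

```python
import math
import random


def center_and_scale(column):
    """Mean-center a column, then divide by population std * sqrt(n).

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    n = len(column)
    mean = sum(column) / n
    centered = [x - mean for x in column]
    # Population standard deviation (divides by n, not n - 1).
    std = math.sqrt(sum(c * c for c in centered) / n)
    # Assumption: the scale factor is std * sqrt(n), which is what makes
    # the sum of squares of the result equal to 1.
    scale = std * math.sqrt(n)
    return [c / scale for c in centered]


random.seed(0)
# Synthetic stand-in for one raw feature column (not real diabetes data).
raw = [random.gauss(50.0, 10.0) for _ in range(442)]

scaled = center_and_scale(raw)
sum_sq = sum(v * v for v in scaled)
col_mean = sum(scaled) / len(scaled)

print(f"sum of squares: {sum_sq:.12f}")
print(f"column mean:    {col_mean:.3e}")
```

Running the same check on the columns returned by `load_diabetes` should give the same two properties, up to floating-point error, if the note's description holds.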