Request: change hist bins default to 'auto' #16403

Closed as not planned
@amueller

Description

This is revisiting #4487 in which @jakevdp suggested changing the default of bins to 'auto'.
Since automatic determination is now supported in matplotlib via numpy, I think it would be great to make it the default.

The main reason for wanting the change is that many people use this for data analysis, and the behavior of bins=10 is pretty terrible in many cases (see Jake's example). Still, many people use the defaults.
Good defaults matter. I'd love to keep educating people, but no amount of educating will prevent people from using the defaults (we found this to be true in sklearn when mining GitHub).
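
To make the gap concrete, here is a minimal sketch comparing the bin edges numpy computes for the fixed default of 10 bins against bins='auto'. The synthetic bimodal sample, the seed, and the variable names are assumptions chosen for illustration, not data from this issue; only `np.histogram_bin_edges` and its `bins` argument come from numpy itself.

```python
import numpy as np

# Hypothetical sample: a wide mode plus a narrow mode (illustration only).
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(0.0, 1.0, 5000),   # broad component
    rng.normal(5.0, 0.2, 5000),   # narrow component that coarse bins smear out
])

# Fixed default: always 10 equal-width bins, regardless of sample size or spread.
edges_default = np.histogram_bin_edges(data, bins=10)

# 'auto': numpy picks the bin width from the data (Sturges / Freedman-Diaconis).
edges_auto = np.histogram_bin_edges(data, bins="auto")

n_default = len(edges_default) - 1
n_auto = len(edges_auto) - 1
print("bins with default:", n_default)
print("bins with 'auto': ", n_auto)
```

With a sample this large, 'auto' chooses noticeably more bins than 10, so the narrow mode is spread over several bins instead of being lumped into one or two.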

Many people call this through pandas, and the actual implementation is in numpy; @jklymak makes the case that matplotlib should ideally delegate as much as possible to numpy. I am very sympathetic to this position.

My main claim is that somewhere the default should change.

Currently my position is that matplotlib is the best place for that. I don't think having pandas change the default would be as good, since it would lead to inconsistencies between pandas and matplotlib. I would be happy with numpy changing the default, but the use cases of numpy are not necessarily related to visualization or even data analysis at all, so it's less clear to me that 'auto' is a good default there.

Also, from my perspective (and yours might be different), changing the default in numpy is more likely to break people's code and might require code changes, so the case for changing there needs to be really strong, and I think it's weaker than for matplotlib.

If you have good reasons to suggest changing the defaults in numpy, I'm happy for us all to figure this out together (data science user + numpy + matplotlib). But right now, the default behavior leads to people making bad inferences.
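
For comparison, individual users can already opt in on the matplotlib side. The sketch below assumes the `hist.bins` rcParam is available in the installed matplotlib version and uses a made-up normal sample; it shows the opt-in route only, not the library-wide default change requested here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample for illustration only.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Temporarily set the default used when hist() is called without `bins`.
with plt.rc_context({"hist.bins": "auto"}):
    fig, ax = plt.subplots()
    counts, edges, _ = ax.hist(data)  # no bins argument: falls back to rcParams

plt.close(fig)
print("bins chosen:", len(edges) - 1)
```

This only helps people who already know the default is a problem, which is the point made above about defaults.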

Metadata

Assignees

No one assigned

    Labels

    API: changes
    status: closed as inactive (Issues closed by the "Stale" Github Action. Please comment on any you think should still be open.)
    status: inactive (Marked by the "Stale" Github Action)
    topic: hist
