Description
This revisits #4487, in which @jakevdp suggested changing the default of bins to 'auto'.
Since automatic determination of the number of bins is now supported in matplotlib via numpy, I think it would be great to make it the default.
The main reason for wanting the change is that many people use this for data analysis, and the behavior of bins=10 is pretty terrible in many cases (see Jake's example). Still, many people use the defaults.
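To make the difference concrete, here is a minimal sketch comparing the number of bins from the current default with what 'auto' picks. The data is made up for illustration (it is not Jake's example), and I'm using numpy's `histogram_bin_edges` directly so that nothing needs to be drawn.

```python
import numpy as np

# Hypothetical sample: 10,000 draws from a standard normal (illustration only).
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Bin edges with the current default (10 equal-width bins).
fixed_edges = np.histogram_bin_edges(data, bins=10)

# Bin edges chosen by numpy's 'auto' estimator.
auto_edges = np.histogram_bin_edges(data, bins="auto")

n_fixed = len(fixed_edges) - 1
n_auto = len(auto_edges) - 1
print(f"bins=10     -> {n_fixed} bins")
print(f"bins='auto' -> {n_auto} bins")
```

With this many samples, 'auto' ends up with noticeably more than 10 bins, so the bin width tracks the data rather than a fixed count. The same string can be passed to `plt.hist(data, bins='auto')`.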
Good defaults matter. I'd love to keep educating people but no amount of educating will prevent people from using the defaults (we found this true in sklearn when mining github).
Many people use this from pandas and the actual implementation is in numpy, and @jklymak makes the case that matplotlib ideally delegates as much to numpy as possible. I am very sympathetic to this position.
My main claim is that somewhere the default should change.
Currently my position is that matplotlib is the best place for that. I don't think having pandas change the default would be as good as it would lead to inconsistencies between pandas and matplotlib. I would be happy with numpy changing the default, but the use cases of numpy are not necessarily related to visualization or even data analysis at all, so it's less clear to me that 'auto' is a good default there.
Also, from my perspective (and yours might be different), changing the default in numpy is more likely to break people's code and might require code changes, so the case for changing there needs to be really strong, and I think it's weaker than for matplotlib.
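As a rough sketch of what I mean by changing the default in the plotting layer only: the helper below is hypothetical (`hist_counts` is not an existing matplotlib or numpy function) and just shows that a wrapper can default to 'auto' while numpy's own default, and any explicit `bins` argument, stay exactly as they are.

```python
import numpy as np

def hist_counts(data, bins="auto"):
    """Hypothetical plotting-layer helper: defaults to 'auto', delegates to numpy."""
    return np.histogram(data, bins=bins)

# Made-up data for illustration.
rng = np.random.default_rng(42)
data = rng.normal(size=1000)

numpy_counts, _ = np.histogram(data)             # numpy's own default: 10 bins
wrapper_counts, _ = hist_counts(data)            # wrapper default: 'auto'
explicit_counts, _ = hist_counts(data, bins=10)  # explicit choice is still honored

print(len(numpy_counts), len(wrapper_counts), len(explicit_counts))
```

Code that calls numpy directly, or that passes `bins` explicitly, sees no change under this arrangement; only callers relying on the plotting default get different output.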
If you have good reasons to suggest changing the defaults in numpy, I'm happy for us all to figure this out together (data science user + numpy + matplotlib). But right now, the default behavior leads to people making bad inferences.