Commit accd403

Merge pull request #7282 from phobson/MEP-bxp
Draft version of MEP28: Simplification of boxplots
2 parents 6595e4c + 5d2e6c3 commit accd403

2 files changed: +327 -0 lines changed

‎doc/devel/MEP/MEP28.rst

+326 lines changed: 326 additions & 0 deletions
@@ -0,0 +1,326 @@
=============================================
 MEP 28: Remove Complexity from Axes.boxplot
=============================================

.. contents::
   :local:

Status
======

..
.. MEPs go through a number of phases in their lifetime:

- **Discussion**

..
.. - **Progress**: Consensus was reached on the mailing list and
..   implementation work has begun.
..
.. - **Completed**: The implementation has been merged into master.
..
.. - **Superseded**: This MEP has been abandoned in favor of another
..   approach.

Branches and Pull requests
==========================

- Adding pre- & post-processing options to ``cbook.boxplot_stats``:
  https://github.com/phobson/matplotlib/tree/boxplot-stat-transforms
- Exposing ``cbook.boxplot_stats`` through ``Axes.boxplot`` kwargs: None
- Remove redundant statistical kwargs in ``Axes.boxplot``: None
- Remove redundant style options in ``Axes.boxplot``: None
- Remaining items that arise through discussion: None

Abstract
========

Over the past few releases, the ``Axes.boxplot`` method has grown in
complexity to support fully customizable artist styling and statistical
computation. This led to ``Axes.boxplot`` being split into multiple
parts. The statistics needed to draw a boxplot are computed in
``cbook.boxplot_stats``, while the actual artists are drawn by ``Axes.bxp``.
The original method, ``Axes.boxplot``, remains the most public API. It
handles passing the user-supplied data to ``cbook.boxplot_stats``, feeding
the results to ``Axes.bxp``, and pre-processing style information for
each facet of the boxplots.

This MEP outlines a path forward to roll back the added complexity
and simplify the API while maintaining reasonable backwards
compatibility.

Detailed description
====================

Currently, the ``Axes.boxplot`` method accepts parameters that allow
users to specify medians and confidence intervals for each box that
will be drawn in the plot. These were provided so that advanced users
could provide statistics computed in a different fashion than the simple
method provided by matplotlib. However, handling this input requires
complex logic to make sure that the forms of the data structure match what
needs to be drawn. At the moment, that logic contains 9 separate if/else
statements nested up to 5 levels deep with a for loop, and may raise up to 2 errors.
These parameters were added prior to the creation of the ``Axes.bxp`` method,
which draws boxplots from a list of dictionaries containing the relevant
statistics. Matplotlib also provides a function that computes these
statistics, ``cbook.boxplot_stats``. Advanced users can now
either a) write their own function to compute the stats required by
``Axes.bxp``, or b) modify the output returned by ``cbook.boxplot_stats``
to fully customize the position of the artists of the plots. Given this
flexibility, the parameters that manually specify only the medians and their
confidence intervals remain solely for backwards compatibility.

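Route (b) above can be sketched with the existing API as follows. This is
a minimal illustration, not part of the proposal: the data, the replacement
medians and the confidence limits are made up, and the ``Agg`` backend is
selected only so that the snippet runs without a display.
``cbook.boxplot_stats`` returns one dictionary per dataset, in which the
``med``, ``cilo`` and ``cihi`` entries hold the median and the notch limits.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib import cbook

# Illustrative data: two datasets, one box each.
data = [
    [1.2, 1.9, 2.4, 2.8, 3.1, 3.7, 4.4, 5.0, 9.5],
    [0.8, 1.1, 1.6, 2.0, 2.3, 2.9, 3.3, 3.8, 4.2],
]

# Step 1: let matplotlib compute one statistics dictionary per dataset.
stats = cbook.boxplot_stats(data)

# Step 2: overwrite the entries that usermedians/conf_intervals would set.
custom_medians = [3.0, 2.5]            # made-up values
custom_cis = [(2.6, 3.4), (2.1, 2.9)]  # made-up values
for box, med, (lo, hi) in zip(stats, custom_medians, custom_cis):
    box["med"] = med
    box["cilo"], box["cihi"] = lo, hi

# Step 3: draw directly from the edited dictionaries.
fig, ax = plt.subplots()
artists = ax.bxp(stats, shownotches=True)
```

Nothing in this flow needs ``usermedians`` or ``conf_intervals``, which is
the redundancy this MEP points at.
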
Around the same time that the two roles of ``Axes.boxplot`` were split into
``cbook.boxplot_stats`` for computation and ``Axes.bxp`` for drawing, both
``Axes.boxplot`` and ``Axes.bxp`` were written to accept parameters that
individually toggle the drawing of all components of the boxplots, and
parameters that individually configure the style of those artists. However,
to maintain backwards compatibility, the ``sym`` parameter (previously used
to specify the symbol of the fliers) was retained. This parameter itself
requires fairly complex logic to reconcile ``sym`` with the
newer ``flierprops`` parameter and the default style specified by ``matplotlibrc``.

This MEP seeks to dramatically simplify the creation of boxplots for
novice and advanced users alike. Importantly, the changes proposed here
will also be available to downstream packages like seaborn, as seaborn
smartly allows users to pass arbitrary dictionaries of parameters through
the seaborn API to the underlying matplotlib functions.

This will be achieved in the following way:

1. ``cbook.boxplot_stats`` will be modified to allow pre- and
   post-computation transformation functions to be passed in (e.g.,
   ``np.log`` and ``np.exp`` for lognormally distributed data).
2. ``Axes.boxplot`` will be modified to also accept these functions and
   naïvely pass them to ``cbook.boxplot_stats`` (alternatively: pass the
   stat function and a dict of its optional parameters).
3. Outdated parameters from ``Axes.boxplot`` will be deprecated and
   later removed.

Implementation
==============

Passing transform functions to ``cbook.boxplot_stats``
------------------------------------------------------

This MEP proposes that two parameters (e.g., ``transform_in`` and
``transform_out``) be added to the cookbook function that computes the
statistics for the boxplot function. These will be optional keyword-only
arguments that default to a no-op (``lambda x: x``) when omitted
by the user. The ``transform_in`` function will be applied to the data
as the ``boxplot_stats`` function loops through each subset of the data
passed to it. After the list of statistics dictionaries is computed, the
``transform_out`` function is applied to each value in the dictionaries.

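To see what the two transforms buy, here is a small sketch that uses only
the standard library. It is not matplotlib's implementation: the helper
names ``quartiles`` and ``quartiles_with_transforms`` are hypothetical, and
the quartile rule (``statistics.quantiles`` with ``method="inclusive"``) is
an assumption that need not match what ``cbook.boxplot_stats`` computes.

```python
import math
import statistics


def quartiles(values):
    """Return q1, median and q3 (hypothetical stand-in for the real stats)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "med": med, "q3": q3}


def quartiles_with_transforms(values, transform_in=None, transform_out=None):
    """Apply transform_in to the data and transform_out to each statistic."""
    if transform_in is None:
        transform_in = lambda x: x  # identity: leave the data untouched
    if transform_out is None:
        transform_out = lambda x: x  # identity: leave the statistics untouched
    stats = quartiles([transform_in(v) for v in values])
    return {key: transform_out(value) for key, value in stats.items()}


data = [1.0, 10.0, 100.0, 1000.0]  # spans several orders of magnitude

plain = quartiles_with_transforms(data)
logged = quartiles_with_transforms(
    data, transform_in=math.log10, transform_out=lambda v: 10.0 ** v
)

print(plain["med"])   # median interpolated on the raw scale
print(logged["med"])  # median interpolated in log space, then back-transformed
```

On the raw scale the median is the arithmetic midpoint of 10 and 100, i.e.
55. In log space it is their geometric midpoint, about 31.6, which is
usually the more meaningful summary for lognormally distributed data.
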
These transformations can then be added to the call signature of
``Axes.boxplot`` with little impact on that method's complexity, because
they can be passed directly to ``cbook.boxplot_stats``.
Alternatively, ``Axes.boxplot`` could be modified to accept an optional
statistical function kwarg and a dictionary of parameters to be directly
passed to it.

At this point in the implementation, users and external libraries like
seaborn would have complete control via the ``Axes.boxplot`` method. More
importantly, seaborn would require no changes to its
API to allow users to take advantage of these new options.

Simplifications to the ``Axes.boxplot`` API and other functions
---------------------------------------------------------------

Simplifying the boxplot method consists primarily of deprecating and then
removing the redundant parameters. Optionally, a next step would include
rectifying minor terminological inconsistencies between ``Axes.boxplot``
and ``Axes.bxp``.

The parameters to be deprecated and removed include:

1. ``usermedians`` - processed by 10 SLOC, 3 ``if`` blocks, a ``for`` loop
2. ``conf_intervals`` - handled by 15 SLOC, 6 ``if`` blocks, a ``for`` loop
3. ``sym`` - processed by 12 SLOC, 4 ``if`` blocks

Removing the ``sym`` option allows all code handling the remaining
styling parameters to be moved to ``Axes.bxp``. This doesn't remove
any complexity, but does reinforce the single responsibility principle
among ``Axes.bxp``, ``cbook.boxplot_stats``, and ``Axes.boxplot``.

Additionally, the ``notch`` parameter could be renamed ``shownotches``
to be consistent with ``Axes.bxp``. This kind of cleanup could be taken
a step further: the ``whis``, ``bootstrap``, and ``autorange`` parameters
could be rolled into the kwargs passed to a new ``statfxn`` parameter
(described under Alternatives below).

Backward compatibility
======================

Implementation of this MEP would eventually result in the backwards
incompatible deprecation and then removal of the keyword parameters
``usermedians``, ``conf_intervals``, and ``sym``. Cursory searches on
GitHub indicated that ``usermedians`` and ``conf_intervals`` are used by
few users, who all seem to have a very strong knowledge of matplotlib.
A robust deprecation cycle should provide sufficient time for these
users to migrate to a new API.

Deprecation of ``sym``, however, may have a much broader reach into
the matplotlib userbase.

Schedule
--------

An accelerated timeline could look like the following:

#. v2.0.1 add transforms to ``cbook.boxplot_stats``, expose in ``Axes.boxplot``
#. v2.1.0 deprecate ``usermedians``, ``conf_intervals``, ``sym`` parameters
#. v2.2.0 make deprecations noisier
#. v2.3.0 remove ``usermedians``, ``conf_intervals``, ``sym`` parameters
#. v2.3.0 deprecate ``notch`` in favor of ``shownotches`` to be consistent
   with other parameters and ``Axes.bxp``
#. v2.4.0 remove ``notch`` parameter, move all style and artist toggling
   logic to ``Axes.bxp``. ``Axes.boxplot`` is then little more than a broker
   between ``Axes.bxp`` and ``cbook.boxplot_stats``

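The deprecation step for ``sym`` might behave roughly as sketched below.
This is an illustration built on the standard ``warnings`` module only, not
matplotlib's own deprecation machinery. The ``boxplot`` function here is a
hypothetical stand-in that returns the resolved flier style, and it
simplifies ``sym`` to a bare marker string.

```python
import warnings

_UNSET = object()  # sentinel: tells "not passed" apart from an explicit None


def boxplot(data, *, sym=_UNSET, flierprops=None):
    """Hypothetical stand-in for ``Axes.boxplot``; returns the flier style."""
    flierprops = dict(flierprops or {})
    if sym is not _UNSET:
        warnings.warn(
            "the 'sym' parameter is deprecated; use 'flierprops' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Simplification: treat sym as a bare marker and let flierprops win.
        flierprops.setdefault("marker", sym)
    return flierprops


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    old_style = boxplot([1, 2, 3], sym="+")
    new_style = boxplot([1, 2, 3], flierprops={"marker": "+"})

print(old_style == new_style)                 # do both spellings agree?
print([w.category.__name__ for w in caught])  # which calls warned?
```

Using a sentinel rather than ``None`` as the default keeps an explicit
``sym=None`` distinguishable from an omitted argument, which matters if the
old parameter assigns a meaning to ``None``.
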
Anticipated Impacts to Users
----------------------------

As described above, deprecating ``usermedians`` and ``conf_intervals``
will likely impact few users. Those who will be impacted are almost
certainly advanced users who will be able to adapt to the change.

Deprecating the ``sym`` option may impact more users, and effort should
be taken to collect community feedback on this.

Anticipated Impacts to Downstream Libraries
-------------------------------------------

The source code (GitHub master as of 2016-10-17) was inspected for
seaborn and python-ggplot to see if these changes would impact their
use. None of the parameters nominated for removal in this MEP are used by
seaborn. The seaborn APIs that use matplotlib's boxplot function allow
users to pass arbitrary ``**kwargs`` through to matplotlib's API. Thus
seaborn users with modern matplotlib installations will be able to take
full advantage of any new features added as a result of this MEP.

Python-ggplot has implemented its own function to draw boxplots. Therefore,
implementing this MEP would not affect it.

Alternatives
============

Variations on the theme
-----------------------

This MEP can be divided into a few loosely coupled components:

#. Allowing pre- and post-computation transformation functions in ``cbook.boxplot_stats``
#. Exposing that transformation in the ``Axes.boxplot`` API
#. Removing redundant statistical options in ``Axes.boxplot``
#. Shifting all styling parameter processing from ``Axes.boxplot`` to ``Axes.bxp``.

With this approach, #2 depends on #1, and #4 depends on #3.

There are two possible approaches to #2. The first and most direct would
be to mirror the new ``transform_in`` and ``transform_out`` parameters of
``cbook.boxplot_stats`` in ``Axes.boxplot`` and pass them directly.

The second approach would be to add ``statfxn`` and ``statfxn_args``
parameters to ``Axes.boxplot``. Under this implementation, the default
value of ``statfxn`` would be ``cbook.boxplot_stats``, but users could
pass their own function. ``transform_in`` and ``transform_out`` would
then be passed as elements of the ``statfxn_args`` parameter.

.. code:: python

    # Sketch only: ``...`` stands for the existing parameters, elided here.
    def boxplot_stats(data, ..., transform_in=None, transform_out=None):
        if transform_in is None:
            transform_in = lambda x: x

        if transform_out is None:
            transform_out = lambda x: x

        output = []
        for _d in data:
            d = transform_in(_d)
            stat_dict = do_stats(d)
            for key, value in stat_dict.items():
                if key != 'label':
                    stat_dict[key] = transform_out(value)
            # Collect the computed statistics, not the transformed data.
            output.append(stat_dict)
        return output


    class Axes(...):
        def boxplot_option1(self, data, ..., transform_in=None, transform_out=None):
            stats = cbook.boxplot_stats(data, ...,
                                        transform_in=transform_in,
                                        transform_out=transform_out)
            return self.bxp(stats, ...)

        def boxplot_option2(self, data, ..., statfxn=None, **statopts):
            if statfxn is None:
                statfxn = cbook.boxplot_stats
            stats = statfxn(data, **statopts)
            return self.bxp(stats, ...)

Both cases would allow users to do the following:

.. code:: python

    fig, ax1 = plt.subplots()
    artists1 = ax1.boxplot_optionX(data, transform_in=np.log,
                                   transform_out=np.exp)

But Option Two lets a user write a completely custom stat function
(e.g., ``my_box_stats``) with fancy BCA confidence intervals and the
whiskers set differently depending on some attribute of the data.

This is available under the current API:

.. code:: python

    fig, ax1 = plt.subplots()
    my_stats = my_box_stats(data, bootstrap_method='BCA',
                            whisker_method='dynamic')
    ax1.bxp(my_stats)

And would be more concise with Option Two:

.. code:: python

    fig, ax = plt.subplots()
    statopts = dict(transform_in=np.log, transform_out=np.exp)
    ax.boxplot(data, ..., **statopts)

Users could also pass their own function to compute the stats:

.. code:: python

    fig, ax1 = plt.subplots()
    ax1.boxplot(data, statfxn=my_box_stats, bootstrap_method='BCA',
                whisker_method='dynamic')

From the examples above, Option Two seems to have only marginal benifit,
292+
but in the context of downstream libraries like seaborn, its advantage
293+
is more apparent as the following would be possible without any patches
294+
to seaborn:
295+
296+
.. python:
297+
import seaborn
298+
tips = seaborn.load_data('tips')
299+
g = seaborn.factorplot(x="day", y="total_bill", hue="sex", data=tips,
300+
kind='box', palette="PRGn", shownotches=True,
301+
statfxn=my_box_stats, bootstrap_method='BCA',
302+
whisker_method='dynamic')
303+
This type of flexibility was the intention behind splitting the overall
boxplot API into the current three functions. In practice, however, downstream
libraries like seaborn support versions of matplotlib dating back well
before the split. Thus, adding just a bit more flexibility to
``Axes.boxplot`` could expose all the functionality to users of the
downstream libraries with modern matplotlib installations without intervention
from the downstream library maintainers.

Doing less
313+
----------
314+
315+
Another obvious alternative would be to omit the added pre- and post-
316+
computation transform functionality in ``cbook.boxplot_stats`` and
317+
``Axes.boxplot``, and simply remove the redundant statistical and style
318+
parameters as described above.
319+
320+
Doing nothing
321+
-------------
322+
323+
As with many things in life, doing nothing is an option here. This means
324+
we simply advocate for users and downstream libraries to take advantage
325+
of the split between ``cbook.boxplot_stats`` and ``Axes.bxp`` and let
326+
them decide how to provide an interface to that.

‎doc/devel/MEP/index.rst

+1 line changed: 1 addition & 0 deletions
@@ -29,3 +29,4 @@ Matplotlib Enhancement Proposals
    MEP25
    MEP26
    MEP27
+   MEP28
