Description
Bug summary
There are apparently three factors which combine to make savefig slow: (1) the use of many sub-plots, (2) the use of bbox_inches='tight', and (3) the use of sharex='col'. Unfortunately I need to use all three.
Code for reproduction
%matplotlib inline
from io import BytesIO
import numpy as np
from matplotlib.figure import Figure
# Random Number Generator.
rng = np.random.default_rng()
# Constants.
figsize = (10, 6)
ncols = 3
nrows = 10
size = 100
size_total = ncols * nrows * size
# Figure with many subplots.
fig_many = Figure(figsize=figsize)
axs_many = fig_many.subplots(ncols=ncols, nrows=nrows)
# Figure with many subplots and sharex='col'.
fig_many_sharex = Figure(figsize=figsize)
axs_many_sharex = fig_many_sharex.subplots(ncols=ncols, nrows=nrows, sharex='col')
# Figure with a single axes.
fig_single = Figure(figsize=figsize)
ax_single = fig_single.subplots()
# Helper-function: Generate random line-plots in the many subplots.
def generate_fig_many(axs):
    for row in range(nrows):
        for col in range(ncols):
            ax = axs[row, col]
            x = rng.normal(loc=row+1, scale=col+1, size=size)
            y = rng.normal(loc=col+1, scale=row+1, size=size)
            x = np.sort(x)
            ax.plot(x, y);
            ax.set_yticks([])
# Generate fig_many
generate_fig_many(axs=axs_many)
fig_many.tight_layout()
# Generate fig_many_sharex
generate_fig_many(axs=axs_many_sharex)
fig_many_sharex.tight_layout()
# Generate fig_single
x = rng.normal(size=size_total)
y = rng.normal(size=size_total)
x = np.sort(x)
ax_single.plot(x, y);
fig_single.tight_layout()
# The following code-chunks were run in individual Jupyter cells.
%%timeit
stream = BytesIO()
fig_single.savefig(stream, format='svg')
s = stream.getvalue()
# 29.2 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
stream = BytesIO()
fig_single.savefig(stream, format='svg', bbox_inches='tight')
s = stream.getvalue()
# 102 ms ± 6.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
stream = BytesIO()
fig_many.savefig(stream, format='svg')
s = stream.getvalue()
# 374 ms ± 4.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
stream = BytesIO()
fig_many.savefig(stream, format='svg', bbox_inches='tight')
s = stream.getvalue()
# 1.4 s ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
stream = BytesIO()
fig_many_sharex.savefig(stream, format='svg')
s = stream.getvalue()
# 565 ms ± 5.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
stream = BytesIO()
fig_many_sharex.savefig(stream, format='svg', bbox_inches='tight')
s = stream.getvalue()
# 2.22 s ± 20.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
stream = BytesIO()
fig_many_sharex.savefig(stream, format='jpg', bbox_inches='tight')
s = stream.getvalue()
# 2.17 s ± 21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
stream = BytesIO()
fig_many_sharex.savefig(stream, format='png', bbox_inches='tight')
s = stream.getvalue()
# 2.19 s ± 31.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
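For anyone reproducing this outside Jupyter, roughly equivalent timings can be taken with the standard timeit module. This is only a sketch of the cells above (the save_svg helper is mine and was not part of the measured code):
import timeit

def save_svg(fig, **kwargs):
    # Render the figure to an in-memory SVG and return the bytes.
    stream = BytesIO()
    fig.savefig(stream, format='svg', **kwargs)
    return stream.getvalue()

# Time each case a few times and report the best run in milliseconds.
cases = [
    ('fig_single, no bbox', fig_single, {}),
    ('fig_many, bbox=tight', fig_many, {'bbox_inches': 'tight'}),
    ('fig_many_sharex, bbox=tight', fig_many_sharex, {'bbox_inches': 'tight'}),
]
for label, fig, kwargs in cases:
    t = min(timeit.repeat(lambda: save_svg(fig, **kwargs), number=1, repeat=3))
    print(f'{label}: {t * 1000:.0f} ms')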
Actual outcome
The test results (times in ms) are summarized in the table below; they are all for the SVG format. A few tests above also use the JPG and PNG formats, and the results are similar.
Figure | no bbox | bbox=tight | layout=constrained | layout=tight
---|---|---|---|---
fig_single | 29 | 102 | 30 | 30
fig_many | 374 | 1,400 | 1,410 | 1,340
fig_many_sharex | 565 | 2,220 | 2,220 | 2,110
Edit: Added time-usage for setting either layout='constrained' or layout='tight' when creating the Figure objects.
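For reference, the constrained- and tight-layout variants mentioned in the edit were created roughly like this (a sketch with my own variable names, shown for the sharex case; not the exact code I ran):
# Same subplot grid as fig_many_sharex, but with a layout engine
# requested at construction time instead of calling tight_layout().
fig_constrained = Figure(figsize=figsize, layout='constrained')
axs_constrained = fig_constrained.subplots(ncols=ncols, nrows=nrows, sharex='col')
generate_fig_many(axs=axs_constrained)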
Expected outcome
I would like it to run much faster (you asked for a visual example, but there is nothing visual to show for a timing issue).
Additional information
Thanks for making Matplotlib; I've used it in many open-source projects over the years!
I am currently building a web-app where Matplotlib will be generating many SVG plots on a server running in the cloud. My own functions for generating the data are very fast, but unfortunately the plotting itself is very slow. For example, a figure with 3 columns and 10 rows of sub-plots takes 7 seconds to run savefig, even though most of the sub-plots only contain a simple text-string such as "Same as previous", and the few remaining sub-plots are either line-plots or fill_between plots generated from just 100 data-points each.
I have tried to simulate this problem in the sample code above, where fig_many has many sub-plots and fig_single has a single plot with the same total number of data-points. I also tried running a profiler on this code, but it would take me forever to understand where the problem lies in Matplotlib's code and whether it is even fixable.
Please tell me if it might be possible to improve the speed; if it is not possible, please explain the technical reason and whether there is a work-around.
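One possible work-around I have been wondering about (untested, and assuming the main overhead of bbox_inches='tight' is the extra measurement pass on every save) is to compute the tight bounding box once and pass the resulting Bbox to later savefig calls:
from matplotlib.backends.backend_agg import FigureCanvasAgg

# Attach a draw-capable canvas so text extents can be measured.
FigureCanvasAgg(fig_many_sharex)
renderer = fig_many_sharex.canvas.get_renderer()

# Draw once so all artists have up-to-date extents, then measure the
# tight bounding box (in figure inches) and add a small pad.
fig_many_sharex.canvas.draw()
tight_bbox = fig_many_sharex.get_tightbbox(renderer).padded(0.1)

# Reuse the precomputed Bbox instead of bbox_inches='tight'.
stream = BytesIO()
fig_many_sharex.savefig(stream, format='svg', bbox_inches=tight_bbox)
I do not know whether this actually helps, since it only avoids the measurement pass and not the drawing itself.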
Thanks!
Operating system
Kubuntu 22
Matplotlib Version
3.7.1
Matplotlib Backend
module://matplotlib_inline.backend_inline
Python version
3.9.12
Jupyter version
6.4.12 (through VSCode)
Installation
pip