Description
Bug summary
Creating sub-plots in Matplotlib is typically 4-12x slower than in Plotly. This is not a bug per se, but it is a serious performance issue for time-critical applications such as interactive web apps. Several closed GitHub issues about slow sub-plot creation go back 6-7 years, but it's still a problem.
Code for reproduction
%matplotlib inline
from matplotlib.figure import Figure
from plotly.subplots import make_subplots
import numpy as np
import pandas as pd
import timeit
rng = np.random.default_rng()
results = []
for i in range(30):
    # Random number of rows and columns.
    rows, cols = rng.integers(low=1, high=20, size=2)
    # Plotly won't accept NumPy ints.
    rows = int(rows)
    cols = int(cols)
    # Total number of sub-plots.
    total = rows*cols
    # Timer.
    t1 = timeit.default_timer()
    # Matplotlib.
    fig_matplotlib = Figure()
    axs_matplotlib = fig_matplotlib.subplots(nrows=rows, ncols=cols)
    # Timer.
    t2 = timeit.default_timer()
    # Plotly.
    fig_plotly = make_subplots(rows=rows, cols=cols)
    # Timer.
    t3 = timeit.default_timer()
    # Time-usage.
    t_matplotlib = t2 - t1
    t_plotly = t3 - t2
    # Relative time-usage.
    t_relative = t_matplotlib / t_plotly
    # Save results.
    results.append(dict(rows=rows, cols=cols, total=total,
                        t_matplotlib=t_matplotlib, t_plotly=t_plotly,
                        t_relative=t_relative))
    # Show status.
    print(f'{rows}\t{cols}\t{total}\t{t_relative:.3f}')
# Convert results to Pandas DataFrame.
df_results = pd.DataFrame(results)
# Plot relative time-usage.
df_results.plot(kind='scatter', x='total', y='t_relative', grid=True);
# Plot individual time-usage.
df2 = df_results.set_index('total').sort_index()
df2.plot(y=['t_matplotlib', 't_plotly'], grid=True, ylabel='seconds');
Actual outcome
In my actual application, which uses 3 columns and 10 rows, Matplotlib consistently takes around 1.8 seconds, but for some reason it takes only around 0.5 seconds in these tests.
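One way to narrow down the gap between 1.8 seconds and 0.5 seconds is to time each stage of the application separately, so that sub-plot creation can be told apart from plotting and rendering. This is a minimal sketch using a hypothetical `stage` context manager (not part of Matplotlib) built on `time.perf_counter`; the two demo stages are only stand-ins for real work.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name, timings):
    """Hypothetical helper: add the wall-clock time spent in the block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate, so a stage entered several times (e.g. in a loop) adds up.
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


def demo():
    """Stand-in stages; in the application these would wrap the real steps."""
    timings = {}
    with stage("build", timings):
        data = [i * i for i in range(100_000)]
    with stage("reduce", timings):
        total = sum(data)
    return total, timings


if __name__ == "__main__":
    total, timings = demo()
    # Print the slowest stages first.
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:>10}: {seconds:.4f} s")
```

In the application, the `with stage(...)` blocks would wrap, for example, `Figure()` plus `fig.subplots(nrows=10, ncols=3)`, the plotting calls, and the final rendering or saving, so the 1.8 seconds can be split between them.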
This plot shows the individual time-usage for Matplotlib and Plotly, where the x-axis is the total number of sub-plots (cols * rows):
Note the jagged line for the Matplotlib time-usage. We could average several runs to smooth it, but the trend is already clear. The jaggedness itself is quite strange: the time changes a lot from run to run.
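Averaging several runs is what the standard-library `timeit.repeat` is designed for. This is a sketch of a hypothetical `time_call` helper (not part of the original benchmark) that runs a callable over several trials and reports both the minimum and the median time per call. The minimum is the usual estimate of the intrinsic cost, while the gap to the median indicates how large the run-to-run variation is.

```python
import statistics
import timeit


def time_call(func, repeat=7, number=1):
    """Hypothetical helper: time func() over `repeat` trials of `number` calls each.

    Returns (best, median) seconds per call.
    """
    # timeit.repeat accepts a zero-argument callable as the statement.
    trials = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in trials]
    return min(per_call), statistics.median(per_call)


if __name__ == "__main__":
    best, median = time_call(lambda: sorted(range(100_000), reverse=True), repeat=5)
    print(f"best {best:.4f} s, median {median:.4f} s")
```

In the reproduction loop, `t_matplotlib` could then be taken as `time_call(lambda: Figure().subplots(nrows=rows, ncols=cols))[0]`, and likewise for `make_subplots(rows=rows, cols=cols)`.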
This plot shows the relative time-usage (Matplotlib time / Plotly time):
Expected outcome
I would like it to run like this, minus the crashes, please.
Additional information
Thanks again for making Matplotlib! I don't want to sound ungrateful or too demanding, as this is my second GitHub issue in a few days about the performance of using many sub-plots in Matplotlib. But these issues are major bottlenecks in my application, accounting for around 90% of the runtime. I also wonder if the two issues are related (see #26150).
Is there a technical reason that Plotly is so much faster than Matplotlib when it comes to having sub-plots?
I imagine that Matplotlib has been made by many different people over a long period of time, so perhaps it is getting hard to understand what the code is doing sometimes?
Plotly runs very fast and is easy to use, but I have already made everything in Matplotlib, and I'm not even sure Plotly has all the features I need to customize the plots. So I'm hoping it would be possible to improve the speed of Matplotlib when using sub-plots.
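To look for a technical reason behind the slowness, a profiler can show where the time inside `subplots()` actually goes. This is a minimal sketch of a hypothetical `profile_call` helper using the standard-library `cProfile` and `pstats` modules; it returns the called function's result together with a text report of the most expensive calls, sorted by cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(func, top=15, sort="cumulative"):
    """Hypothetical helper: run func() under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    # runcall enables the profiler, calls func, and disables it again.
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Shorten file paths, sort, and keep only the `top` most expensive entries.
    stats.strip_dirs().sort_stats(sort).print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(lambda: sorted(range(100_000), reverse=True))
    print(report)
```

For example, `_, report = profile_call(lambda: Figure().subplots(nrows=10, ncols=3))` followed by `print(report)` would list the Matplotlib functions that account for most of the cumulative time for the 10-row, 3-column grid used in the application.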
Thanks!
Operating system
Kubuntu 22
Matplotlib Version
3.7.1
Matplotlib Backend
module://matplotlib_inline.backend_inline
Python version
3.9.12
Jupyter version
6.4.12 (through VSCode)
Installation
pip