Description
TLDR - I think we need to cap our benchmarks at a maximum of 0.2 seconds. That's a long way off though, so I think we should start with a cap of 1 second per benchmark.
Right now we have some very long running benchmarks:
https://pandas.pydata.org/speed/pandas/#summarylist?sort=1&dir=desc
I haven't seen a definitive answer, but I think ASV leverages the built-in timeit functionality to figure out how long a given benchmark should run:
https://docs.python.org/3.7/library/timeit.html#command-line-interface
Quoting what I think is important:
> If -n is not given, a suitable number of loops is calculated by trying successive powers of 10 until the total time is at least 0.2 seconds.
So IIUC a particular statement is executed `n` times (where `n` is a power of 10, chosen so the total takes at least 0.2 seconds), and that loop is then repeated `repeat` times to get a reading. `asv continuous` would do this 4 times (2 runs for each commit being compared). In Python 3.6, `repeat` is 3 (we currently pin ASV to 3.6), but in future versions that gets bumped to 5.
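The loop-count selection described above can be reproduced directly with `timeit.Timer.autorange` (available since Python 3.6), which keeps increasing the loop count until one repetition takes at least 0.2 seconds in total:

```python
import timeit

# autorange() grows the loop count until the total run time for one
# batch is at least 0.2 seconds, then returns (number, time_taken).
timer = timeit.Timer("sum(range(1000))")
number, time_taken = timer.autorange()
print(number, time_taken)  # time_taken will be >= 0.2 seconds
```

This is why a statement that already takes 20 seconds on its own is only looped once (`number = 1`) but still hits the full `repeat` count.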
We have a handful of benchmarks that take 20s apiece to run, so if we stick to the 3.6 timing these statements would run `n=1` times, repeated 3 times per benchmark session, with 4 sessions per continuous run: 20s * 3 repeats * 4 sessions = 4 minutes for one benchmark alone.
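The arithmetic above can be sanity-checked in a couple of lines (the inputs are the figures quoted in this issue, not fresh measurements):

```python
seconds_per_run = 20   # the slowest benchmarks take ~20s per statement
repeats = 3            # timeit's default repeat on Python 3.6
sessions = 4           # asv continuous: 2 runs for each of 2 commits

total_seconds = seconds_per_run * repeats * sessions
print(total_seconds / 60)  # -> 4.0 minutes for a single benchmark
```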
rolling.Apply.time_rolling is a serious offender here, so I think we can start with that. It would take community PRs to improve the performance of any of these, though maybe we should prioritize anything currently taking over 1 second.
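As a stopgap while the underlying operations are being sped up, asv also lets individual benchmarks override the timing parameters via class-level attributes (`number`, `repeat`, `timeout`). A minimal sketch of what a capped rolling-apply benchmark could look like; the data size and attribute values here are illustrative, not the actual pandas benchmark:

```python
import numpy as np
import pandas as pd


class Apply:
    # asv class-level attributes: run the statement once per sample,
    # take 3 samples, and kill the benchmark if it exceeds 60 seconds.
    number = 1
    repeat = 3
    timeout = 60

    def setup(self):
        # small illustrative Series; the real benchmark uses larger data
        self.roll = pd.Series(np.random.randn(1_000)).rolling(10)

    def time_rolling(self):
        self.roll.apply(np.sum)
```

Setting the attributes on the class keeps the cap local to the worst offenders rather than changing the global asv configuration.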