Timer consistency across backends #29062

greglucas · Nov 1, 2024

PR summary

Currently, all of the backend timers have subtly different behavior relating to long-running callbacks. Meaning we don't make any guarantees about when the timers will fire currently and it is the wild-west of what will happen. This PR has quite a few things going on to try and bring these all into consistency and adds tests for these cases. I'll describe the main updates below:

default backend main loop
This is a really bad error IMO. We are currently running the loop for x number of sleeps(), but if the callback takes longer than the sleep duration this makes the runloop run longer than a user requested. I have updated this to be dependent on total duration within the loop.

interval updates
When setting the interval or singleshot attribute of a timer, we would always call the underlying start() again. Meaning that this would trigger a timer "reset" in a sense, even if the interval was exactly the same as before (i.e. setting it constantly in a loop). I have updated this to only "reset"/"restart" the timer if the value has changed.

Tk
We need to keep track of the expected firing time ourselves since there is no repeating timer. We keep track of the original requested firing time and the callback trigger to account for the case of a callback taking longer than the repeat timer when we actually want the timer to fire immediately and not on the next interval.

Added singleshot and interval handler updates.

wx

Added singleshot update handler.

macos

Added an asynchronous dispatch to the main queue. This prevents timer drift with synchronous slow callbacks.

Added singleshot and interval update handling.

Removed previously started repeating timer

testing

Added a test for a slow callback to the interactive timers and refactored the tests to try to process in parallel and reduce some of the time spent in the tests.

closes #28647
closes #29029
closes #29076

PR checklist

"closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
[n/a] Plotting related features are demonstrated in an example
[n/a] New Features and API Changes are noted with a directive and release note
[n/a] Documentation complies with general and docstring guidelines

greglucas · Nov 1, 2024

With this PR all backends produce 7 events for this script:
#29029 (comment)

greglucas · Nov 4, 2024

I added tests and they all pass locally for me, but I can't seem to get things working on the CI runners, so I'm going to put some thoughts/notes down here. I'm wondering if there are issues with trying to assert against timing on shared resources? Thinking it was just trying to run things too fast on limited resources, I increased the timer interval to 500ms and that didn't seem to help. Currently, I am asserting against the expected number of calls, is there possibly a delay involved in first timer spin-up so I should be asserting against the final timer interval somehow instead? If anyone has ideas for better tests here let me know.

QuLogic · Nov 6, 2024

Since this is only failing on 3.13 (on macOS at least), one possibly suspicious change is in time.perf_counter:

Changed in version 3.13: Use the same clock as time.monotonic().

But I thought you had used time.monotonic before? Did it fail then as well?

QuLogic · Nov 6, 2024

I can reproduce the failure on AppVeyor locally (Expected 8, got 5); given #28647 (comment), I think that means it is re-evaluating the timeout after the slow callback and not attempting to create a consistent time.

greglucas · Nov 6, 2024

That is actually a different failure than the one we are seeing here of course 😂 https://github.com/matplotlib/matplotlib/actions/runs/11688137387/job/32547874573?pr=29062#step:15:200

AssertionError: Event loop: Expected to run for around 2s, but ran for 4.47s

These tests on CI systems seem flakey because I had a job right before this push where all the tests passed. I think this gets to the comment you left about system resources and scheduling because sometimes QT is the one failing in CI, others it is GTK. We are running pytest with as many processors as possible -n auto, then submitting subprocesses that spin up GUI event loops from within those processes on potentially shared resources in a CI environment. (One thought I had was whether we could only run one subprocess per backend rather than starting/stopping so many of them, something like MPLBACKEND=gtk4agg pytest -m all_gui_tests which would keep the same backend for all the interactive test session, but that seems like quite a bit of a rework)

Running locally with this script:

import time
import matplotlib.pyplot as plt

fig = plt.figure()

last_time = orig_time = time.perf_counter()

x = 0

def on_timer():
    global x
    global last_time
    t = time.perf_counter()
    print(f"{x:02d}: {t - last_time:.3f} ({t - orig_time:.3f})")
    last_time = t
    if x == 0:
        time.sleep(0.5)
    else:
        time.sleep(0.1)
    x += 1

timer = fig.canvas.new_timer(interval=150)
timer.add_callback(on_timer)
timer.start()
fig.canvas.start_event_loop(3)

GTK has an ~8ms drift associated with it, whereas all the other backends hit within ~1ms.

00: 0.151 (0.151)
01: 0.502 (0.653)
02: 0.157 (0.811)
03: 0.159 (0.969)
04: 0.157 (1.126)
05: 0.158 (1.284)
06: 0.158 (1.442)
07: 0.158 (1.600)
08: 0.159 (1.759)
09: 0.157 (1.916)
10: 0.157 (2.073)
11: 0.158 (2.231)
12: 0.158 (2.389)
13: 0.158 (2.547)
14: 0.158 (2.704)
15: 0.157 (2.861)
16: 0.156 (3.018)
17: 0.157 (3.175)
18: 0.158 (3.333)
19: 0.157 (3.490)
20: 0.158 (3.648)
21: 0.158 (3.806)
22: 0.155 (3.961)

greglucas · Nov 7, 2024

It looks like GTK has no plans on changing this: https://gitlab.gnome.org/GNOME/glib/-/issues/503
I tried moving the logic from Tk over to GTK as well and recreating a singleshot timer with our own calculated interval and got values above and below 150ms rather than consistently 158ms. So theoretically might reduce some of the accumulated error, but it also still has a pretty significant error compared to all of the other frameworks which are closer to 1ms.

00: 0.153 (0.153)
01: 0.514 (0.667)
02: 0.234 (0.901)
03: 0.156 (1.057)
04: 0.144 (1.201)
05: 0.157 (1.358)
06: 0.146 (1.504)
07: 0.157 (1.661)
08: 0.146 (1.807)
09: 0.146 (1.953)
10: 0.158 (2.111)
11: 0.147 (2.258)
12: 0.147 (2.405)
13: 0.147 (2.551)
14: 0.156 (2.707)
15: 0.146 (2.853)

The implementation of start_event_loop would previously just count the number of sleeps that occurred. But this could lead to longer event loop times if flush_events() added time into the loop. We want the condition to be dependent on the end-time so we don't run our loop longer than necessary.

If we set interval in a loop, this could cause the timing to become dependent on the callback processing time.

The Tk timer would reset itself after the callback had processed to add a new interval. This meant that if a long callback was being added we would get a drift in the timer. We need to manually track the original firing time and intervals based on that.

Make sure that the interactive timers don't drift when long callbacks are associated with them.

Previously, the timers were dependent on the length of time it took for the timer callback to execute. This dispatches the callback to the task queue to avoid synchronously waiting on long-running callback tasks.

This adds more robust interactive timer tests to assert against some of the discrepencies that were found in testing. - Run loop shouldn't depend on callback time - Slow callbacks shouldn't cause a timer to drift over time, it should continually fire at the requested cadence - When start() is called again it should invalidate the previous timer associated with that Timer object

greglucas · May 26, 2025

Rebased and pushed again. All tests passing locally, going to see how this does on CI now...

github-actions bot added GUI: wx GUI: MacOSX GUI: tk topic: canvas and figure manager labels Nov 1, 2024

This was referenced Nov 1, 2024

FIX/ENH: macos: dispatch timer tasks asynchronously to the main loop #29030

Closed

Avoid timer drift during long callbacks #29023

Closed

FIX: macos: Add update capability to interval/singleshot timer properties #29018

Closed

greglucas force-pushed the timer-consistency branch from 8596fb6 to 7f40ba5 Compare November 5, 2024 16:11

greglucas force-pushed the timer-consistency branch from 7f40ba5 to 8bc2e95 Compare November 20, 2024 16:59

github-actions bot added the status: needs rebase label Nov 20, 2024

greglucas force-pushed the timer-consistency branch from 8bc2e95 to 54e07ed Compare November 20, 2024 17:46

github-actions bot removed the status: needs rebase label Nov 20, 2024

greglucas added 12 commits May 26, 2025 14:44

FIX: Add update capability to interval/singleshot timer properties

84e7e60

MNT/TST: Refactor test-interactive-timers to reduce test time

c04db6a

FIX: Add single shot update capability to TimerWx

1ef70ca

FIX: Only call timer updates when things change on the timer

88f8b77

If we set interval in a loop, this could cause the timing to become dependent on the callback processing time.

TST: Add test for timer drift in interactive backends

459410f

Make sure that the interactive timers don't drift when long callbacks are associated with them.

TST: Add backend bases test

ebe66bc

FIX/ENH: macos: dispatch timer tasks asynchronously to the main loop

a5cd96d

Previously, the timers were dependent on the length of time it took for the timer callback to execute. This dispatches the callback to the task queue to avoid synchronously waiting on long-running callback tasks.

TST: Add text to identify failing tests

d1e52b0

FIX: macos should invalidate the previous timer when creating a new one

577a0cf

TST: Update some times for interactive timer test on CI

1affa9e

greglucas force-pushed the timer-consistency branch from 54e07ed to 1affa9e Compare May 26, 2025 21:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Timer consistency across backends #29062

Timer consistency across backends #29062

greglucas commented Nov 1, 2024 •

edited

Loading

Uh oh!

greglucas commented Nov 1, 2024

Uh oh!

greglucas commented Nov 4, 2024

Uh oh!

QuLogic commented Nov 6, 2024

Uh oh!

QuLogic commented Nov 6, 2024

Uh oh!

greglucas commented Nov 6, 2024

Uh oh!

greglucas commented Nov 7, 2024

Uh oh!

greglucas commented May 26, 2025

Uh oh!

Uh oh!

Search code, repositories, users, issues, pull requests...

Uh oh!

Timer consistency across backends #29062

Are you sure you want to change the base?

Timer consistency across backends #29062

Conversation

greglucas commented Nov 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR summary

PR checklist

Uh oh!

greglucas commented Nov 1, 2024

Uh oh!

greglucas commented Nov 4, 2024

Uh oh!

QuLogic commented Nov 6, 2024

Uh oh!

QuLogic commented Nov 6, 2024

Uh oh!

greglucas commented Nov 6, 2024

Uh oh!

greglucas commented Nov 7, 2024

Uh oh!

greglucas commented May 26, 2025

Uh oh!

Uh oh!

greglucas commented Nov 1, 2024 •

edited

Loading