Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Timer consistency across backends #29062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
Loading
from

Conversation

greglucas
Copy link
Contributor

@greglucas greglucas commented Nov 1, 2024

PR summary

Currently, all of the backend timers have subtly different behavior relating to long-running callbacks. Meaning we don't make any guarantees about when the timers will fire currently and it is the wild-west of what will happen. This PR has quite a few things going on to try and bring these all into consistency and adds tests for these cases. I'll describe the main updates below:

default backend main loop
This is a really bad error IMO. We are currently running the loop for x number of sleeps(), but if the callback takes longer than the sleep duration this makes the runloop run longer than a user requested. I have updated this to be dependent on total duration within the loop.

interval updates
When setting the interval or singleshot attribute of a timer, we would always call the underlying start() again. Meaning that this would trigger a timer "reset" in a sense, even if the interval was exactly the same as before (i.e. setting it constantly in a loop). I have updated this to only "reset"/"restart" the timer if the value has changed.

Tk
We need to keep track of the expected firing time ourselves since there is no repeating timer. We keep track of the original requested firing time and the callback trigger to account for the case of a callback taking longer than the repeat timer when we actually want the timer to fire immediately and not on the next interval.

Added singleshot and interval handler updates.

wx

Added singleshot update handler.

macos

Added an asynchronous dispatch to the main queue. This prevents timer drift with synchronous slow callbacks.

Added singleshot and interval update handling.

Removed previously started repeating timer

testing

Added a test for a slow callback to the interactive timers and refactored the tests to try to process in parallel and reduce some of the time spent in the tests.

closes #28647
closes #29029
closes #29076

PR checklist

@greglucas
Copy link
Contributor Author

With this PR all backends produce 7 events for this script:
#29029 (comment)

@greglucas
Copy link
Contributor Author

I added tests and they all pass locally for me, but I can't seem to get things working on the CI runners, so I'm going to put some thoughts/notes down here. I'm wondering if there are issues with trying to assert against timing on shared resources? Thinking it was just trying to run things too fast on limited resources, I increased the timer interval to 500ms and that didn't seem to help. Currently, I am asserting against the expected number of calls, is there possibly a delay involved in first timer spin-up so I should be asserting against the final timer interval somehow instead? If anyone has ideas for better tests here let me know.

@QuLogic
Copy link
Member

QuLogic commented Nov 6, 2024

Since this is only failing on 3.13 (on macOS at least), one possibly suspicious change is in time.perf_counter:

Changed in version 3.13: Use the same clock as time.monotonic().

But I thought you had used time.monotonic before? Did it fail then as well?

@QuLogic
Copy link
Member

QuLogic commented Nov 6, 2024

I can reproduce the failure on AppVeyor locally (Expected 8, got 5); given #28647 (comment), I think that means it is re-evaluating the timeout after the slow callback and not attempting to create a consistent time.

@greglucas
Copy link
Contributor Author

That is actually a different failure than the one we are seeing here of course 😂 https://github.com/matplotlib/matplotlib/actions/runs/11688137387/job/32547874573?pr=29062#step:15:200

AssertionError: Event loop: Expected to run for around 2s, but ran for 4.47s

These tests on CI systems seem flakey because I had a job right before this push where all the tests passed. I think this gets to the comment you left about system resources and scheduling because sometimes QT is the one failing in CI, others it is GTK. We are running pytest with as many processors as possible -n auto, then submitting subprocesses that spin up GUI event loops from within those processes on potentially shared resources in a CI environment. (One thought I had was whether we could only run one subprocess per backend rather than starting/stopping so many of them, something like MPLBACKEND=gtk4agg pytest -m all_gui_tests which would keep the same backend for all the interactive test session, but that seems like quite a bit of a rework)

Running locally with this script:

import time
import matplotlib.pyplot as plt

fig = plt.figure()

last_time = orig_time = time.perf_counter()

x = 0

def on_timer():
    global x
    global last_time
    t = time.perf_counter()
    print(f"{x:02d}: {t - last_time:.3f} ({t - orig_time:.3f})")
    last_time = t
    if x == 0:
        time.sleep(0.5)
    else:
        time.sleep(0.1)
    x += 1

timer = fig.canvas.new_timer(interval=150)
timer.add_callback(on_timer)
timer.start()
fig.canvas.start_event_loop(3)

GTK has an ~8ms drift associated with it, whereas all the other backends hit within ~1ms.

00: 0.151 (0.151)
01: 0.502 (0.653)
02: 0.157 (0.811)
03: 0.159 (0.969)
04: 0.157 (1.126)
05: 0.158 (1.284)
06: 0.158 (1.442)
07: 0.158 (1.600)
08: 0.159 (1.759)
09: 0.157 (1.916)
10: 0.157 (2.073)
11: 0.158 (2.231)
12: 0.158 (2.389)
13: 0.158 (2.547)
14: 0.158 (2.704)
15: 0.157 (2.861)
16: 0.156 (3.018)
17: 0.157 (3.175)
18: 0.158 (3.333)
19: 0.157 (3.490)
20: 0.158 (3.648)
21: 0.158 (3.806)
22: 0.155 (3.961)

@greglucas
Copy link
Contributor Author

It looks like GTK has no plans on changing this: https://gitlab.gnome.org/GNOME/glib/-/issues/503
I tried moving the logic from Tk over to GTK as well and recreating a singleshot timer with our own calculated interval and got values above and below 150ms rather than consistently 158ms. So theoretically might reduce some of the accumulated error, but it also still has a pretty significant error compared to all of the other frameworks which are closer to 1ms.

00: 0.153 (0.153)
01: 0.514 (0.667)
02: 0.234 (0.901)
03: 0.156 (1.057)
04: 0.144 (1.201)
05: 0.157 (1.358)
06: 0.146 (1.504)
07: 0.157 (1.661)
08: 0.146 (1.807)
09: 0.146 (1.953)
10: 0.158 (2.111)
11: 0.147 (2.258)
12: 0.147 (2.405)
13: 0.147 (2.551)
14: 0.156 (2.707)
15: 0.146 (2.853)

The implementation of start_event_loop would previously just count
the number of sleeps that occurred. But this could lead to longer
event loop times if flush_events() added time into the loop. We
want the condition to be dependent on the end-time so we don't
run our loop longer than necessary.
If we set interval in a loop, this could cause the timing to become
dependent on the callback processing time.
The Tk timer would reset itself after the callback had processed
to add a new interval. This meant that if a long callback was being
added we would get a drift in the timer. We need to manually track
the original firing time and intervals based on that.
Make sure that the interactive timers don't drift when long
callbacks are associated with them.
Previously, the timers were dependent on the length of time it took for
the timer callback to execute. This dispatches the callback to
the task queue to avoid synchronously waiting on long-running callback
tasks.
This adds more robust interactive timer tests to assert against some
of the discrepencies that were found in testing.

- Run loop shouldn't depend on callback time
- Slow callbacks shouldn't cause a timer to drift over time, it should
  continually fire at the requested cadence
- When start() is called again it should invalidate the previous
  timer associated with that Timer object
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.