Add callback_execution_timeout config for deadline callbacks #66609
Draft
seanghaeli wants to merge 2 commits into apache:main from aws-mwaa:ghaeli/callback-timeout-config
Deadline callbacks currently have no timeout -- if a callback hangs, it blocks the executor/triggerer indefinitely. This adds a single [deadlines] callback_execution_timeout configuration option (default 300s) that enforces a maximum execution time for callback subprocesses. When the timeout is exceeded, the supervisor kills the subprocess with SIGTERM (escalating to SIGKILL if necessary). Setting the value to 0 disables the timeout.
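The enforcement described above can be illustrated with a minimal standalone sketch. It is not the PR's actual supervisor code: `run_with_timeout`, `grace`, and `poll_interval` are hypothetical names, and the standard-library `subprocess` module stands in for Airflow's supervisor. It only shows the general pattern of a monotonic deadline, SIGTERM first, then SIGKILL if the process does not exit, with 0 meaning "no timeout".

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout, grace=5.0, poll_interval=0.1):
    """Run cmd and return (exit_code, timed_out).

    A timeout of 0 disables the deadline (hypothetical helper, not Airflow API).
    """
    proc = subprocess.Popen(cmd)
    # time.monotonic() is not affected by wall-clock adjustments.
    deadline = time.monotonic() + timeout if timeout > 0 else None
    while proc.poll() is None:
        if deadline is not None and time.monotonic() >= deadline:
            proc.terminate()  # sends SIGTERM on POSIX
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalates to SIGKILL on POSIX
                proc.wait()
            return proc.returncode, True
        time.sleep(poll_interval)
    return proc.returncode, False
```

For example, `run_with_timeout([sys.executable, "-c", "import time; time.sleep(60)"], timeout=1)` should return after roughly one second with `timed_out` set to `True`, while the same call with `timeout=0` would wait for the full 60 seconds.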
Summary
Adds a [deadlines] callback_execution_timeout configuration option that sets a maximum execution time for deadline callbacks. If a callback exceeds this timeout, the supervisor kills it with SIGTERM (escalating to SIGKILL).
Default: 300 seconds (5 minutes). Set to 0 to disable.
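As a rough illustration of these semantics, the sketch below parses an INI-style fragment with the standard-library `configparser` and maps the value to an effective timeout. The helper `effective_timeout` and the example value of 60 are hypothetical, Airflow's own config loader is not used here, and rejecting negative values is an assumption, since the PR does not say how they are handled.

```python
import configparser

DEFAULT_TIMEOUT = 300  # seconds, the default proposed in this PR

# Example fragment; 60 is an arbitrary illustrative value.
EXAMPLE_CFG = """
[deadlines]
callback_execution_timeout = 60
"""


def effective_timeout(cfg_text):
    """Return the timeout in seconds, or None when it is disabled (value 0)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    value = parser.getint(
        "deadlines", "callback_execution_timeout", fallback=DEFAULT_TIMEOUT
    )
    if value < 0:
        # Assumption: the PR does not specify behavior for negative values.
        raise ValueError("callback_execution_timeout must be >= 0")
    return value or None  # 0 means "no timeout"
```

With this helper, a missing section or option falls back to 300, and an explicit 0 yields `None`, which a caller can treat as "wait indefinitely".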
Discussion — seeking community input
This PR is intentionally opened as a draft to invite discussion on the design:
- Single global config vs per-callback? Currently a single [deadlines] callback_execution_timeout applies to all callbacks. Should users be able to override per-callback (e.g., via a kwarg on the deadline definition)?
- Default value? 300s (5 min) seems reasonable for alert callbacks (send a Slack message, trigger a PagerDuty alert). Too short for heavy callbacks? Too long for simple ones?
- Sync + async? Should this apply to both executor (sync) and triggerer (async) callback paths, or just one?
- Config section? Using [deadlines] since this is specific to deadline callbacks. Alternative: [callbacks] if we want it to apply to all future callback types (dag/task callbacks when migrated).
Changes
- config.yml: New [deadlines] section with callback_execution_timeout (integer, default 300, version_added 3.3.0)
- callback_supervisor.py: Timeout enforcement in _monitor_subprocess() using time.monotonic(), kills via existing kill() method
Related
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.6) following the guidelines