UN-3657 [FIX] Execution serializer — terminal-failure run reports unaccounted files as failed, not in-progress#2126
UN-3657 [FIX] Execution serializer — terminal-failure run reports unaccounted files as failed, not in-progress#2126muhammad-ali-e merged 2 commits intofeat/UN-3445-pg-queue-integrationZipstack/unstract:feat/UN-3445-pg-queue-integrationfrom UN-3657-execution-terminal-file-countsZipstack/unstract:UN-3657-execution-terminal-file-countsCopy head branch name to clipboard
Conversation
…ccounted files as failed, not in-progress A terminal ERROR/STOPPED execution could show files as still 'in progress' in the UI (observed on exec 4b180f05: ERROR, total_files=2, shown as '2 in progress'). ExecutionSerializer derives successful/failed by COUNTING workflow_file_execution rows; when orchestration fails before those rows are created (0 rows) both read 0, and the UI derives in-progress as total - successful - failed = total -> phantom in-progress for a finished run. get_failed_files now returns total - successful for a terminal FAILURE run (ExecutionStatus.is_failure -> ERROR/STOPPED): every file that didn't succeed in a finished failure run is a failure, including ones that never got a row. Scoped to is_failure, so COMPLETED runs and live (PENDING/EXECUTING) runs keep the exact row-count behaviour -> no-op when rows already account for every file, success path + real-time progress untouched on every transport. max(0, ...) guards a stale successful>total count. OSS read-path only; transport-agnostic (also fixes the latent Celery case). Fix #1 (UN-3654) already PREVENTS the failure that produced this display; this is defense-in-depth for any early-orchestration failure. Tests: serializer unit tests (ORM mocked, no DB) — error/stopped reconcile, completed + executing unchanged, negative-clamp guard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
| Filename | Overview |
|---|---|
| backend/workflow_manager/execution/serializer/execution.py | Core fix: adds per-(id,status) memoized count helper and reconciles failed_files for terminal failure runs; logic is correct and edge cases (None total_files, drift, unknown status) are handled. |
| backend/workflow_manager/execution/tests/test_execution_serializer.py | New test file covering all six reconciliation branches (zero rows, partial rows, STOPPED, COMPLETED, EXECUTING, drift guard) with ORM-free mocks, mirroring existing repo test patterns. |
| backend/workflow_manager/execution/tests/init.py | Empty init file to make the new tests directory a Python package. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["get_failed_files(obj)"] --> B["error_rows = _status_count(obj, ERROR)"]
B --> C{"ExecutionStatus.is_failure(obj.status)?"}
C -- "No (COMPLETED / EXECUTING / PENDING / unknown)" --> D["return error_rows\n(unchanged behaviour)"]
C -- "Yes (ERROR / STOPPED)" --> E["successful = _status_count(obj, COMPLETED)\ntotal = obj.total_files or 0"]
E --> F{"successful + error_rows > total?"}
F -- "Yes (drift / impossible count)" --> G["logger.warning(...)\nreturn error_rows"]
F -- "No" --> H["return total - successful\n(absorbs unaccounted/PENDING/EXECUTING rows)"]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A["get_failed_files(obj)"] --> B["error_rows = _status_count(obj, ERROR)"]
B --> C{"ExecutionStatus.is_failure(obj.status)?"}
C -- "No (COMPLETED / EXECUTING / PENDING / unknown)" --> D["return error_rows\n(unchanged behaviour)"]
C -- "Yes (ERROR / STOPPED)" --> E["successful = _status_count(obj, COMPLETED)\ntotal = obj.total_files or 0"]
E --> F{"successful + error_rows > total?"}
F -- "Yes (drift / impossible count)" --> G["logger.warning(...)\nreturn error_rows"]
F -- "No" --> H["return total - successful\n(absorbs unaccounted/PENDING/EXECUTING rows)"]
Reviews (2): Last reviewed commit: "UN-3657 [FIX] address PR #2126 review — ..." | Re-trigger Greptile
muhammad-ali-e
left a comment
There was a problem hiding this comment.
Automated PR review (PR Review Toolkit)
Multi-agent review of the get_failed_files reconciliation change. The production fix is logically correct and well-scoped for the happy path it targets — finished ERROR/STOPPED runs no longer show phantom in-progress files. Findings below are about robustness of the new derivation, observability of the defensive guards, the test harness, and docstring precision. Inline comments follow; a prioritized summary is in the PR thread.
Note: the existing greptile P2 on line 50 (double COUNT query) is a separate perf point and is not repeated here.
| whenever the rows already account for every file, so the success path | ||
| and real-time progress are untouched on every transport. | ||
| """ | ||
| if ExecutionStatus.is_failure(obj.status): |
There was a problem hiding this comment.
P1 — failure branch trusts total_files and discards the actual ERROR-row count, so a stale/low total_files silently under-reports failures (the inverse of the bug being fixed).
The new branch returns total_files - successful and never looks at real ERROR rows. If total_files lags the true row count (counter drift — exactly what UN-3652/UN-3657 are about — or a partial-write race during abort), genuine failures vanish and reappear as phantom in-progress via the UI's total - successful - failed.
Example: total_files=2 but 5 rows exist (3 COMPLETED, 2 ERROR) → max(0, 2-3)=0 failed; the old code returned 2. Two real failures disappear, silently.
Secondary: ExecutionStatus.is_failure swallows ValueError and returns False for any unrecognized status (data_models.py:576-581), so a malformed/legacy status string silently takes the else row-count path — reintroducing phantom-in-progress for exactly the corrupt data most worth flagging.
Suggested hardening — take the larger of the derived value and the real ERROR count, and don't let the divergence go unrecorded:
error_rows = obj.file_executions.filter(status=ExecutionStatus.ERROR.value).count()
derived = total - self.get_successful_files(obj)
if derived != error_rows:
logger.warning("exec %s failed-file reconcile mismatch: total=%s successful=%s error_rows=%s", obj.id, total, ..., error_rows)
return max(0, derived, error_rows)(There is no logger in this module today — its siblings views/execution.py and workflow_v2/serializers.py both define one.)
Broader fix worth considering: the partition invariant successful + failed + in_progress == total is currently split across this serializer (two terms) and the frontend (DetailedLogs.jsx derives the third). Emitting an explicit in_progress_files = SerializerMethodField (= max(0, total - successful - failed)) and having the UI render it would let the server own all three buckets, making this whole class of phantom-count bug impossible regardless of how failed is computed.
| """ | ||
| if ExecutionStatus.is_failure(obj.status): | ||
| total = obj.total_files or 0 | ||
| return max(0, total - self.get_successful_files(obj)) |
There was a problem hiding this comment.
P2 — max(0, …) silently clamps the impossible successful > total state with no signal. successful > total means more files completed than the run claims to have had — an impossible count that indicates real counter corruption (double-counted completions, total_files written too low at dispatch, concurrent updates). The clamp returns 0 and erases the most diagnostic inconsistency there is. test_successful_over_total_is_clamped_not_negative even enshrines this silent 0 as correct.
Keep the clamp as the safe return value, but logger.warning(...) the anomaly first (execution id + both counts) so the drift is debuggable months later instead of invisible. This is consistent with the project's error-surfacing conventions.
| with 0 rows reads as "N in progress" instead of "N failed"). | ||
|
|
||
| Scoped to ``is_failure`` (ERROR/STOPPED) so COMPLETED runs and live | ||
| runs (PENDING/EXECUTING) keep the exact row-count behaviour — a no-op |
There was a problem hiding this comment.
P3 — docstring precision / verbosity.
- "a no-op whenever the rows already account for every file" is imprecise: the new branch (
total - COMPLETED) equals the old (ERROR count) only whencompleted + errored == total. A file with a row that is still PENDING/EXECUTING does account for the file in the plain reading, yet the new code now counts it as failed — so behavior does change there. Suggest: "a no-op whenever every file already has a terminal COMPLETED/ERROR row." - "...untouched on every transport" — this method reads only
obj.status/total_files/row counts and has no transport (Celery vs PG) dependency; mentioning transport implies a conditional that doesn't exist. Drop it. - 14 lines of prose for a 3-line method, with the rationale repeated again in the test-module docstring. Consider trimming to ~6 lines, keeping the load-bearing "why" (UI derives in-progress as
total - successful - failed). Ideally anchor that claim to the specific frontend component so it doesn't rot silently.
| import django | ||
| from django.apps import apps | ||
|
|
||
| os.environ.setdefault("DJANGO_SETTINGS_MODULE", "backend.settings.test") |
There was a problem hiding this comment.
P2 — os.environ.setdefault("DJANGO_SETTINGS_MODULE", …) is a no-op when the var is already exported, which is extremely common in dev shells (backend.settings.dev, or an enterprise backend.settings.cloud). A developer with DJANGO_SETTINGS_MODULE already set gets their module instead of backend.settings.test, breaking collection (reproduced: ModuleNotFoundError: No module named 'backend.settings.cloud' on a checkout with that env). Force the assignment: os.environ["DJANGO_SETTINGS_MODULE"] = "backend.settings.test".
|
|
||
| os.environ.setdefault("DJANGO_SETTINGS_MODULE", "backend.settings.test") | ||
| if not apps.ready: | ||
| django.setup() |
There was a problem hiding this comment.
P2 — django.setup() contradicts the module docstring's "no test database is required" (line 10) and diverges from this repo's DB-free test convention. django.setup() imports the full backend.settings.base stack and runs every app's AppConfig.ready() — which (a) requires a complete prod-like env (from a clean checkout with only sample.test.env it aborts on missing CELERY_BROKER_BASE_URL/INDEXING_FLAG_TTL), and (b) actually opens a live DB connection during startup (execution_log_utils.ready()). So the bootstrap is neither env-free nor DB-free as claimed, even though the serializer methods under test are pure.
Sibling DB-free tests avoid this by stubbing the models in sys.modules before import (usage_v2/tests/test_helper.py, pg_queue/tests/test_producer.py). Recommend following that pattern — stub workflow_manager.workflow_v2.models (+ enums), drop django.setup(), and the docstring becomes true. (Relevant since there is no backend pytest job in CI, so the clean-environment fragility is the primary failure mode.)
| obj = _execution(ExecutionStatus.STOPPED.value, 5, completed=3) | ||
| assert self.serializer.get_failed_files(obj) == 2 | ||
|
|
||
| def test_completed_run_uses_row_count_unchanged(self): |
There was a problem hiding this comment.
P3 — test gaps that would lock the intended contract:
- COMPLETED-not-reconciled asymmetry is untested. This case only covers
completed + errored == total. The deliberate decision not to reconcile COMPLETED is invisible: add_execution(COMPLETED, total=3, completed=1, errored=1)assertingfailed == 1(the ERROR-row count), nottotal - completed == 2, so a future "fix" that reconciles COMPLETED fails here. - Assert the real invariant, not just
failed.test_stopped_run/test_error_run_partialassert onlyget_failed_files; they don't verify the actual contract (total - successful - failed == 0, i.e. zero phantom in-progress). Add that derived assertion so the test fails if either count regresses. - Unknown/garbage status (
is_failure's swallowedValueError→elsepath) is the untested branch most worth pinning:_execution("GARBAGE", 5, completed=1, errored=1)should return1and must never raise (a raise would 500 the executions list endpoint).
Minor: the 4b180f05 exec id in the line-55 comment is an opaque one-off production UUID — the rest of the comment already describes the scenario; consider dropping the id.
…nal anomalies, DB-free test - P1: get_failed_files no longer trusts total_files blindly. If terminal rows exceed total_files (counter drift / impossible count), fall back to the real ERROR-row count instead of total-successful, so genuine failures can't silently vanish (the inverse of the phantom-in-progress bug). Otherwise unaccounted files (no terminal row) still count as failed. - P2: memoise per-(execution,status) COUNTs so successful/failed don't double-query the same status (also tightens the pre-existing N+1 for these fields). The successful>total clamp now logs a WARNING instead of erasing the anomaly silently. - note that is_failure swallows ValueError -> unknown status takes the row-count path (never raises -> can't 500 the list endpoint). - docstring: fix the 'no-op' claim (only when every file has a terminal row), drop the transport mention + the opaque exec UUID, anchor in-progress to DetailedLogs.jsx. - test harness: force DJANGO_SETTINGS_MODULE (setdefault is a no-op when a dev shell already exports it); stub the model in sys.modules and drop django.setup() so the test is genuinely DB-free (0.26s, no AppConfig.ready / DB connection), mirroring usage_v2/pg_queue sibling tests. - tests added: drift-no-under-report, successful>total clamp+warn, COMPLETED not-reconciled asymmetry, unknown-status-no-raise, and the real total-successful-failed==0 invariant on failure runs. Deferred (noted on PR): an explicit in_progress_files SerializerMethodField so the server owns all three buckets — bigger change (needs the frontend to render it), orthogonal to this fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks — this was a strong review; addressed in the latest commit. P1 (under-reporting on drift) — fixed. P2 (double COUNT) — fixed. Added a per-(execution, status) memo so P2 (silent clamp) — fixed. is_failure swallows ValueError — documented inline: an unrecognised status takes the plain row-count path and never raises (so it can't 500 the list endpoint). Pinned by a garbage-status test. Test harness (both P2s) — fixed. Forced P3 docstring — fixed: corrected the 'no-op' claim (only when every file has a terminal COMPLETED/ERROR row), dropped the transport mention and the opaque exec UUID, anchored in-progress to P3 test gaps — added: drift-no-under-report, successful>total clamp+warn, the COMPLETED-not-reconciled asymmetry, unknown-status-never-raises, and the real Deferred — explicit All 9 tests green; pre-commit clean. |
68d137a
into
feat/UN-3445-pg-queue-integration
|
What
A terminal ERROR/STOPPED execution no longer shows files as still "in progress" in the UI.
Why
Observed on PG exec
4b180f05(ERROR,total_files=2) which displayed "2 in progress" for a finished run.ExecutionSerializerderivessuccessful_files/failed_filesasSerializerMethodFields that countworkflow_file_executionrows (status=COMPLETED/ERROR). When orchestration fails before those rows are created (0 rows), both read 0, and the UI derives in-progress astotal - successful - failed = total→ phantom in-progress for a finished run. The persistedworkflow_execution.failed_filescolumn (set by UN-3652) is ignored by this serializer.This affects both transports — a Celery orchestration that fails pre-row-creation shows the same artifact. It is not a PG regression; it's a latent display bug.
How
get_failed_filesnow returnstotal_files - successful_filesfor a terminal failure run (ExecutionStatus.is_failure→ ERROR/STOPPED): every file that didn't succeed in a finished failure run is a failure, including ones that never got a row or were left PENDING/EXECUTING by the abort.Scoped to
is_failure, so:max(0, …)guards a stalesuccessful > totalcount. OSS read-path only; transport-agnostic.Tests
test_execution_serializer.py(6, ORM mocked — no test DB, mirroring the repo's existing mocked-ORM serializer tests): error/stopped reconcile (incl. the4b180f050-row case + partial rows), COMPLETED + EXECUTING unchanged, negative-clamp guard.Validation
Deterministic read-path logic, fully unit-covered across every branch. E2E validates on deploy: a failed execution will read "N failed, 0 in progress" instead of phantom in-progress.