UN-3657 [FIX] Execution serializer — terminal-failure run reports unaccounted files as failed, not in-progress by muhammad-ali-e · Pull Request #2126 · Zipstack/unstract

muhammad-ali-e · Jun 30, 2026

What

A terminal ERROR/STOPPED execution no longer shows files as still "in progress" in the UI.

Why

Observed on PG exec 4b180f05 (ERROR, total_files=2) which displayed "2 in progress" for a finished run. ExecutionSerializer derives successful_files/failed_files as SerializerMethodFields that count workflow_file_execution rows (status=COMPLETED/ERROR). When orchestration fails before those rows are created (0 rows), both read 0, and the UI derives in-progress as total - successful - failed = total → phantom in-progress for a finished run. The persisted workflow_execution.failed_files column (set by UN-3652) is ignored by this serializer.

This affects both transports — a Celery orchestration that fails pre-row-creation shows the same artifact. It is not a PG regression; it's a latent display bug.

How

get_failed_files now returns total_files - successful_files for a terminal failure run (ExecutionStatus.is_failure → ERROR/STOPPED): every file that didn't succeed in a finished failure run is a failure, including ones that never got a row or were left PENDING/EXECUTING by the abort.

Scoped to is_failure, so:

COMPLETED runs → unchanged (row count) — success path untouched.
Live runs (PENDING/EXECUTING) → unchanged — genuinely-pending files are not prematurely marked failed.
No-op whenever the rows already account for every file.

max(0, …) guards a stale successful > total count. OSS read-path only; transport-agnostic.

Note: Fix UN-3654 already prevents the specific failure that produced this display; this is defense-in-depth for any early-orchestration failure (bad source, permissions, validation).

Tests

test_execution_serializer.py (6, ORM mocked — no test DB, mirroring the repo's existing mocked-ORM serializer tests): error/stopped reconcile (incl. the 4b180f05 0-row case + partial rows), COMPLETED + EXECUTING unchanged, negative-clamp guard.

Validation

Deterministic read-path logic, fully unit-covered across every branch. E2E validates on deploy: a failed execution will read "N failed, 0 in progress" instead of phantom in-progress.

…ccounted files as failed, not in-progress A terminal ERROR/STOPPED execution could show files as still 'in progress' in the UI (observed on exec 4b180f05: ERROR, total_files=2, shown as '2 in progress'). ExecutionSerializer derives successful/failed by COUNTING workflow_file_execution rows; when orchestration fails before those rows are created (0 rows) both read 0, and the UI derives in-progress as total - successful - failed = total -> phantom in-progress for a finished run. get_failed_files now returns total - successful for a terminal FAILURE run (ExecutionStatus.is_failure -> ERROR/STOPPED): every file that didn't succeed in a finished failure run is a failure, including ones that never got a row. Scoped to is_failure, so COMPLETED runs and live (PENDING/EXECUTING) runs keep the exact row-count behaviour -> no-op when rows already account for every file, success path + real-time progress untouched on every transport. max(0, ...) guards a stale successful>total count. OSS read-path only; transport-agnostic (also fixes the latent Celery case). Fix #1 (UN-3654) already PREVENTS the failure that produced this display; this is defense-in-depth for any early-orchestration failure. Tests: serializer unit tests (ORM mocked, no DB) — error/stopped reconcile, completed + executing unchanged, negative-clamp guard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · Jun 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 62abb84e-00f4-4395-888b-3cf00d82d67b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch UN-3657-execution-terminal-file-counts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

greptile-apps · Jun 30, 2026

Greptile Summary

This PR fixes a display bug where terminal ERROR/STOPPED executions showed phantom "in-progress" files in the UI. The root cause was that ExecutionSerializer.get_failed_files counted only workflow_file_execution rows with status=ERROR, leaving files that never got rows (orchestration failed before row creation) unaccounted — the UI derived those as in-progress via total - successful - failed.

get_failed_files reconciliation: For terminal failure runs, the method now returns total_files - successful_files so every non-successful file in a finished failure run lands in failed. A drift guard fires a WARNING log and falls back to raw ERROR-row counts when successful + error_rows > total_files to prevent under-reporting.
Memoization of DB count queries: A per-(execution_id, status) in-memory cache (_status_count) is added to the serializer instance to avoid the double COUNT query that the previous double-call pattern produced.
Comprehensive unit tests: Six cases cover the zero-row scenario, partial rows, STOPPED runs, COMPLETED/EXECUTING unchanged paths, and the drift-guard warning — all without a live DB.

Confidence Score: 5/5

Safe to merge — the change is a pure read-path fix on the serializer with no writes, no schema changes, and no side effects on non-failure executions.

The reconciliation logic is deterministic and transport-agnostic. All branches (zero rows, partial rows, STOPPED, COMPLETED unchanged, EXECUTING unchanged, drift guard) are covered by the new unit tests. The memoization cache is keyed correctly by (execution_id, status) so list-serialization of multiple executions is safe. No existing tests are broken.

No files require special attention.

Important Files Changed

Filename	Overview
backend/workflow_manager/execution/serializer/execution.py	Core fix: adds per-(id,status) memoized count helper and reconciles failed_files for terminal failure runs; logic is correct and edge cases (None total_files, drift, unknown status) are handled.
backend/workflow_manager/execution/tests/test_execution_serializer.py	New test file covering all six reconciliation branches (zero rows, partial rows, STOPPED, COMPLETED, EXECUTING, drift guard) with ORM-free mocks, mirroring existing repo test patterns.
backend/workflow_manager/execution/tests/init.py	Empty init file to make the new tests directory a Python package.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["get_failed_files(obj)"] --> B["error_rows = _status_count(obj, ERROR)"]
    B --> C{"ExecutionStatus.is_failure(obj.status)?"}
    C -- "No (COMPLETED / EXECUTING / PENDING / unknown)" --> D["return error_rows\n(unchanged behaviour)"]
    C -- "Yes (ERROR / STOPPED)" --> E["successful = _status_count(obj, COMPLETED)\ntotal = obj.total_files or 0"]
    E --> F{"successful + error_rows > total?"}
    F -- "Yes (drift / impossible count)" --> G["logger.warning(...)\nreturn error_rows"]
    F -- "No" --> H["return total - successful\n(absorbs unaccounted/PENDING/EXECUTING rows)"]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["get_failed_files(obj)"] --> B["error_rows = _status_count(obj, ERROR)"]
    B --> C{"ExecutionStatus.is_failure(obj.status)?"}
    C -- "No (COMPLETED / EXECUTING / PENDING / unknown)" --> D["return error_rows\n(unchanged behaviour)"]
    C -- "Yes (ERROR / STOPPED)" --> E["successful = _status_count(obj, COMPLETED)\ntotal = obj.total_files or 0"]
    E --> F{"successful + error_rows > total?"}
    F -- "Yes (drift / impossible count)" --> G["logger.warning(...)\nreturn error_rows"]
    F -- "No" --> H["return total - successful\n(absorbs unaccounted/PENDING/EXECUTING rows)"]

_{Reviews (2): Last reviewed commit: "UN-3657 [FIX] address PR #2126 review — ..." | Re-trigger Greptile}

muhammad-ali-e

Automated PR review (PR Review Toolkit)

Multi-agent review of the get_failed_files reconciliation change. The production fix is logically correct and well-scoped for the happy path it targets — finished ERROR/STOPPED runs no longer show phantom in-progress files. Findings below are about robustness of the new derivation, observability of the defensive guards, the test harness, and docstring precision. Inline comments follow; a prioritized summary is in the PR thread.

Note: the existing greptile P2 on line 50 (double COUNT query) is a separate perf point and is not repeated here.

muhammad-ali-e · Jun 30, 2026

+        whenever the rows already account for every file, so the success path
+        and real-time progress are untouched on every transport.
+        """
+        if ExecutionStatus.is_failure(obj.status):


P1 — failure branch trusts total_files and discards the actual ERROR-row count, so a stale/low total_files silently under-reports failures (the inverse of the bug being fixed).

The new branch returns total_files - successful and never looks at real ERROR rows. If total_files lags the true row count (counter drift — exactly what UN-3652/UN-3657 are about — or a partial-write race during abort), genuine failures vanish and reappear as phantom in-progress via the UI's total - successful - failed.

Example: total_files=2 but 5 rows exist (3 COMPLETED, 2 ERROR) → max(0, 2-3)=0 failed; the old code returned 2. Two real failures disappear, silently.

Secondary: ExecutionStatus.is_failure swallows ValueError and returns False for any unrecognized status (data_models.py:576-581), so a malformed/legacy status string silently takes the else row-count path — reintroducing phantom-in-progress for exactly the corrupt data most worth flagging.

Suggested hardening — take the larger of the derived value and the real ERROR count, and don't let the divergence go unrecorded:

error_rows = obj.file_executions.filter(status=ExecutionStatus.ERROR.value).count() derived = total - self.get_successful_files(obj) if derived != error_rows: logger.warning("exec %s failed-file reconcile mismatch: total=%s successful=%s error_rows=%s", obj.id, total, ..., error_rows) return max(0, derived, error_rows)

(There is no logger in this module today — its siblings views/execution.py and workflow_v2/serializers.py both define one.)

Broader fix worth considering: the partition invariant successful + failed + in_progress == total is currently split across this serializer (two terms) and the frontend (DetailedLogs.jsx derives the third). Emitting an explicit in_progress_files = SerializerMethodField (= max(0, total - successful - failed)) and having the UI render it would let the server own all three buckets, making this whole class of phantom-count bug impossible regardless of how failed is computed.

muhammad-ali-e · Jun 30, 2026

+        """
+        if ExecutionStatus.is_failure(obj.status):
+            total = obj.total_files or 0
+            return max(0, total - self.get_successful_files(obj))


P2 — max(0, …) silently clamps the impossible successful > total state with no signal. successful > total means more files completed than the run claims to have had — an impossible count that indicates real counter corruption (double-counted completions, total_files written too low at dispatch, concurrent updates). The clamp returns 0 and erases the most diagnostic inconsistency there is. test_successful_over_total_is_clamped_not_negative even enshrines this silent 0 as correct.

Keep the clamp as the safe return value, but logger.warning(...) the anomaly first (execution id + both counts) so the drift is debuggable months later instead of invisible. This is consistent with the project's error-surfacing conventions.

muhammad-ali-e · Jun 30, 2026

+        with 0 rows reads as "N in progress" instead of "N failed").
+
+        Scoped to ``is_failure`` (ERROR/STOPPED) so COMPLETED runs and live
+        runs (PENDING/EXECUTING) keep the exact row-count behaviour — a no-op


P3 — docstring precision / verbosity.

"a no-op whenever the rows already account for every file" is imprecise: the new branch (total - COMPLETED) equals the old (ERROR count) only when completed + errored == total. A file with a row that is still PENDING/EXECUTING does account for the file in the plain reading, yet the new code now counts it as failed — so behavior does change there. Suggest: "a no-op whenever every file already has a terminal COMPLETED/ERROR row."

"...untouched on every transport" — this method reads only obj.status/total_files/row counts and has no transport (Celery vs PG) dependency; mentioning transport implies a conditional that doesn't exist. Drop it.

14 lines of prose for a 3-line method, with the rationale repeated again in the test-module docstring. Consider trimming to ~6 lines, keeping the load-bearing "why" (UI derives in-progress as total - successful - failed). Ideally anchor that claim to the specific frontend component so it doesn't rot silently.

muhammad-ali-e · Jun 30, 2026

+import django
+from django.apps import apps
+
+os.environ.setdefault("DJANGO_SETTINGS_MODULE", "backend.settings.test")


P2 — os.environ.setdefault("DJANGO_SETTINGS_MODULE", …) is a no-op when the var is already exported, which is extremely common in dev shells (backend.settings.dev, or an enterprise backend.settings.cloud). A developer with DJANGO_SETTINGS_MODULE already set gets their module instead of backend.settings.test, breaking collection (reproduced: ModuleNotFoundError: No module named 'backend.settings.cloud' on a checkout with that env). Force the assignment: os.environ["DJANGO_SETTINGS_MODULE"] = "backend.settings.test".

muhammad-ali-e · Jun 30, 2026

+
+os.environ.setdefault("DJANGO_SETTINGS_MODULE", "backend.settings.test")
+if not apps.ready:
+    django.setup()


P2 — django.setup() contradicts the module docstring's "no test database is required" (line 10) and diverges from this repo's DB-free test convention. django.setup() imports the full backend.settings.base stack and runs every app's AppConfig.ready() — which (a) requires a complete prod-like env (from a clean checkout with only sample.test.env it aborts on missing CELERY_BROKER_BASE_URL/INDEXING_FLAG_TTL), and (b) actually opens a live DB connection during startup (execution_log_utils.ready()). So the bootstrap is neither env-free nor DB-free as claimed, even though the serializer methods under test are pure.

Sibling DB-free tests avoid this by stubbing the models in sys.modules before import (usage_v2/tests/test_helper.py, pg_queue/tests/test_producer.py). Recommend following that pattern — stub workflow_manager.workflow_v2.models (+ enums), drop django.setup(), and the docstring becomes true. (Relevant since there is no backend pytest job in CI, so the clean-environment fragility is the primary failure mode.)

muhammad-ali-e · Jun 30, 2026

+        obj = _execution(ExecutionStatus.STOPPED.value, 5, completed=3)
+        assert self.serializer.get_failed_files(obj) == 2
+
+    def test_completed_run_uses_row_count_unchanged(self):


P3 — test gaps that would lock the intended contract:

COMPLETED-not-reconciled asymmetry is untested. This case only covers completed + errored == total. The deliberate decision not to reconcile COMPLETED is invisible: add _execution(COMPLETED, total=3, completed=1, errored=1) asserting failed == 1 (the ERROR-row count), not total - completed == 2, so a future "fix" that reconciles COMPLETED fails here.

Assert the real invariant, not just failed. test_stopped_run/test_error_run_partial assert only get_failed_files; they don't verify the actual contract (total - successful - failed == 0, i.e. zero phantom in-progress). Add that derived assertion so the test fails if either count regresses.

Unknown/garbage status (is_failure's swallowed ValueError → else path) is the untested branch most worth pinning: _execution("GARBAGE", 5, completed=1, errored=1) should return 1 and must never raise (a raise would 500 the executions list endpoint).

Minor: the 4b180f05 exec id in the line-55 comment is an opaque one-off production UUID — the rest of the comment already describes the scenario; consider dropping the id.

…nal anomalies, DB-free test - P1: get_failed_files no longer trusts total_files blindly. If terminal rows exceed total_files (counter drift / impossible count), fall back to the real ERROR-row count instead of total-successful, so genuine failures can't silently vanish (the inverse of the phantom-in-progress bug). Otherwise unaccounted files (no terminal row) still count as failed. - P2: memoise per-(execution,status) COUNTs so successful/failed don't double-query the same status (also tightens the pre-existing N+1 for these fields). The successful>total clamp now logs a WARNING instead of erasing the anomaly silently. - note that is_failure swallows ValueError -> unknown status takes the row-count path (never raises -> can't 500 the list endpoint). - docstring: fix the 'no-op' claim (only when every file has a terminal row), drop the transport mention + the opaque exec UUID, anchor in-progress to DetailedLogs.jsx. - test harness: force DJANGO_SETTINGS_MODULE (setdefault is a no-op when a dev shell already exports it); stub the model in sys.modules and drop django.setup() so the test is genuinely DB-free (0.26s, no AppConfig.ready / DB connection), mirroring usage_v2/pg_queue sibling tests. - tests added: drift-no-under-report, successful>total clamp+warn, COMPLETED not-reconciled asymmetry, unknown-status-no-raise, and the real total-successful-failed==0 invariant on failure runs. Deferred (noted on PR): an explicit in_progress_files SerializerMethodField so the server owns all three buckets — bigger change (needs the frontend to render it), orthogonal to this fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

muhammad-ali-e · Jun 30, 2026

Thanks — this was a strong review; addressed in the latest commit.

P1 (under-reporting on drift) — fixed. get_failed_files no longer blindly trusts total_files. It now also reads the real ERROR-row count and, when the terminal rows already exceed total_files (counter drift / impossible count), returns the ERROR rows instead of total - successful — so genuine failures can't silently vanish. Your example (total=2, 3 COMPLETED + 2 ERROR) now correctly returns 2 (and logs the drift). The common pre-row-creation case still derives total - successful.

P2 (double COUNT) — fixed. Added a per-(execution, status) memo so successful_files/failed_files don't issue duplicate COUNTs for the same status — which also tightens the pre-existing N+1 the TODO flags (failure reads now do 2 counts, not 3).

P2 (silent clamp) — fixed. successful > total (and any terminal-rows-exceed-total) now logs a WARNING with the execution id + both counts before returning the safe value, instead of erasing the most diagnostic inconsistency silently. (Added a module logger.)

is_failure swallows ValueError — documented inline: an unrecognised status takes the plain row-count path and never raises (so it can't 500 the list endpoint). Pinned by a garbage-status test.

Test harness (both P2s) — fixed. Forced DJANGO_SETTINGS_MODULE (the setdefault was a no-op when a dev shell already exports .dev/.cloud), and switched to the sibling sys.modules model-stub pattern — dropped django.setup() entirely. The test is now genuinely DB-free / app-registry-free (runs in 0.26s, no AppConfig.ready() DB connection), so the docstring is true.

P3 docstring — fixed: corrected the 'no-op' claim (only when every file has a terminal COMPLETED/ERROR row), dropped the transport mention and the opaque exec UUID, anchored in-progress to DetailedLogs.jsx.

P3 test gaps — added: drift-no-under-report, successful>total clamp+warn, the COMPLETED-not-reconciled asymmetry, unknown-status-never-raises, and the real total - successful - failed == 0 invariant on failure runs (not just failed).

Deferred — explicit in_progress_files server field. Agree it's the durable fix (server owns all three buckets, makes the whole class impossible). Left out of this PR because it needs a matching frontend change in DetailedLogs.jsx; happy to do it as a focused follow-up.

All 9 tests green; pre-commit clean.

sonarqubecloud · Jun 30, 2026

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

greptile-apps Bot reviewed Jun 30, 2026

View reviewed changes

Comment thread backend/workflow_manager/execution/serializer/execution.py Outdated

muhammad-ali-e commented Jun 30, 2026

View reviewed changes

muhammad-ali-e merged commit 68d137a into feat/UN-3445-pg-queue-integration Jun 30, 2026
6 checks passed

muhammad-ali-e deleted the UN-3657-execution-terminal-file-counts branch June 30, 2026 05:50

Search code, repositories, users, issues, pull requests...

Uh oh!

Conversation

muhammad-ali-e commented Jun 30, 2026

What

Why

How

Tests

Validation

Uh oh!

coderabbitai Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

greptile-apps Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

muhammad-ali-e left a comment

Choose a reason for hiding this comment

Automated PR review (PR Review Toolkit)

Uh oh!

muhammad-ali-e Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

muhammad-ali-e Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

muhammad-ali-e Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

muhammad-ali-e Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

muhammad-ali-e Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

muhammad-ali-e Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

muhammad-ali-e commented Jun 30, 2026

Uh oh!

Uh oh!

sonarqubecloud Bot commented Jun 30, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 30, 2026 •

edited

Loading

greptile-apps Bot commented Jun 30, 2026 •

edited

Loading