Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

UN-3657 [FIX] Execution serializer — terminal-failure run reports unaccounted files as failed, not in-progress#2126

Merged
muhammad-ali-e merged 2 commits into
feat/UN-3445-pg-queue-integrationZipstack/unstract:feat/UN-3445-pg-queue-integrationfrom
UN-3657-execution-terminal-file-countsZipstack/unstract:UN-3657-execution-terminal-file-countsCopy head branch name to clipboard
Jun 30, 2026
Merged

UN-3657 [FIX] Execution serializer — terminal-failure run reports unaccounted files as failed, not in-progress#2126
muhammad-ali-e merged 2 commits into
feat/UN-3445-pg-queue-integrationZipstack/unstract:feat/UN-3445-pg-queue-integrationfrom
UN-3657-execution-terminal-file-countsZipstack/unstract:UN-3657-execution-terminal-file-countsCopy head branch name to clipboard

Conversation

@muhammad-ali-e

Copy link
Copy Markdown
Contributor

What

A terminal ERROR/STOPPED execution no longer shows files as still "in progress" in the UI.

Why

Observed on PG exec 4b180f05 (ERROR, total_files=2) which displayed "2 in progress" for a finished run. ExecutionSerializer derives successful_files/failed_files as SerializerMethodFields that count workflow_file_execution rows (status=COMPLETED/ERROR). When orchestration fails before those rows are created (0 rows), both read 0, and the UI derives in-progress as total - successful - failed = total → phantom in-progress for a finished run. The persisted workflow_execution.failed_files column (set by UN-3652) is ignored by this serializer.

This affects both transports — a Celery orchestration that fails pre-row-creation shows the same artifact. It is not a PG regression; it's a latent display bug.

How

get_failed_files now returns total_files - successful_files for a terminal failure run (ExecutionStatus.is_failure → ERROR/STOPPED): every file that didn't succeed in a finished failure run is a failure, including ones that never got a row or were left PENDING/EXECUTING by the abort.

Scoped to is_failure, so:

  • COMPLETED runs → unchanged (row count) — success path untouched.
  • Live runs (PENDING/EXECUTING) → unchanged — genuinely-pending files are not prematurely marked failed.
  • No-op whenever the rows already account for every file.

max(0, …) guards a stale successful > total count. OSS read-path only; transport-agnostic.

Note: Fix UN-3654 already prevents the specific failure that produced this display; this is defense-in-depth for any early-orchestration failure (bad source, permissions, validation).

Tests

test_execution_serializer.py (6, ORM mocked — no test DB, mirroring the repo's existing mocked-ORM serializer tests): error/stopped reconcile (incl. the 4b180f05 0-row case + partial rows), COMPLETED + EXECUTING unchanged, negative-clamp guard.

Validation

Deterministic read-path logic, fully unit-covered across every branch. E2E validates on deploy: a failed execution will read "N failed, 0 in progress" instead of phantom in-progress.

…ccounted files as failed, not in-progress

A terminal ERROR/STOPPED execution could show files as still 'in progress'
in the UI (observed on exec 4b180f05: ERROR, total_files=2, shown as '2 in
progress'). ExecutionSerializer derives successful/failed by COUNTING
workflow_file_execution rows; when orchestration fails before those rows are
created (0 rows) both read 0, and the UI derives in-progress as
total - successful - failed = total -> phantom in-progress for a finished run.

get_failed_files now returns total - successful for a terminal FAILURE run
(ExecutionStatus.is_failure -> ERROR/STOPPED): every file that didn't succeed
in a finished failure run is a failure, including ones that never got a row.
Scoped to is_failure, so COMPLETED runs and live (PENDING/EXECUTING) runs keep
the exact row-count behaviour -> no-op when rows already account for every
file, success path + real-time progress untouched on every transport.
max(0, ...) guards a stale successful>total count.

OSS read-path only; transport-agnostic (also fixes the latent Celery case).
Fix #1 (UN-3654) already PREVENTS the failure that produced this display;
this is defense-in-depth for any early-orchestration failure.

Tests: serializer unit tests (ORM mocked, no DB) — error/stopped reconcile,
completed + executing unchanged, negative-clamp guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 62abb84e-00f4-4395-888b-3cf00d82d67b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch UN-3657-execution-terminal-file-counts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@greptile-apps

greptile-apps Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a display bug where terminal ERROR/STOPPED executions showed phantom "in-progress" files in the UI. The root cause was that ExecutionSerializer.get_failed_files counted only workflow_file_execution rows with status=ERROR, leaving files that never got rows (orchestration failed before row creation) unaccounted — the UI derived those as in-progress via total - successful - failed.

  • get_failed_files reconciliation: For terminal failure runs, the method now returns total_files - successful_files so every non-successful file in a finished failure run lands in failed. A drift guard fires a WARNING log and falls back to raw ERROR-row counts when successful + error_rows > total_files to prevent under-reporting.
  • Memoization of DB count queries: A per-(execution_id, status) in-memory cache (_status_count) is added to the serializer instance to avoid the double COUNT query that the previous double-call pattern produced.
  • Comprehensive unit tests: Six cases cover the zero-row scenario, partial rows, STOPPED runs, COMPLETED/EXECUTING unchanged paths, and the drift-guard warning — all without a live DB.

Confidence Score: 5/5

Safe to merge — the change is a pure read-path fix on the serializer with no writes, no schema changes, and no side effects on non-failure executions.

The reconciliation logic is deterministic and transport-agnostic. All branches (zero rows, partial rows, STOPPED, COMPLETED unchanged, EXECUTING unchanged, drift guard) are covered by the new unit tests. The memoization cache is keyed correctly by (execution_id, status) so list-serialization of multiple executions is safe. No existing tests are broken.

No files require special attention.

Important Files Changed

Filename Overview
backend/workflow_manager/execution/serializer/execution.py Core fix: adds per-(id,status) memoized count helper and reconciles failed_files for terminal failure runs; logic is correct and edge cases (None total_files, drift, unknown status) are handled.
backend/workflow_manager/execution/tests/test_execution_serializer.py New test file covering all six reconciliation branches (zero rows, partial rows, STOPPED, COMPLETED, EXECUTING, drift guard) with ORM-free mocks, mirroring existing repo test patterns.
backend/workflow_manager/execution/tests/init.py Empty init file to make the new tests directory a Python package.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["get_failed_files(obj)"] --> B["error_rows = _status_count(obj, ERROR)"]
    B --> C{"ExecutionStatus.is_failure(obj.status)?"}
    C -- "No (COMPLETED / EXECUTING / PENDING / unknown)" --> D["return error_rows\n(unchanged behaviour)"]
    C -- "Yes (ERROR / STOPPED)" --> E["successful = _status_count(obj, COMPLETED)\ntotal = obj.total_files or 0"]
    E --> F{"successful + error_rows > total?"}
    F -- "Yes (drift / impossible count)" --> G["logger.warning(...)\nreturn error_rows"]
    F -- "No" --> H["return total - successful\n(absorbs unaccounted/PENDING/EXECUTING rows)"]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["get_failed_files(obj)"] --> B["error_rows = _status_count(obj, ERROR)"]
    B --> C{"ExecutionStatus.is_failure(obj.status)?"}
    C -- "No (COMPLETED / EXECUTING / PENDING / unknown)" --> D["return error_rows\n(unchanged behaviour)"]
    C -- "Yes (ERROR / STOPPED)" --> E["successful = _status_count(obj, COMPLETED)\ntotal = obj.total_files or 0"]
    E --> F{"successful + error_rows > total?"}
    F -- "Yes (drift / impossible count)" --> G["logger.warning(...)\nreturn error_rows"]
    F -- "No" --> H["return total - successful\n(absorbs unaccounted/PENDING/EXECUTING rows)"]
Loading

Reviews (2): Last reviewed commit: "UN-3657 [FIX] address PR #2126 review — ..." | Re-trigger Greptile

Comment thread backend/workflow_manager/execution/serializer/execution.py Outdated

@muhammad-ali-e muhammad-ali-e left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review (PR Review Toolkit)

Multi-agent review of the get_failed_files reconciliation change. The production fix is logically correct and well-scoped for the happy path it targets — finished ERROR/STOPPED runs no longer show phantom in-progress files. Findings below are about robustness of the new derivation, observability of the defensive guards, the test harness, and docstring precision. Inline comments follow; a prioritized summary is in the PR thread.

Note: the existing greptile P2 on line 50 (double COUNT query) is a separate perf point and is not repeated here.

whenever the rows already account for every file, so the success path
and real-time progress are untouched on every transport.
"""
if ExecutionStatus.is_failure(obj.status):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 — failure branch trusts total_files and discards the actual ERROR-row count, so a stale/low total_files silently under-reports failures (the inverse of the bug being fixed).

The new branch returns total_files - successful and never looks at real ERROR rows. If total_files lags the true row count (counter drift — exactly what UN-3652/UN-3657 are about — or a partial-write race during abort), genuine failures vanish and reappear as phantom in-progress via the UI's total - successful - failed.

Example: total_files=2 but 5 rows exist (3 COMPLETED, 2 ERROR) → max(0, 2-3)=0 failed; the old code returned 2. Two real failures disappear, silently.

Secondary: ExecutionStatus.is_failure swallows ValueError and returns False for any unrecognized status (data_models.py:576-581), so a malformed/legacy status string silently takes the else row-count path — reintroducing phantom-in-progress for exactly the corrupt data most worth flagging.

Suggested hardening — take the larger of the derived value and the real ERROR count, and don't let the divergence go unrecorded:

error_rows = obj.file_executions.filter(status=ExecutionStatus.ERROR.value).count()
derived = total - self.get_successful_files(obj)
if derived != error_rows:
    logger.warning("exec %s failed-file reconcile mismatch: total=%s successful=%s error_rows=%s", obj.id, total, ..., error_rows)
return max(0, derived, error_rows)

(There is no logger in this module today — its siblings views/execution.py and workflow_v2/serializers.py both define one.)

Broader fix worth considering: the partition invariant successful + failed + in_progress == total is currently split across this serializer (two terms) and the frontend (DetailedLogs.jsx derives the third). Emitting an explicit in_progress_files = SerializerMethodField (= max(0, total - successful - failed)) and having the UI render it would let the server own all three buckets, making this whole class of phantom-count bug impossible regardless of how failed is computed.

"""
if ExecutionStatus.is_failure(obj.status):
total = obj.total_files or 0
return max(0, total - self.get_successful_files(obj))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 — max(0, …) silently clamps the impossible successful > total state with no signal. successful > total means more files completed than the run claims to have had — an impossible count that indicates real counter corruption (double-counted completions, total_files written too low at dispatch, concurrent updates). The clamp returns 0 and erases the most diagnostic inconsistency there is. test_successful_over_total_is_clamped_not_negative even enshrines this silent 0 as correct.

Keep the clamp as the safe return value, but logger.warning(...) the anomaly first (execution id + both counts) so the drift is debuggable months later instead of invisible. This is consistent with the project's error-surfacing conventions.

with 0 rows reads as "N in progress" instead of "N failed").

Scoped to ``is_failure`` (ERROR/STOPPED) so COMPLETED runs and live
runs (PENDING/EXECUTING) keep the exact row-count behaviour — a no-op

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 — docstring precision / verbosity.

  1. "a no-op whenever the rows already account for every file" is imprecise: the new branch (total - COMPLETED) equals the old (ERROR count) only when completed + errored == total. A file with a row that is still PENDING/EXECUTING does account for the file in the plain reading, yet the new code now counts it as failed — so behavior does change there. Suggest: "a no-op whenever every file already has a terminal COMPLETED/ERROR row."
  2. "...untouched on every transport" — this method reads only obj.status/total_files/row counts and has no transport (Celery vs PG) dependency; mentioning transport implies a conditional that doesn't exist. Drop it.
  3. 14 lines of prose for a 3-line method, with the rationale repeated again in the test-module docstring. Consider trimming to ~6 lines, keeping the load-bearing "why" (UI derives in-progress as total - successful - failed). Ideally anchor that claim to the specific frontend component so it doesn't rot silently.

import django
from django.apps import apps

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "backend.settings.test")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 — os.environ.setdefault("DJANGO_SETTINGS_MODULE", …) is a no-op when the var is already exported, which is extremely common in dev shells (backend.settings.dev, or an enterprise backend.settings.cloud). A developer with DJANGO_SETTINGS_MODULE already set gets their module instead of backend.settings.test, breaking collection (reproduced: ModuleNotFoundError: No module named 'backend.settings.cloud' on a checkout with that env). Force the assignment: os.environ["DJANGO_SETTINGS_MODULE"] = "backend.settings.test".


os.environ.setdefault("DJANGO_SETTINGS_MODULE", "backend.settings.test")
if not apps.ready:
django.setup()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 — django.setup() contradicts the module docstring's "no test database is required" (line 10) and diverges from this repo's DB-free test convention. django.setup() imports the full backend.settings.base stack and runs every app's AppConfig.ready() — which (a) requires a complete prod-like env (from a clean checkout with only sample.test.env it aborts on missing CELERY_BROKER_BASE_URL/INDEXING_FLAG_TTL), and (b) actually opens a live DB connection during startup (execution_log_utils.ready()). So the bootstrap is neither env-free nor DB-free as claimed, even though the serializer methods under test are pure.

Sibling DB-free tests avoid this by stubbing the models in sys.modules before import (usage_v2/tests/test_helper.py, pg_queue/tests/test_producer.py). Recommend following that pattern — stub workflow_manager.workflow_v2.models (+ enums), drop django.setup(), and the docstring becomes true. (Relevant since there is no backend pytest job in CI, so the clean-environment fragility is the primary failure mode.)

obj = _execution(ExecutionStatus.STOPPED.value, 5, completed=3)
assert self.serializer.get_failed_files(obj) == 2

def test_completed_run_uses_row_count_unchanged(self):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 — test gaps that would lock the intended contract:

  1. COMPLETED-not-reconciled asymmetry is untested. This case only covers completed + errored == total. The deliberate decision not to reconcile COMPLETED is invisible: add _execution(COMPLETED, total=3, completed=1, errored=1) asserting failed == 1 (the ERROR-row count), not total - completed == 2, so a future "fix" that reconciles COMPLETED fails here.
  2. Assert the real invariant, not just failed. test_stopped_run/test_error_run_partial assert only get_failed_files; they don't verify the actual contract (total - successful - failed == 0, i.e. zero phantom in-progress). Add that derived assertion so the test fails if either count regresses.
  3. Unknown/garbage status (is_failure's swallowed ValueErrorelse path) is the untested branch most worth pinning: _execution("GARBAGE", 5, completed=1, errored=1) should return 1 and must never raise (a raise would 500 the executions list endpoint).

Minor: the 4b180f05 exec id in the line-55 comment is an opaque one-off production UUID — the rest of the comment already describes the scenario; consider dropping the id.

…nal anomalies, DB-free test

- P1: get_failed_files no longer trusts total_files blindly. If terminal rows
  exceed total_files (counter drift / impossible count), fall back to the real
  ERROR-row count instead of total-successful, so genuine failures can't silently
  vanish (the inverse of the phantom-in-progress bug). Otherwise unaccounted
  files (no terminal row) still count as failed.
- P2: memoise per-(execution,status) COUNTs so successful/failed don't
  double-query the same status (also tightens the pre-existing N+1 for these
  fields). The successful>total clamp now logs a WARNING instead of erasing the
  anomaly silently.
- note that is_failure swallows ValueError -> unknown status takes the row-count
  path (never raises -> can't 500 the list endpoint).
- docstring: fix the 'no-op' claim (only when every file has a terminal row),
  drop the transport mention + the opaque exec UUID, anchor in-progress to
  DetailedLogs.jsx.
- test harness: force DJANGO_SETTINGS_MODULE (setdefault is a no-op when a dev
  shell already exports it); stub the model in sys.modules and drop
  django.setup() so the test is genuinely DB-free (0.26s, no AppConfig.ready /
  DB connection), mirroring usage_v2/pg_queue sibling tests.
- tests added: drift-no-under-report, successful>total clamp+warn, COMPLETED
  not-reconciled asymmetry, unknown-status-no-raise, and the real
  total-successful-failed==0 invariant on failure runs.

Deferred (noted on PR): an explicit in_progress_files SerializerMethodField so
the server owns all three buckets — bigger change (needs the frontend to render
it), orthogonal to this fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@muhammad-ali-e

Copy link
Copy Markdown
Contributor Author

Thanks — this was a strong review; addressed in the latest commit.

P1 (under-reporting on drift) — fixed. get_failed_files no longer blindly trusts total_files. It now also reads the real ERROR-row count and, when the terminal rows already exceed total_files (counter drift / impossible count), returns the ERROR rows instead of total - successful — so genuine failures can't silently vanish. Your example (total=2, 3 COMPLETED + 2 ERROR) now correctly returns 2 (and logs the drift). The common pre-row-creation case still derives total - successful.

P2 (double COUNT) — fixed. Added a per-(execution, status) memo so successful_files/failed_files don't issue duplicate COUNTs for the same status — which also tightens the pre-existing N+1 the TODO flags (failure reads now do 2 counts, not 3).

P2 (silent clamp) — fixed. successful > total (and any terminal-rows-exceed-total) now logs a WARNING with the execution id + both counts before returning the safe value, instead of erasing the most diagnostic inconsistency silently. (Added a module logger.)

is_failure swallows ValueError — documented inline: an unrecognised status takes the plain row-count path and never raises (so it can't 500 the list endpoint). Pinned by a garbage-status test.

Test harness (both P2s) — fixed. Forced DJANGO_SETTINGS_MODULE (the setdefault was a no-op when a dev shell already exports .dev/.cloud), and switched to the sibling sys.modules model-stub pattern — dropped django.setup() entirely. The test is now genuinely DB-free / app-registry-free (runs in 0.26s, no AppConfig.ready() DB connection), so the docstring is true.

P3 docstring — fixed: corrected the 'no-op' claim (only when every file has a terminal COMPLETED/ERROR row), dropped the transport mention and the opaque exec UUID, anchored in-progress to DetailedLogs.jsx.

P3 test gaps — added: drift-no-under-report, successful>total clamp+warn, the COMPLETED-not-reconciled asymmetry, unknown-status-never-raises, and the real total - successful - failed == 0 invariant on failure runs (not just failed).

Deferred — explicit in_progress_files server field. Agree it's the durable fix (server owns all three buckets, makes the whole class impossible). Left out of this PR because it needs a matching frontend change in DetailedLogs.jsx; happy to do it as a focused follow-up.

All 9 tests green; pre-commit clean.

@muhammad-ali-e muhammad-ali-e merged commit 68d137a into feat/UN-3445-pg-queue-integration Jun 30, 2026
6 checks passed
@muhammad-ali-e muhammad-ali-e deleted the UN-3657-execution-terminal-file-counts branch June 30, 2026 05:50
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Morty Proxy This is a proxified and sanitized view of the page, visit original site.