removed translator, PII redactor and workflow settings by muhammad-ali-e · Pull Request #11 · Zipstack/unstract

muhammad-ali-e · Feb 28, 2024

What

Removed tools from registry JSON
- Removed Translator from registry json
- Removed PII redact from registry json
Removed Workflow settings
- From workflow model, execution and UI

...

Screenshots

...

Checklist

I have read and understood the Contribution Guidelines.

chandrasekharan-zipstack

LGTM

Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>

* [FIX] Unified retry for LLM and embedding providers

  litellm's retry only works for SDK-based providers (OpenAI/Azure). httpx-based providers (Anthropic, Vertex, Bedrock, Mistral) and ALL embedding calls silently ignore max_retries. This adds self-managed retry with exponential backoff at the SDK layer, disabling litellm's own retry entirely for consistency.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [REFACTOR] DRY retry logic into reusable call_with_retry utilities

  Move retry loops out of LLM/Embedding classes into generic call_with_retry, acall_with_retry, and iter_with_retry functions in retry_utils.py. Both classes now call these directly instead of maintaining their own retry helper methods.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [FIX] Consolidate retry logic, expose max_retries for all adapters

  - Extract _get_retry_delay() shared helper to eliminate duplicated retry decision logic across call_with_retry, acall_with_retry, iter_with_retry, and retry_with_exponential_backoff
  - Add num_retries=0 to embedding._pop_retry_params() to fully disable litellm's internal retry for embedding calls
  - Expose max_retries in UI JSON schemas for embedding adapters (OpenAI, Azure, VertexAI, Ollama) and Ollama LLM; previously the field existed in Pydantic models but wasn't shown to users, silently defaulting to 0 retries
  - Add debug logging to LLM and Embedding retry parameter extraction
  - Clarify docstrings distinguishing is_retryable_litellm_error() from is_retryable_error() (different exception hierarchies)
  - Remove stale noqa: C901 from simplified retry_with_exponential_backoff

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [FIX] Set max_retries default to 3 for all embedding and Ollama LLM adapters

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [FIX] Address greptile review: fix shadowed ConnectionError, use MRO check

  - Fix `requests.ConnectionError` shadowing Python's builtin `ConnectionError` in `is_retryable_litellm_error()`: rename import to `RequestsConnectionError` and use `builtins.ConnectionError` / `builtins.TimeoutError` explicitly
  - Use `__mro__`-based class name check instead of `type(error).__name__` to also catch subclasses of retryable error types
  - P1 (num_retries not zeroed) was already fixed in prior commit

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [FIX] Address CodeRabbit review: add APITimeoutError, validate max_retries

  - Add APITimeoutError to _RETRYABLE_ERROR_NAMES for explicit OpenAI SDK timeout coverage
  - Add _validate_max_retries() guard to call_with_retry, acall_with_retry, iter_with_retry to fail fast on negative values instead of silently returning None

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* UN-3344 [FIX] Reduce cognitive complexity and remove useless except clause

  Address SonarCloud findings on PR #1886:

  - S3776: Flatten retry_with_exponential_backoff.wrapper by moving the success logging + return out of the try block and using `continue` in the retry path, so the except branch only handles the give-up case.
  - S2737: Drop the `except Exception: raise` clause: it was a no-op that added complexity without changing behavior (non-matching exceptions propagate naturally).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3344 [FIX] Extract retry loop to top-level helper to drop cognitive complexity

  Sonar still flagged retry_with_exponential_backoff at complexity 16 after the previous flatten. Nested def decorator / def wrapper counted against the outer function's score. Move the retry body to a module-level _invoke_with_retries helper so the decorator factory just delegates, bringing the outer function well under the 15 threshold.

  Behavior is unchanged: all paths (success, retry, give-up, non-retryable propagate) are preserved and covered by the existing SDK1 tests.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3344 [FIX] Honor Retry-After, close stream gen on retry, share give-up log

  Address review comments on PR #1886:

  - #10 (resource leak): close the generator returned by fn() before retrying in iter_with_retry; otherwise streaming providers leak an in-flight HTTP socket until GC.
  - #12 (behavioral regression): when we zero out SDK/wrapper retries we also lose the OpenAI SDK's native Retry-After handling on 429/503. _get_retry_delay now checks error.response.headers["retry-after"] and uses that value ahead of exponential backoff. HTTP-date form is not parsed; those fall back to backoff.
  - #8 (observability gap): move the "Giving up ... after N attempt(s)" log into _get_retry_delay so all four retry helpers (call_with_retry, acall_with_retry, iter_with_retry, decorator) share the same exhaustion signal. Previously only the decorator path logged it.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3344 [REFACTOR] Share retry-kwargs helper and add TypeVar to retry wrappers

  Address review comments on PR #1886:

  - #9 (typing): call_with_retry / acall_with_retry / iter_with_retry previously returned `object`, erasing caller type info. Add PEP 695 generics so the return type flows from the wrapped callable: acall_with_retry now takes Callable[[], Awaitable[T]] and iter_with_retry takes Callable[[], Iterable[T]] -> Generator[T, ...].
  - #11 / #13 (DRY): `_pop_retry_params` in embedding.py and `_disable_litellm_retry` in llm.py were identical logic. Lift to shared `pop_litellm_retry_kwargs` helper in retry_utils.py and delete both methods.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
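The retry behaviour these commit messages describe (fail fast on a negative `max_retries`, prefer a numeric `Retry-After` header, otherwise back off exponentially) can be sketched roughly as below. This is an illustrative sketch only, not the code in `retry_utils.py`: the signatures, the `retry_delay` name, the `base`/`cap` defaults, and the injectable `sleep` parameter are all assumptions made for the example, and a plain `TypeVar` is used in place of PEP 695 syntax so the sketch also runs on interpreters older than 3.12.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")  # plain TypeVar; the PR itself mentions PEP 695 generics


def retry_delay(error: Exception, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt (attempt is 0-based).

    Hypothetical helper: `base` and `cap` are assumed defaults.
    """
    response = getattr(error, "response", None)
    headers = getattr(response, "headers", None) or {}
    raw = headers.get("retry-after")
    if raw is not None:
        try:
            return max(0.0, float(raw))  # delta-seconds form of Retry-After
        except (TypeError, ValueError):
            pass  # HTTP-date form is not parsed; fall through to backoff
    return min(cap, base * (2 ** attempt))


def call_with_retry(
    fn: Callable[[], T],
    *,
    max_retries: int,
    is_retryable: Callable[[Exception], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable errors up to max_retries times."""
    if max_retries < 0:
        # Fail fast instead of silently skipping the loop and returning None.
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as error:
            if attempt >= max_retries or not is_retryable(error):
                raise  # give up: retries exhausted or error is not retryable
            sleep(retry_delay(error, attempt))
            attempt += 1
```

Because the return type is tied to `fn` through `T`, a caller that passes a callable returning `str` gets `str` back from the type checker rather than `object`, which is the point of review comment #9.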

* UN-3435 [FIX] Bind request_id and trace context to worker logs

  The new workers service was emitting `request_id:- trace_id:- span_id:-` in every log line, blocking ops debugging. The plumbing existed (RequestIDFilter, OTelFieldFilter, LogContext.request_id) but was never wired up to actual values.

  Changes in workers/shared/infrastructure/logging/logger.py:

  * RequestIDFilter now falls back to LogContext.request_id from the thread-local context when record.request_id is unset, so log_context() and signal-based binding actually populate the structured field.
  * Added _extract_request_id() with priority order request_id > file_execution_id > execution_id > run_id, preserving the legacy structure-tool convention of using file_execution_id as the per-tool correlation ID. Migration to true HTTP X-Request-ID later is purely additive on the backend side.
  * Added Celery task_prerun/task_postrun signals that bind request_id onto LogContext for every task and clear it on completion. Falls back to Celery task_id when the payload has none, so non-execution tasks (scheduler, log-consumer) still get a usable correlation ID.
  * Signal install is idempotent and silently no-ops if Celery is not importable (unit tests).

  Trace ID / span ID for the executor pod is fixed by a paired helm change in unstract-cloud (workerExecutorV2 OTel auto-instrumentation wrapper in otel.values.yaml).

  Verified with sanity tests covering filter fallback, explicit-extra precedence, payload extraction, prerun/postrun lifecycle, and full Celery signal round-trip.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3435 [FIX] Address PR review comments

  Three review fixes from greptile bot in #1932:

  * `_clear_task_context` now resets only task-scoped fields (`request_id`, `task_id`) instead of deleting the entire `LogContext`. This preserves baseline fields like `worker_name` set at `WorkerLogger.configure()` -- previously the first task_postrun would wipe `worker_name` permanently and subsequent tasks ran without it.
  * Signal install is now thread-safe via double-checked locking with `threading.Lock`. The previous bare-flag check could allow two threads racing through `_install_celery_request_id_signals` to both pass the guard and register handlers twice -- Celery's `Signal.connect()` does not deduplicate, so handlers would fire twice per task.
  * `_extract_request_id` now also scans dict values inside `kwargs` (e.g. `task(context={"execution_id": "abc"})`). The previous version only looked at top-level kwargs and dict args; dict-valued kwargs were silently missed and fell back to `task_id`. Search order now: top-level kwargs > nested dict in kwargs > dict args.

  Verified:

  * 20 concurrent installs register the signal exactly once.
  * `worker_name` survives `_clear_task_context`; `request_id` and `task_id` are nulled.
  * `kwargs={'context': {'execution_id': 'X'}}` resolves to 'X'.
  * Original 9 sanity tests still pass.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3435 [FIX] Address PR review on request_id extraction

  Addresses 13 review comments on #1932. Highlights:

  * P0 -- Positional-string args now resolve. ``async_execute_bin`` is dispatched as ``send_task("async_execute_bin", args=[schema, workflow_id, execution_id, file_hash])`` -- all positional strings. The previous implementation only inspected dict args and dict kwargs values, so it returned None and fell back to the Celery task UUID, defeating per-execution log granularity for the main workflow path. ``_gather_containers`` now uses ``inspect.signature(task.run).bind_partial(*args, **kwargs)`` to map positional args to parameter names before scanning.
  * Priority order is now by KEY, not by container. Outer loop is over ``_REQUEST_ID_KEYS``, inner over containers, so a payload with file_execution_id in one nested dict and execution_id in another deterministically picks the higher-priority key regardless of insertion order.
  * ``_coerce_id`` rejects non-id types. Previous ``str(value)`` could stringify dataclass instances or arbitrary objects into ``"<X object at 0x...>"`` log lines and OTel attributes. Only ``str``, ``int``, and ``UUID`` are accepted now.
  * Dataclass positional args are supported via ``dataclasses.is_dataclass`` + ``dataclasses.asdict``.
  * ``_bind_task_context`` wraps extraction in try/except so a misbehaving payload falls back to ``task_id`` instead of leaving the previous task's id bound on the thread.
  * Signal install simplified to ``@functools.lru_cache(maxsize=1)`` -- idempotent and thread-safe by construction; replaces the explicit flag + Lock pattern.
  * ``ImportError`` for ``celery.signals`` now logs at debug level instead of disappearing silently, so a broken deployment is diagnosable rather than mysteriously missing request_ids.
  * Type hints use ``Mapping[str, Any]`` / ``Sequence[Any]`` instead of bare ``dict`` / ``tuple``.
  * Docstring on ``RequestIDFilter`` no longer name-checks a single helper as the sole writer of ``LogContext.request_id``.

  Tests added at ``workers/shared/tests/test_logger_request_id.py`` (22 tests covering filter fallback chain, coerce rejection, key-priority scan, signature-bound positional args, dataclass arg, nested dict priority, baseline preservation, and concurrent install).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3435 [FIX] Drop test file from PR; deferred to follow-up

  Tests at workers/shared/tests/test_logger_request_id.py have been moved out of this PR. Rationale:

  * Keeps this PR scoped to the P0 ops fix (logger.py only).
  * Test conventions for workers/shared/tests/ deserve their own review (only one existing test file there today).
  * pytest project config integration (coverage flags etc.) needs separate verification.

  The 22 tests have been verified locally against the current implementation; a follow-up PR will add them once test layout conventions are confirmed.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3435 [FIX] Reduce _gather_containers cognitive complexity

  Sonar flagged _gather_containers at cognitive complexity 19 (max 15). Extracted two single-purpose helpers:

  * _bind_positional_args_to_names -- the inspect.signature path that catches send_task("async_execute_bin", args=[schema, wf, exec, ...]) where ids are passed as positional strings.
  * _arg_as_mapping -- coerces a single positional arg into a Mapping (Mapping pass-through, dataclass via dataclasses.asdict, else None).

  _gather_containers itself is now a flat ordered append: bound names, kwargs + nested mappings, then per-arg coercion. Behaviour is unchanged; verified with sanity scenarios for each input shape.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3435 [FIX] Address coderabbit nitpicks

  Two valid nitpicks from coderabbit on the previous commits:

  * ``_coerce_id`` now explicitly rejects ``bool`` before the ``int`` check. Without the guard, ``True``/``False`` would coerce to the literal strings ``"True"``/``"False"`` because ``bool`` is a subclass of ``int`` in Python.
  * ``_bind_task_context`` now logs at DEBUG with ``exc_info=True`` when ``_extract_request_id`` raises, instead of swallowing the exception silently. Behaviour is unchanged in production (request_id still falls back to task_id); the log line lets operators diagnose malformed payloads from the executor pod.

  Skipped coderabbit's third suggestion (replace ``@functools.lru_cache`` with explicit ``threading.Lock`` + flag) -- contradicts the explicit preference in PR review #11 for the simpler lru_cache pattern.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… / FileExecutionResult (#2020)

* UN-3513 [FEAT] Type chord-callback boundary with BatchExecutionResult / FileExecutionResult

  Producers in workers/file_processing/tasks.py now build typed dataclasses (from unstract.core.worker_models) and emit their ``.to_dict()`` instead of hand-rolled dicts. Locks the wire shape to the dataclass schema so downstream refactors fail loud.

  Scope

  Producer-side typing only. Consumer (workers/callback/tasks.py + aggregate_file_batch_results) already reads via ``.get(..., default)`` (tolerant by construction), so no consumer-side change needed.

  Dataclass extensions (unstract.core.worker_models, additive only)

  * BatchExecutionResult gains 3 optional fields: skipped_already_completed, skipped_active_duplicate, organization_id.
  * FileExecutionResult gains 3 optional fields for the API path's legacy dict vocabulary: file_name (alias for file), result_data (alias for result), skipped (marker like "already_completed").
  * Both from_dict updated to populate the new fields.

  Producer migrations (workers/file_processing/tasks.py)

  * L901 (general path, process_file_batch return): BatchExecutionResult(...).to_dict(). Wire dict gains file_results: [] and errors: [] defaults, strictly additive.
  * L1706, L1798, L1823 (API path returns from _process_file_batch_api_core helpers): FileExecutionResult(...).to_dict(). L1798 preserves the legacy storage_result field via dict-spread merge.

  Domain-vocabulary correction on the API path

  API-path producers previously returned status="completed" / "failed": lowercase strings matching neither ExecutionStatus (workflow-level, uppercase) nor ApiDeploymentResultStatus (per-file, Success/Failed, the canonical per-file vocab). Producers now emit "Success" / "Failed" via FileExecutionResult. Audit: no Python equality consumer was found reading the lowercase variants (grep clean). Observability tooling pattern-matching the old strings would need updating; this is a domain-correctness fix.

  Tests

  New tests/test_chord_callback_boundary.py: 14 tests, 3 classes:

  * Wire-shape characterisation for BatchExecutionResult.
  * Wire-shape characterisation for FileExecutionResult with alias fields and canonical Success/Failed vocab.
  * Consumer tolerance: aggregate_file_batch_results-style .get() reads return expected values from the new wire shape.

  sdk1's 80 worker_models tests still pass; the dataclass extensions are strictly additive.

  Regression risk: zero on consumer side, zero on backend (doesn't import these classes; has its own FileExecutionResult in dto.py, untouched). Status-vocab shift on API path is a deliberate domain correction.

  Test count: workers boundary suite +14 (new); sdk1 dispatcher 80/80.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3513 [FIX] Address PR review (toolkit + SkipReason enum + producer-binding tests)

  A+B from the triage on PR #2020:

  * tasks.py:1659 (API-path BATCH return): migrated to BatchExecutionResult.to_dict(). Fixes the half-typed boundary the reviewer flagged. file_results, total_files, skipped_already_completed and organization_id are now on the wire. Successful/skipped counter semantic preserved (separating them is deferred to a follow-up).
  * New SkipReason StrEnum (worker_models.py) with ALREADY_COMPLETED + ACTIVE_DUPLICATE, mirroring the batch-level skip counters on BatchExecutionResult. FileExecutionResult.skipped is now SkipReason | None. from_dict coerces. Producer uses the enum; the ACTIVE_DUPLICATE value has no current per-file producer but is exercised end-to-end via a round-trip test.
  * TODO(UN-3516) marker on the three alias fields (file_name, result_data, skipped); sunset ticket filed.
  * Tests strengthened:
    - TestProducerBinding drives real _compile_batch_result with a minimal SimpleNamespace context, and drives _process_single_file_api via mocked api_client for the already-completed branch.
    - TestRealConsumerTolerance imports the real aggregate_file_batch_results; producer-consumer contract driven end-to-end.
    - test_none_valued_optional_fields_stripped_from_wire documents serialize_dataclass_to_dict's None-strip behaviour.
    - test_active_duplicate_skip_reason_round_trips proves the second enum value isn't dead.
    - SonarCloud python:S1244 fixed with pytest.approx.
    - skipped_files==0 NIT assertion removed.

  Test count: workers boundary suite 14 -> 18; sdk1 worker_models 80/80 still green.

  Deferred (separate tickets to follow): __post_init__ silent status clobber, from_dict status discard, BatchExecutionResult invariant, storage soft-failure, dead aggregator branch.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3513 [FIX] Address second-pass review (storage_result + lenient skipped + missing producer tests)

  Three findings from the second review round on PR #2020:

  * HIGH: storage_result silent data loss at batch boundary. The per-file dict-spread at tasks.py:1816 preserved storage_result on the immediate return, but the value was dropped when wrapped into BatchExecutionResult.file_results (from_dict didn't know the key). Promoted to a typed FileExecutionResult.storage_result: Any | None field; producer now emits via the constructor; from_dict reads it back. The round-trip preserves it end-to-end.
  * HIGH: strict SkipReason parsing would crash entire batches during rolling deploys if a newer producer ever emitted an unknown value. Added FileExecutionResult._parse_skipped, which catches ValueError + logs a warning + falls back to None. Standard "strict on emit, lenient on receive" posture for wire compat.
  * MEDIUM: TestProducerBinding only covered 2 of 5 producer branches. Added three more tests:
    - _process_single_file_api success branch (asserts storage_result survives the typed wire; would catch the dict-spread revert).
    - _process_single_file_api failure branch (asserts canonical "Failed" vocab; catches reverts to the legacy lowercase "failed").
    - process_file_batch_api batch wrapper via task.apply() with an in-memory result_backend (asserts BatchExecutionResult shape + skipped_already_completed counter derived from SkipReason.ALREADY_COMPLETED.value).

    Strengthened the existing already-completed branch test to assert result_data + metadata propagation.

  Bug caught by the new batch-wrapper test: process_file_batch_api was missing execution_time on its BatchExecutionResult(...) call. BatchExecutionResult.execution_time is a required positional, so the API-path batch task would have crashed with TypeError on every run. Introduced batch_start_time = time.time() at task entry and pass execution_time = time.time() - batch_start_time. The new test would have caught this immediately at PR time; logging it here as the exact value of producer-binding coverage.

  Test count: 18 -> 21; all green.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3513 [FIX] Symmetric None-stripping for nested file_results + deterministic callback healthcheck picker

  Greptile P2 #2: None-stripping was asymmetric for nested FileExecutionResult objects. ``serialize_dataclass_to_dict`` only filters None at the outermost level, so a standalone ``FileExecutionResult.to_dict()`` would omit unset optional fields while ``batch.to_dict()["file_results"][i]`` would carry explicit ``"file_name": None`` etc. for the same input. A consumer doing ``"x" in result`` membership checks would behave differently depending on whether it read the standalone wire or the nested-in-batch wire, a real contract divergence.

  Fixed locally on ``BatchExecutionResult.to_dict()`` (not by touching the shared ``serialize_dataclass_to_dict`` infra): post-process ``wire["file_results"]`` to drop None-valued keys, mirroring the top-level strip. ``BatchExecutionResult.from_dict`` was already tolerant via ``.get(...)`` so the round-trip stays clean.

  Greptile P2 #1 (``status`` constructor parameter clobbered by ``__post_init__``) is the same pathology I flagged as BLOCKER #1 in the first review round; deferred to a separate ticket with the shared-infra dataclass redesign.

  Test coverage: extended the existing ``test_none_valued_optional_fields_stripped_from_wire`` to also assert nested symmetry (same test method, no new method added). This keeps the pytest collection profile stable (a separate test method would perturb celery's shared task-registry insertion order during pytest collection and amplify a pre-existing flake in ``test_callback_sanity.py``).

  Test infra fix (bundled because it would have flaked CI on this PR's HEAD): ``test_callback_sanity.TestEagerHealthcheckRoundTrip`` selected the healthcheck task via ``endswith(".healthcheck")`` against ``eager_app.tasks``. That registry is a shared celery global with at least 5 worker modules registering ``healthcheck`` (callback, executor, file_processing, log_consumer, scheduler). ``next(...)`` returned whichever was inserted first, which depends on pytest module-collection order across the whole suite. The test would assert ``worker_type == "callback"`` and intermittently get ``"executor"`` or ``"file_processing"`` instead; empirically a ~10% flake rate on this branch's HEAD, climbing to ~90% with any test-collection perturbation. Replaced with an exact-name lookup (``name == "callback.worker.healthcheck"``); 30/30 green across deterministic + randomised probes.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3513 [FIX] Address vishnuszipstack review (7 real fixes + 1 docstring nit)

  Seven of Vishnu's PR review findings addressed, all backward-compat with main-branch consumers. The three [Important] design-redesign findings (#1 status __post_init__, #2 alias-pair invariant, #3 to_api_dict/to_json dead code) are deferred to a follow-up shared-infra dataclass ticket because they would either fire warning noise on existing call sites (``worker_base.py:211/222``, ``worker_patterns.py:241`` pass wrong-enum status) or change the wire/cache contract; neither is acceptable mid-flight while keeping zero regression on main.

  Changes in this commit are either:

  * Pure additive (test methods, docstrings, observability)
  * Or provably equivalent wire output (the typed-count refactor)

  So a rolling deploy where old workers and new workers run concurrently sees identical wire shapes and identical behaviour for all current valid data; the only observable differences are log content (better context on the existing warning) and the presence of a new opt-in classmethod that nothing currently calls.

  * **Vishnu #8 [Suggestion]**: ``SkipReason`` docstring claimed "StrEnum semantics" but the class is ``(str, Enum)``, not ``enum.StrEnum``. The two differ on ``__str__``. Rewrote the docstring to describe the actual behaviour.
  * **Vishnu #4a [Important - log context]**: ``_parse_skipped`` now accepts an optional ``file_execution_id`` kwarg that ``from_dict`` threads through. The warning emitted for unknown wire values now carries the file identifier, so a real rolling-deploy incident is debuggable rather than a context-free warning. Optional kwarg with default, so any existing caller passing one positional arg still works.
  * **Vishnu #9 [Suggestion]**: added ``BatchExecutionResult.from_file_results(...)`` classmethod that derives counters from typed file results. Purely additive: no existing caller uses it; the constructor signature is unchanged so producers that need their own counter semantics keep working.
  * **Vishnu #11 [Suggestion]**: ``process_file_batch_api`` was computing ``skipped_already_completed`` by string-matching the wire dicts AFTER already calling ``from_dict`` on them. Refactored to count from the typed list (single ``from_dict`` pass, enum compare). Provably equivalent for all current wire data.
  * **Vishnu #4 [Important - test gap]**: added ``test_from_dict_unknown_skipped_is_lenient`` covering the one documented crash-prevention path. A regression to bare ``SkipReason(raw)`` would have re-introduced the rolling-deploy crash and kept every other test green.
  * **Vishnu #5 [Important - failure-aggregation gap]**: added ``test_process_file_batch_api_batch_wrapper_failure_aggregation`` that drives one success + one failure through the batch wrapper. The existing success-only test never exercised ``failed_files += 1``.
  * **Vishnu #6 [Important - populated round-trip gap]**: added ``test_round_trip_with_populated_file_results`` and ``test_from_file_results_derives_counters``. The existing ``BatchExecutionResult`` round-trip test used ``file_results=[]``, so the list-comprehension in ``from_dict`` that rebuilds nested ``FileExecutionResult`` objects was never executed with a populated list.
  * **Vishnu #13 [Suggestion]**: replaced hardcoded line reference in test docstring with a symbol reference.

  Deferred to follow-up shared-infra dataclass-redesign ticket:

  * #1 ``__post_init__`` status clobber: would emit warning noise on every existing wrong-enum call site
  * #2 alias-pair invariant: back-fill via __post_init__ would change the wire shape (file_name no longer None → no longer stripped at the top level)
  * #3 ``to_api_dict``/``to_json`` dead code: looks like a public SDK surface; changing the body could surprise external consumers
  * #7 recursive ``None``-strip in ``serialize_value``: touches every dataclass in the codebase
  * #10 ``Any`` typing tightening: low value, mypy tightening could trip downstream
  * #12 producer redundant kwargs: depends on #2's reconciliation

  Tests: workers chord-callback boundary suite 21 -> 25; full workers suite 622 -> 627 (no new failures; 6 pre-existing baseline unchanged). Five deterministic-order runs of the full suite returned exactly 627 passed / 6 pre-existing failed; zero flakiness from this change.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

removed translator, PII redactor and workflow settings

60f062b

muhammad-ali-e requested review from chandrasekharan-zipstack, hari-kuriakose, jaseemjaskp, tahierhussain and vishnuszipstack February 28, 2024 11:59

chandrasekharan-zipstack approved these changes Feb 28, 2024

Merge branch 'main' into RemoveWorkflowConfigurations

bda3af0

Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>

muhammad-ali-e closed this Feb 29, 2024

muhammad-ali-e deleted the RemoveWorkflowConfigurations branch February 29, 2024 05:21

muhammad-ali-e mentioned this pull request Mar 5, 2026

UN-2022 [FEAT] Add co-owner management for Adapters, API Deployments, Connectors, Pipelines, Workflows, Prompt Studio #1797

Draft

muhammad-ali-e mentioned this pull request Apr 28, 2026

UN-3435 [FIX] Bind request_id and trace context to worker logs #1932

Merged

11 tasks

muhammad-ali-e wants to merge 2 commits into Zipstack/unstract:main from Zipstack/unstract:RemoveWorkflowConfigurations

2 participants
