Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

feat(core): activity events for recovery, metadata corruption, agent-report#1692

Open
illegalcall wants to merge 1 commit intomainComposioHQ/agent-orchestrator:mainfrom
feat/issue-1660ComposioHQ/agent-orchestrator:feat/issue-1660Copy head branch name to clipboard
Open

feat(core): activity events for recovery, metadata corruption, agent-report#1692
illegalcall wants to merge 1 commit intomainComposioHQ/agent-orchestrator:mainfrom
feat/issue-1660ComposioHQ/agent-orchestrator:feat/issue-1660Copy head branch name to clipboard

Conversation

@illegalcall
Copy link
Copy Markdown
Collaborator

Summary

Wires activity events into three forensic-critical paths so RCA can reconstruct what happened after the fact (sub-issue of #1511):

  1. Recovery subsystemrecovery.session_failed (MUST) per failed session, recovery.action_failed (SHOULD) on recoverSession outer catch. Adds "recovery" to ActivityEventSource.
  2. Metadata corruptionmetadata.corrupt_detected (MUST) when mutateMetadata side-renames a corrupt session-metadata file to .corrupt-{ts}. Includes data.renamedTo and a 200-char data.contentSample per B11.
  3. Agent-report apply pathapi.agent_report.transition_rejected (SHOULD) and api.agent_report.session_not_found (COULD).

After this lands, RCA can answer:

  • Did ao recover actually fix anything? Which sessions did it skip?
  • Has AO ever silently renamed a corrupt metadata file? When?
  • Did applyAgentReport reject a transition (and why)?

Closes #1660.

Invariants preserved

  • B22: recovery events fire per session, not per probe step (one recovery.session_failed per failed session in runRecovery's loop).
  • B11: metadata.corrupt_detected truncates data.contentSample to 200 chars — full file content would be dropped by the 16KB sanitizer cap.
  • No raw error stacks in dataerrorMessage only.

Out of scope (per issue)

  • validator.ts per-probe catches (deliberate uncertainty handling).
  • cleanupSession runtime/workspace destroy silent catches (best-effort).
  • deleteMetadata unlink catch (expected race).

Test plan

  • metadata.test.ts regression: emits metadata.corrupt_detected with renamedTo, truncates contentSample to 200 chars, no emit for healthy JSON.
  • recovery-manager.test.ts regression: emits one recovery.session_failed per failed session, no emit when all succeed.
  • pnpm --filter @aoagents/ao-core typecheck passes.
  • All existing core tests pass (the two pre-existing OpenCode session-manager timeouts on main are unrelated).

🤖 Generated with Claude Code

…report

Wires activity events into three forensic-critical paths so RCA can
reconstruct what happened after the fact:

1. recovery subsystem
   - recovery.session_failed (MUST) per failed session in runRecovery
   - recovery.action_failed (SHOULD) on recoverSession outer catch
   Adds "recovery" to ActivityEventSource so `ao events list --source
   recovery` reconstructs an `ao recover` invocation timeline.

2. metadata corruption
   - metadata.corrupt_detected (MUST) when mutateMetadata renames a
     corrupt session-metadata file to .corrupt-{ts}. Includes
     data.renamedTo and a 200-char data.contentSample (B11) for forensic
     recovery. Previously only console.warn — silent overwrites had no
     queryable signal.

3. agent-report apply path
   - api.agent_report.transition_rejected (SHOULD)
   - api.agent_report.session_not_found (COULD)

Per B22, recovery events fire per session, not per probe step. Per B11,
metadata.corrupt_detected truncates contentSample to 200 chars (full
file would exceed the 16KB sanitizer cap).

Closes #1660
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR wires forensic activity events into three previously un-instrumented paths — the recovery subsystem, metadata-corruption detection, and the agent-report apply path — so post-incident RCA can reconstruct exactly what ao recover did, when a corrupt metadata file was silently renamed, and why an agent-report transition was rejected.

  • Recovery (recovery/manager.ts, recovery/actions.ts): emits recovery.session_failed once per failed session in the runRecovery loop (B22) and recovery.action_failed in the recoverSession outer catch; both events carry projectId from the assessment.
  • Metadata corruption (metadata.ts): emits metadata.corrupt_detected after a corrupt-JSON rename attempt, with a 200-char contentSample (B11) and a renameSucceeded flag; the summary string is built before the rename result is known, so it always reads "renamed to X" even when the rename failed.
  • Agent-report (agent-report.ts): emits api.agent_report.session_not_found and api.agent_report.transition_rejected; unlike every other new event, neither includes a projectId, making project-scoped RCA queries incomplete for these paths.

Confidence Score: 4/5

Safe to merge; the new event calls are best-effort (never throw), the lock behaviour is correct, and the two small inconsistencies are cosmetic rather than functional.

The core instrumentation is well-structured and consistent with the existing activity-events infrastructure. Two minor issues exist: the metadata.corrupt_detected summary string incorrectly claims a rename succeeded in failure cases, and both api.agent_report.* events omit projectId even though every other new event supplies it. Neither breaks any runtime behaviour, but they reduce the accuracy and filterability of the forensic data this PR is specifically designed to provide.

packages/core/src/agent-report.ts and packages/core/src/metadata.ts warrant a second look for the projectId gap and the misleading summary string respectively.

Important Files Changed

Filename Overview
packages/core/src/activity-events.ts Extends ActivityEventSource and ActivityEventKind unions with five new forensic event types and the "recovery" source; no logic changes to existing recording infrastructure.
packages/core/src/metadata.ts Adds metadata.corrupt_detected event on corrupt-JSON rename path; summary field always claims a rename succeeded even when renameSucceeded is false.
packages/core/src/agent-report.ts Adds api.agent_report.session_not_found and api.agent_report.transition_rejected events; both omit projectId, unlike every other new event in this PR.
packages/core/src/recovery/manager.ts Emits recovery.session_failed once per failed session after executeAction returns {success:false}; correctly captures projectId from the assessment.
packages/core/src/recovery/actions.ts Emits recovery.action_failed in the recoverSession outer catch; correctly extracts errorMessage before returning the failure result.
packages/core/src/tests/metadata.test.ts Adds three new regression tests for metadata.corrupt_detected: emit on bad JSON, 200-char truncation, and no emit for healthy JSON.
packages/core/src/tests/recovery-manager.test.ts New test file verifying recovery.session_failed fires once per failed session and is absent when all sessions succeed.

Sequence Diagram

sequenceDiagram
    participant RM as runRecovery (manager.ts)
    participant EA as executeAction (actions.ts)
    participant RS as recoverSession (actions.ts)
    participant AE as recordActivityEvent

    RM->>EA: executeAction(assessment)
    EA->>RS: recoverSession(...)
    alt recoverSession throws
        RS->>AE: recovery.action_failed
        RS-->>EA: "{success:false, error}"
    end
    EA-->>RM: RecoveryResult

    alt "result.success === false"
        RM->>AE: recovery.session_failed
    end

    participant AR as applyAgentReport (agent-report.ts)
    participant MM as mutateMetadata (metadata.ts)

    AR->>MM: readMetadataRaw
    alt session not found
        AR->>AE: api.agent_report.session_not_found
        AR-->>AR: throw Error
    end
    AR->>MM: mutateMetadata(updater)
    MM->>MM: validateAgentReportTransition
    alt transition invalid
        MM->>AE: api.agent_report.transition_rejected
        MM-->>AR: throw Error
    end

    AR->>MM: mutateMetadata (corrupt path)
    alt JSON parse fails
        MM->>MM: renameSync to .corrupt-ts
        MM->>AE: metadata.corrupt_detected
    end
Loading
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
packages/core/src/metadata.ts:378-384
The `summary` field is constructed before `renamed` is set, so it always reads "renamed to X" even when the `renameSync` call fails (`renameSucceeded: false`). Since this event is specifically for RCA forensics, a summary that contradicts `data.renameSucceeded` can mislead engineers who glance at the summary without inspecting the full payload.

```suggestion
          recordActivityEvent({
            projectId: inferredProjectId || undefined,
            sessionId,
            source: "session-manager",
            kind: "metadata.corrupt_detected",
            level: "error",
            summary: renamed
              ? `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}`
              : `Corrupt metadata for session ${sessionId} detected (rename failed)`,
```

### Issue 2 of 3
packages/core/src/agent-report.ts:387-392
Both `api.agent_report.session_not_found` and `api.agent_report.transition_rejected` are emitted without a `projectId`, but every other new event in this PR (`recovery.*`, `metadata.corrupt_detected`) includes one. Since `applyAgentReport` receives `dataDir`, the same `basename(dirname(dataDir))` inference used in `metadata.ts` can populate `projectId` here, making these events filterable in project-scoped RCA queries.

```suggestion
  const raw = readMetadataRaw(dataDir, sessionId);
  const inferredProjectId = basename(dirname(dataDir)) || undefined;
  if (!raw) {
    recordActivityEvent({
      projectId: inferredProjectId,
      sessionId,
      source: "api",
      kind: "api.agent_report.session_not_found",
```

### Issue 3 of 3
packages/core/src/agent-report.ts:470-473
The `transition_rejected` event is also missing `projectId`, which was just inferred above. Passing it here keeps the two `api.agent_report.*` events consistent and filterable by project.

```suggestion
      recordActivityEvent({
        projectId: inferredProjectId,
        sessionId,
        source: "api",
        kind: "api.agent_report.transition_rejected",
```

Reviews (1): Last reviewed commit: "feat(core): activity events for recovery..." | Re-trigger Greptile

Comment on lines +378 to +384
recordActivityEvent({
projectId: inferredProjectId || undefined,
sessionId,
source: "session-manager",
kind: "metadata.corrupt_detected",
level: "error",
summary: `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The summary field is constructed before renamed is set, so it always reads "renamed to X" even when the renameSync call fails (renameSucceeded: false). Since this event is specifically for RCA forensics, a summary that contradicts data.renameSucceeded can mislead engineers who glance at the summary without inspecting the full payload.

Suggested change
recordActivityEvent({
projectId: inferredProjectId || undefined,
sessionId,
source: "session-manager",
kind: "metadata.corrupt_detected",
level: "error",
summary: `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}`,
recordActivityEvent({
projectId: inferredProjectId || undefined,
sessionId,
source: "session-manager",
kind: "metadata.corrupt_detected",
level: "error",
summary: renamed
? `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}`
: `Corrupt metadata for session ${sessionId} detected (rename failed)`,
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/src/metadata.ts
Line: 378-384

Comment:
The `summary` field is constructed before `renamed` is set, so it always reads "renamed to X" even when the `renameSync` call fails (`renameSucceeded: false`). Since this event is specifically for RCA forensics, a summary that contradicts `data.renameSucceeded` can mislead engineers who glance at the summary without inspecting the full payload.

```suggestion
          recordActivityEvent({
            projectId: inferredProjectId || undefined,
            sessionId,
            source: "session-manager",
            kind: "metadata.corrupt_detected",
            level: "error",
            summary: renamed
              ? `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}`
              : `Corrupt metadata for session ${sessionId} detected (rename failed)`,
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 387 to +392
const raw = readMetadataRaw(dataDir, sessionId);
if (!raw) {
recordActivityEvent({
sessionId,
source: "api",
kind: "api.agent_report.session_not_found",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Both api.agent_report.session_not_found and api.agent_report.transition_rejected are emitted without a projectId, but every other new event in this PR (recovery.*, metadata.corrupt_detected) includes one. Since applyAgentReport receives dataDir, the same basename(dirname(dataDir)) inference used in metadata.ts can populate projectId here, making these events filterable in project-scoped RCA queries.

Suggested change
const raw = readMetadataRaw(dataDir, sessionId);
if (!raw) {
recordActivityEvent({
sessionId,
source: "api",
kind: "api.agent_report.session_not_found",
const raw = readMetadataRaw(dataDir, sessionId);
const inferredProjectId = basename(dirname(dataDir)) || undefined;
if (!raw) {
recordActivityEvent({
projectId: inferredProjectId,
sessionId,
source: "api",
kind: "api.agent_report.session_not_found",
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/src/agent-report.ts
Line: 387-392

Comment:
Both `api.agent_report.session_not_found` and `api.agent_report.transition_rejected` are emitted without a `projectId`, but every other new event in this PR (`recovery.*`, `metadata.corrupt_detected`) includes one. Since `applyAgentReport` receives `dataDir`, the same `basename(dirname(dataDir))` inference used in `metadata.ts` can populate `projectId` here, making these events filterable in project-scoped RCA queries.

```suggestion
  const raw = readMetadataRaw(dataDir, sessionId);
  const inferredProjectId = basename(dirname(dataDir)) || undefined;
  if (!raw) {
    recordActivityEvent({
      projectId: inferredProjectId,
      sessionId,
      source: "api",
      kind: "api.agent_report.session_not_found",
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +470 to +473
recordActivityEvent({
sessionId,
source: "api",
kind: "api.agent_report.transition_rejected",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The transition_rejected event is also missing projectId, which was just inferred above. Passing it here keeps the two api.agent_report.* events consistent and filterable by project.

Suggested change
recordActivityEvent({
sessionId,
source: "api",
kind: "api.agent_report.transition_rejected",
recordActivityEvent({
projectId: inferredProjectId,
sessionId,
source: "api",
kind: "api.agent_report.transition_rejected",
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/src/agent-report.ts
Line: 470-473

Comment:
The `transition_rejected` event is also missing `projectId`, which was just inferred above. Passing it here keeps the two `api.agent_report.*` events consistent and filterable by project.

```suggestion
      recordActivityEvent({
        projectId: inferredProjectId,
        sessionId,
        source: "api",
        kind: "api.agent_report.transition_rejected",
```

How can I resolve this? If you propose a fix, please make it concise.

@illegalcall
Copy link
Copy Markdown
Collaborator Author

@harshitsinghbhandari — flagging you as a likely reviewer/owner here based on git blame of the files this PR touches:

recovery/manager.ts is originally @harsh-batheja's (#362), but you've been the active maintainer across the other three files. Would you be able to take this one over, or at least review?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Activity events: Recovery, metadata-corruption detection, agent-report (sub-issue of #1511)

1 participant

Morty Proxy This is a proxified and sanitized view of the page, visit original site.