feat(core): activity events for recovery, metadata corruption, agent-report#1692
feat(core): activity events for recovery, metadata corruption, agent-report#1692illegalcall wants to merge 1 commit intomainComposioHQ/agent-orchestrator:mainfrom
Conversation
…report
Wires activity events into three forensic-critical paths so RCA can
reconstruct what happened after the fact:
1. recovery subsystem
- recovery.session_failed (MUST) per failed session in runRecovery
- recovery.action_failed (SHOULD) on recoverSession outer catch
Adds "recovery" to ActivityEventSource so `ao events list --source
recovery` reconstructs an `ao recover` invocation timeline.
2. metadata corruption
- metadata.corrupt_detected (MUST) when mutateMetadata renames a
corrupt session-metadata file to .corrupt-{ts}. Includes
data.renamedTo and a 200-char data.contentSample (B11) for forensic
recovery. Previously only console.warn — silent overwrites had no
queryable signal.
3. agent-report apply path
- api.agent_report.transition_rejected (SHOULD)
- api.agent_report.session_not_found (COULD)
Per B22, recovery events fire per session, not per probe step. Per B11,
metadata.corrupt_detected truncates contentSample to 200 chars (full
file would exceed the 16KB sanitizer cap).
Closes #1660
Greptile SummaryThis PR wires forensic activity events into three previously un-instrumented paths — the recovery subsystem, metadata-corruption detection, and the agent-report apply path — so post-incident RCA can reconstruct exactly what
Confidence Score: 4/5Safe to merge; the new event calls are best-effort (never throw), the lock behaviour is correct, and the two small inconsistencies are cosmetic rather than functional. The core instrumentation is well-structured and consistent with the existing activity-events infrastructure. Two minor issues exist: the metadata.corrupt_detected summary string incorrectly claims a rename succeeded in failure cases, and both api.agent_report.* events omit projectId even though every other new event supplies it. Neither breaks any runtime behaviour, but they reduce the accuracy and filterability of the forensic data this PR is specifically designed to provide. packages/core/src/agent-report.ts and packages/core/src/metadata.ts warrant a second look for the projectId gap and the misleading summary string respectively.
|
| Filename | Overview |
|---|---|
| packages/core/src/activity-events.ts | Extends ActivityEventSource and ActivityEventKind unions with five new forensic event types and the "recovery" source; no logic changes to existing recording infrastructure. |
| packages/core/src/metadata.ts | Adds metadata.corrupt_detected event on corrupt-JSON rename path; summary field always claims a rename succeeded even when renameSucceeded is false. |
| packages/core/src/agent-report.ts | Adds api.agent_report.session_not_found and api.agent_report.transition_rejected events; both omit projectId, unlike every other new event in this PR. |
| packages/core/src/recovery/manager.ts | Emits recovery.session_failed once per failed session after executeAction returns {success:false}; correctly captures projectId from the assessment. |
| packages/core/src/recovery/actions.ts | Emits recovery.action_failed in the recoverSession outer catch; correctly extracts errorMessage before returning the failure result. |
| packages/core/src/tests/metadata.test.ts | Adds three new regression tests for metadata.corrupt_detected: emit on bad JSON, 200-char truncation, and no emit for healthy JSON. |
| packages/core/src/tests/recovery-manager.test.ts | New test file verifying recovery.session_failed fires once per failed session and is absent when all sessions succeed. |
Sequence Diagram
sequenceDiagram
participant RM as runRecovery (manager.ts)
participant EA as executeAction (actions.ts)
participant RS as recoverSession (actions.ts)
participant AE as recordActivityEvent
RM->>EA: executeAction(assessment)
EA->>RS: recoverSession(...)
alt recoverSession throws
RS->>AE: recovery.action_failed
RS-->>EA: "{success:false, error}"
end
EA-->>RM: RecoveryResult
alt "result.success === false"
RM->>AE: recovery.session_failed
end
participant AR as applyAgentReport (agent-report.ts)
participant MM as mutateMetadata (metadata.ts)
AR->>MM: readMetadataRaw
alt session not found
AR->>AE: api.agent_report.session_not_found
AR-->>AR: throw Error
end
AR->>MM: mutateMetadata(updater)
MM->>MM: validateAgentReportTransition
alt transition invalid
MM->>AE: api.agent_report.transition_rejected
MM-->>AR: throw Error
end
AR->>MM: mutateMetadata (corrupt path)
alt JSON parse fails
MM->>MM: renameSync to .corrupt-ts
MM->>AE: metadata.corrupt_detected
end
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
packages/core/src/metadata.ts:378-384
The `summary` field is constructed before `renamed` is set, so it always reads "renamed to X" even when the `renameSync` call fails (`renameSucceeded: false`). Since this event is specifically for RCA forensics, a summary that contradicts `data.renameSucceeded` can mislead engineers who glance at the summary without inspecting the full payload.
```suggestion
recordActivityEvent({
projectId: inferredProjectId || undefined,
sessionId,
source: "session-manager",
kind: "metadata.corrupt_detected",
level: "error",
summary: renamed
? `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}`
: `Corrupt metadata for session ${sessionId} detected (rename failed)`,
```
### Issue 2 of 3
packages/core/src/agent-report.ts:387-392
Both `api.agent_report.session_not_found` and `api.agent_report.transition_rejected` are emitted without a `projectId`, but every other new event in this PR (`recovery.*`, `metadata.corrupt_detected`) includes one. Since `applyAgentReport` receives `dataDir`, the same `basename(dirname(dataDir))` inference used in `metadata.ts` can populate `projectId` here, making these events filterable in project-scoped RCA queries.
```suggestion
const raw = readMetadataRaw(dataDir, sessionId);
const inferredProjectId = basename(dirname(dataDir)) || undefined;
if (!raw) {
recordActivityEvent({
projectId: inferredProjectId,
sessionId,
source: "api",
kind: "api.agent_report.session_not_found",
```
### Issue 3 of 3
packages/core/src/agent-report.ts:470-473
The `transition_rejected` event is also missing `projectId`, which was just inferred above. Passing it here keeps the two `api.agent_report.*` events consistent and filterable by project.
```suggestion
recordActivityEvent({
projectId: inferredProjectId,
sessionId,
source: "api",
kind: "api.agent_report.transition_rejected",
```
Reviews (1): Last reviewed commit: "feat(core): activity events for recovery..." | Re-trigger Greptile
| recordActivityEvent({ | ||
| projectId: inferredProjectId || undefined, | ||
| sessionId, | ||
| source: "session-manager", | ||
| kind: "metadata.corrupt_detected", | ||
| level: "error", | ||
| summary: `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}`, |
There was a problem hiding this comment.
The
summary field is constructed before renamed is set, so it always reads "renamed to X" even when the renameSync call fails (renameSucceeded: false). Since this event is specifically for RCA forensics, a summary that contradicts data.renameSucceeded can mislead engineers who glance at the summary without inspecting the full payload.
| recordActivityEvent({ | |
| projectId: inferredProjectId || undefined, | |
| sessionId, | |
| source: "session-manager", | |
| kind: "metadata.corrupt_detected", | |
| level: "error", | |
| summary: `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}`, | |
| recordActivityEvent({ | |
| projectId: inferredProjectId || undefined, | |
| sessionId, | |
| source: "session-manager", | |
| kind: "metadata.corrupt_detected", | |
| level: "error", | |
| summary: renamed | |
| ? `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}` | |
| : `Corrupt metadata for session ${sessionId} detected (rename failed)`, |
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/src/metadata.ts
Line: 378-384
Comment:
The `summary` field is constructed before `renamed` is set, so it always reads "renamed to X" even when the `renameSync` call fails (`renameSucceeded: false`). Since this event is specifically for RCA forensics, a summary that contradicts `data.renameSucceeded` can mislead engineers who glance at the summary without inspecting the full payload.
```suggestion
recordActivityEvent({
projectId: inferredProjectId || undefined,
sessionId,
source: "session-manager",
kind: "metadata.corrupt_detected",
level: "error",
summary: renamed
? `Corrupt metadata for session ${sessionId} renamed to ${basename(corruptPath)}`
: `Corrupt metadata for session ${sessionId} detected (rename failed)`,
```
How can I resolve this? If you propose a fix, please make it concise.| const raw = readMetadataRaw(dataDir, sessionId); | ||
| if (!raw) { | ||
| recordActivityEvent({ | ||
| sessionId, | ||
| source: "api", | ||
| kind: "api.agent_report.session_not_found", |
There was a problem hiding this comment.
Both
api.agent_report.session_not_found and api.agent_report.transition_rejected are emitted without a projectId, but every other new event in this PR (recovery.*, metadata.corrupt_detected) includes one. Since applyAgentReport receives dataDir, the same basename(dirname(dataDir)) inference used in metadata.ts can populate projectId here, making these events filterable in project-scoped RCA queries.
| const raw = readMetadataRaw(dataDir, sessionId); | |
| if (!raw) { | |
| recordActivityEvent({ | |
| sessionId, | |
| source: "api", | |
| kind: "api.agent_report.session_not_found", | |
| const raw = readMetadataRaw(dataDir, sessionId); | |
| const inferredProjectId = basename(dirname(dataDir)) || undefined; | |
| if (!raw) { | |
| recordActivityEvent({ | |
| projectId: inferredProjectId, | |
| sessionId, | |
| source: "api", | |
| kind: "api.agent_report.session_not_found", |
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/src/agent-report.ts
Line: 387-392
Comment:
Both `api.agent_report.session_not_found` and `api.agent_report.transition_rejected` are emitted without a `projectId`, but every other new event in this PR (`recovery.*`, `metadata.corrupt_detected`) includes one. Since `applyAgentReport` receives `dataDir`, the same `basename(dirname(dataDir))` inference used in `metadata.ts` can populate `projectId` here, making these events filterable in project-scoped RCA queries.
```suggestion
const raw = readMetadataRaw(dataDir, sessionId);
const inferredProjectId = basename(dirname(dataDir)) || undefined;
if (!raw) {
recordActivityEvent({
projectId: inferredProjectId,
sessionId,
source: "api",
kind: "api.agent_report.session_not_found",
```
How can I resolve this? If you propose a fix, please make it concise.| recordActivityEvent({ | ||
| sessionId, | ||
| source: "api", | ||
| kind: "api.agent_report.transition_rejected", |
There was a problem hiding this comment.
The
transition_rejected event is also missing projectId, which was just inferred above. Passing it here keeps the two api.agent_report.* events consistent and filterable by project.
| recordActivityEvent({ | |
| sessionId, | |
| source: "api", | |
| kind: "api.agent_report.transition_rejected", | |
| recordActivityEvent({ | |
| projectId: inferredProjectId, | |
| sessionId, | |
| source: "api", | |
| kind: "api.agent_report.transition_rejected", |
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/src/agent-report.ts
Line: 470-473
Comment:
The `transition_rejected` event is also missing `projectId`, which was just inferred above. Passing it here keeps the two `api.agent_report.*` events consistent and filterable by project.
```suggestion
recordActivityEvent({
projectId: inferredProjectId,
sessionId,
source: "api",
kind: "api.agent_report.transition_rejected",
```
How can I resolve this? If you propose a fix, please make it concise.|
@harshitsinghbhandari — flagging you as a likely reviewer/owner here based on
|
Summary
Wires activity events into three forensic-critical paths so RCA can reconstruct what happened after the fact (sub-issue of #1511):
recovery.session_failed(MUST) per failed session,recovery.action_failed(SHOULD) onrecoverSessionouter catch. Adds"recovery"toActivityEventSource.metadata.corrupt_detected(MUST) whenmutateMetadataside-renames a corrupt session-metadata file to.corrupt-{ts}. Includesdata.renamedToand a 200-chardata.contentSampleper B11.api.agent_report.transition_rejected(SHOULD) andapi.agent_report.session_not_found(COULD).After this lands, RCA can answer:
ao recoveractually fix anything? Which sessions did it skip?applyAgentReportreject a transition (and why)?Closes #1660.
Invariants preserved
recovery.session_failedper failed session inrunRecovery's loop).metadata.corrupt_detectedtruncatesdata.contentSampleto 200 chars — full file content would be dropped by the 16KB sanitizer cap.data—errorMessageonly.Out of scope (per issue)
validator.tsper-probe catches (deliberate uncertainty handling).cleanupSessionruntime/workspace destroy silent catches (best-effort).deleteMetadataunlink catch (expected race).Test plan
metadata.test.tsregression: emitsmetadata.corrupt_detectedwithrenamedTo, truncatescontentSampleto 200 chars, no emit for healthy JSON.recovery-manager.test.tsregression: emits onerecovery.session_failedper failed session, no emit when all succeed.pnpm --filter @aoagents/ao-core typecheckpasses.🤖 Generated with Claude Code