feat(cli): wire activity events into CLI commands and supervisor lifecycle#1698
feat(cli): wire activity events into CLI commands and supervisor lifecycle#1698illegalcall wants to merge 1 commit intomainComposioHQ/agent-orchestrator:mainfrom
Conversation
…cycle (#1654) Adds "cli" to ActivityEventSource and emits ~30 activity events across the CLI surface so `ao events list --source cli` can answer RCA questions like: - Did AO start cleanly? When? (cli.start_invoked / cli.start_failed) - Was AO shut down gracefully or did it crash? (cli.shutdown_signal / cli.shutdown_completed / cli.shutdown_force_exit / cli.stale_running_pruned) - Did ao spawn / ao update / ao stop / ao migrate-storage fail and why? - Did the auto-restore prompt actually restore sessions? Instrumented files: - packages/core/src/activity-events.ts (cli source) - packages/cli/src/lib/shutdown.ts (signal/completed/failed/force_exit/session_kill_failed) - packages/cli/src/commands/start.ts (start_invoked/start_failed/restore_*/stop_*/daemon_*/last_stop_* /config_migrated) - packages/cli/src/commands/spawn.ts (spawn_invoked/spawn_failed) - packages/cli/src/commands/update.ts (update_invoked/update_failed) - packages/cli/src/commands/setup.ts (setup_failed) - packages/cli/src/commands/migrate-storage.ts (migration_completed/migration_failed) - packages/cli/src/commands/project.ts (project_register_failed) - packages/cli/src/lib/resolve-project.ts (project_resolve_failed/config_recovered/config_recovery_failed) - packages/cli/src/lib/running-state.ts (lock_timeout/stale_running_pruned) - packages/cli/src/lib/credential-resolver.ts (credential_load_failed) All emits are sync, never wrapped in try/catch (recordActivityEvent never throws), and put raw error text in data.errorMessage (not summary, which is FTS-indexed and not credential-sanitized). Tests: - 16 new instrumentation tests across shutdown, migrate-storage, update, resolve-project, and start/stop action paths covering MUST emits. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Greptile SummaryThis PR wires ~30
Confidence Score: 4/5The change is purely additive instrumentation and does not alter session, config, or process control logic; safe to merge after addressing the setup.ts false-positive. The cli.setup_failed false-positive in setup.ts is the sharpest issue: ao setup openclaw prints "Setup complete!" while simultaneously writing a failure event, so automated monitoring on that event kind would alert on successful runs. The cli.start_invoked naming and single-spawn preflight gap are observability imprecisions that do not break command behavior but make the event stream harder to interpret. setup.ts (false-positive failure event), start.ts (event kind vs. placement mismatch), spawn.ts (single-spawn preflight gap), shutdown.ts (unregister skip if writeLastStop throws)
|
| Filename | Overview |
|---|---|
| packages/cli/src/commands/setup.ts | Adds cli.setup_failed events. One emission fires on a partial-success path (openclaw.json write fallback) while the command continues and prints "Setup complete!", creating a false-positive failure event. |
| packages/cli/src/commands/start.ts | Adds cli.start_invoked, cli.start_failed (with reason), cli.restore_, cli.last_stop_read_failed, cli.stop_ events. The cli.start_invoked name is misleading — it fires on successful completion, not at invocation. |
| packages/cli/src/commands/spawn.ts | Adds cli.spawn_invoked and cli.spawn_failed. Single-spawn preflight failures have no coverage — inconsistent with batch-spawn which emits cli.spawn_failed on preflight errors. |
| packages/cli/src/lib/shutdown.ts | Adds shutdown signal, force-exit, session-kill-failed, completed, and failed events. If writeLastStop throws, unregister() is skipped and running.json is left stale. |
| packages/core/src/activity-events.ts | Adds "cli" to ActivityEventSource union. Minimal, correct change. |
| packages/cli/src/lib/running-state.ts | Adds cli.lock_timeout and cli.stale_running_pruned. Both are sync-safe and follow the never-throws invariant. |
| packages/cli/src/lib/resolve-project.ts | Adds cli.project_resolve_failed and cli.config_recovered / cli.config_recovery_failed events. All follow the invariants. |
| packages/cli/src/lib/credential-resolver.ts | Adds cli.credential_load_failed in the catch block with error in data.errorMessage. Correct. |
| packages/cli/src/commands/update.ts | Adds cli.update_invoked and cli.update_failed for git and npm paths. Covers all non-zero exit paths. |
| packages/cli/src/commands/migrate-storage.ts | Adds cli.migration_completed and cli.migration_failed for rollback and forward migration. Correct. |
| packages/cli/src/commands/project.ts | Adds cli.project_register_failed for three distinct failure reasons. Well-structured. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[ao start invoked] --> B{daemon already running?}
B -- yes, no arg --> C[user chooses quit/open/add/restart/new]
C -- quit/open/add --> D[process.exit — NO event emitted]
C -- restart --> E[cli.daemon_restart → kill daemon → continue]
B -- no --> F[resolveOrCreateProject]
F -- URL clone fails --> G[cli.project_resolve_failed]
F -- no config → auto-create --> H[cli.config_recovered]
F -- registerFlatConfig null --> I[cli.config_recovery_failed]
F -- success --> J[runStartup]
J -- orchestrator setup fails --> K[cli.start_failed reason:orchestrator_setup]
J -- supervisor start fails --> L[cli.start_failed reason:supervisor_start]
J -- restore sessions → per-session fail --> M[cli.restore_session_failed]
J -- success → register running.json --> P[cli.start_invoked]
A -- outer catch --> Q[cli.start_failed reason:outer]
R[SIGINT/SIGTERM] --> S[cli.shutdown_signal]
S --> T{sessions killed}
T -- writeLastStop throws --> V[cli.shutdown_failed — unregister skipped]
T -- success → unregister --> W[cli.shutdown_completed]
S -- 10s timeout --> X[cli.shutdown_force_exit → process.exit]
Comments Outside Diff (2)
-
packages/cli/src/commands/spawn.ts, line 363-373 (link)Single-spawn preflight failures emit no activity event
In
registerSpawn, ifrunSpawnPreflightthrows the outercatchblock logs the error and callsprocess.exit(1)without emitting any activity event.cli.spawn_invokedis only emitted insidespawnSession(never reached on preflight failure), so preflight errors are invisible in the event stream. In contrast,registerBatchSpawndoes emitcli.spawn_failedwhen preflight fails for a group. The inconsistency meansao events list --source cliis silent for single-spawn preflight failures.Prompt To Fix With AI
This is a comment left during a code review. Path: packages/cli/src/commands/spawn.ts Line: 363-373 Comment: **Single-spawn preflight failures emit no activity event** In `registerSpawn`, if `runSpawnPreflight` throws the outer `catch` block logs the error and calls `process.exit(1)` without emitting any activity event. `cli.spawn_invoked` is only emitted inside `spawnSession` (never reached on preflight failure), so preflight errors are invisible in the event stream. In contrast, `registerBatchSpawn` does emit `cli.spawn_failed` when preflight fails for a group. The inconsistency means `ao events list --source cli` is silent for single-spawn preflight failures. How can I resolve this? If you propose a fix, please make it concise.
-
packages/cli/src/lib/shutdown.ts, line 127-135 (link)unregister()is skipped ifwriteLastStopthrowswriteLastStopandunregister()share the same outertryblock. IfwriteLastStopfails — for example because the last-stop lock times out whileao stopis concurrently holding it — the exception jumps to thecatch, emitscli.shutdown_failed, andunregister()is never called. Therunning.jsonis left stale until the nextao startauto-prunes it. WrappingwriteLastStopin its owntry/catch(or usingtry/finallyaroundunregister()) would ensure cleanup always runs.Prompt To Fix With AI
This is a comment left during a code review. Path: packages/cli/src/lib/shutdown.ts Line: 127-135 Comment: **`unregister()` is skipped if `writeLastStop` throws** `writeLastStop` and `unregister()` share the same outer `try` block. If `writeLastStop` fails — for example because the last-stop lock times out while `ao stop` is concurrently holding it — the exception jumps to the `catch`, emits `cli.shutdown_failed`, and `unregister()` is never called. The `running.json` is left stale until the next `ao start` auto-prunes it. Wrapping `writeLastStop` in its own `try/catch` (or using `try/finally` around `unregister()`) would ensure cleanup always runs. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
packages/cli/src/commands/setup.ts:592-600
**`cli.setup_failed` fires on a successful-completion path**
When `writeOpenClawJsonConfig` returns `false`, the code emits `cli.setup_failed` — but then continues to write the shell export, print instructions, and print "Setup complete!". The user sees a success message while the event stream records a failure. Any `ao events list --source cli` query looking for `cli.setup_failed` will flag successful `ao setup openclaw` runs where only the OpenClaw-side JSON write was skipped (a recoverable, documented fallback). A distinct kind such as `cli.setup_partial` or adjusting to a `warn`-level sub-event would prevent the false positive.
### Issue 2 of 4
packages/cli/src/commands/start.ts:1635-1646
**`cli.start_invoked` semantics describe completion, not invocation**
The event is emitted *after* `runStartup()` succeeds and `register()` writes running.json — its own summary says `"ao start completed"`. As a result it only ever fires on the happy path and never appears on early exits or any failure path. A developer querying `ao events list --source cli` expecting a start-of-command marker paired with `cli.start_failed` will find no invocation marker before potential failures, only a completion record after success. Renaming to `cli.start_completed` would align the kind with the actual semantics.
### Issue 3 of 4
packages/cli/src/commands/spawn.ts:363-373
**Single-spawn preflight failures emit no activity event**
In `registerSpawn`, if `runSpawnPreflight` throws the outer `catch` block logs the error and calls `process.exit(1)` without emitting any activity event. `cli.spawn_invoked` is only emitted inside `spawnSession` (never reached on preflight failure), so preflight errors are invisible in the event stream. In contrast, `registerBatchSpawn` does emit `cli.spawn_failed` when preflight fails for a group. The inconsistency means `ao events list --source cli` is silent for single-spawn preflight failures.
### Issue 4 of 4
packages/cli/src/lib/shutdown.ts:127-135
**`unregister()` is skipped if `writeLastStop` throws**
`writeLastStop` and `unregister()` share the same outer `try` block. If `writeLastStop` fails — for example because the last-stop lock times out while `ao stop` is concurrently holding it — the exception jumps to the `catch`, emits `cli.shutdown_failed`, and `unregister()` is never called. The `running.json` is left stale until the next `ao start` auto-prunes it. Wrapping `writeLastStop` in its own `try/catch` (or using `try/finally` around `unregister()`) would ensure cleanup always runs.
Reviews (1): Last reviewed commit: "feat(cli): wire activity events into CLI..." | Re-trigger Greptile
| } else if (!openclawConfigWritten) { | ||
| recordActivityEvent({ | ||
| source: "cli", | ||
| kind: "cli.setup_failed", | ||
| level: "warn", | ||
| summary: `failed to write ~/.openclaw/openclaw.json`, | ||
| data: { reason: "openclaw_json_write_failed" }, | ||
| }); | ||
| } |
There was a problem hiding this comment.
cli.setup_failed fires on a successful-completion path
When writeOpenClawJsonConfig returns false, the code emits cli.setup_failed — but then continues to write the shell export, print instructions, and print "Setup complete!". The user sees a success message while the event stream records a failure. Any ao events list --source cli query looking for cli.setup_failed will flag successful ao setup openclaw runs where only the OpenClaw-side JSON write was skipped (a recoverable, documented fallback). A distinct kind such as cli.setup_partial or adjusting to a warn-level sub-event would prevent the false positive.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/cli/src/commands/setup.ts
Line: 592-600
Comment:
**`cli.setup_failed` fires on a successful-completion path**
When `writeOpenClawJsonConfig` returns `false`, the code emits `cli.setup_failed` — but then continues to write the shell export, print instructions, and print "Setup complete!". The user sees a success message while the event stream records a failure. Any `ao events list --source cli` query looking for `cli.setup_failed` will flag successful `ao setup openclaw` runs where only the OpenClaw-side JSON write was skipped (a recoverable, documented fallback). A distinct kind such as `cli.setup_partial` or adjusting to a `warn`-level sub-event would prevent the false positive.
How can I resolve this? If you propose a fix, please make it concise.| recordActivityEvent({ | ||
| projectId, | ||
| source: "cli", | ||
| kind: "cli.start_invoked", | ||
| level: "info", | ||
| summary: `ao start completed: registered on port ${actualPort}`, | ||
| data: { | ||
| pid: process.pid, | ||
| port: actualPort, | ||
| projects: listLifecycleWorkers(), | ||
| }, | ||
| }); |
There was a problem hiding this comment.
cli.start_invoked semantics describe completion, not invocation
The event is emitted after runStartup() succeeds and register() writes running.json — its own summary says "ao start completed". As a result it only ever fires on the happy path and never appears on early exits or any failure path. A developer querying ao events list --source cli expecting a start-of-command marker paired with cli.start_failed will find no invocation marker before potential failures, only a completion record after success. Renaming to cli.start_completed would align the kind with the actual semantics.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/cli/src/commands/start.ts
Line: 1635-1646
Comment:
**`cli.start_invoked` semantics describe completion, not invocation**
The event is emitted *after* `runStartup()` succeeds and `register()` writes running.json — its own summary says `"ao start completed"`. As a result it only ever fires on the happy path and never appears on early exits or any failure path. A developer querying `ao events list --source cli` expecting a start-of-command marker paired with `cli.start_failed` will find no invocation marker before potential failures, only a completion record after success. Renaming to `cli.start_completed` would align the kind with the actual semantics.
How can I resolve this? If you propose a fix, please make it concise.
Closes #1654
Summary
"cli"to theActivityEventSourceunion and emits ~30 activity events acrossao start,ao stop,ao spawn,ao update,ao setup,ao migrate-storage, plus shared CLI helpers (shutdown, resolve-project, running-state, credential-resolver, project).ao events list --source cli --since 1hnow answers RCA questions like "did AO start cleanly?", "was AO killed gracefully or did it crash?", and "didao spawn/ao stopfail and why?"Event coverage
Startup / shutdown / restore (
start.ts,shutdown.ts):cli.start_invoked,cli.start_failed(withreason: orchestrator_setup | supervisor_start | outer),cli.restore_started,cli.restore_session_failed,cli.restore_completed,cli.last_stop_read_failed,cli.shutdown_signal,cli.shutdown_force_exit,cli.shutdown_session_kill_failed,cli.shutdown_completed,cli.shutdown_failed.ao stop(start.ts):cli.stop_invoked,cli.stop_session_failed,cli.last_stop_written,cli.daemon_killed,cli.stop_failed.Other commands:
cli.spawn_invoked,cli.spawn_failed(incl. batch-spawn preflight),cli.update_invoked,cli.update_failed(git + npm),cli.setup_failed,cli.migration_completed,cli.migration_failed,cli.project_register_failed.Project resolution / config recovery (
resolve-project.ts,running-state.ts, etc.):cli.project_resolve_failed,cli.config_recovered,cli.config_recovery_failed,cli.config_migrated,cli.daemon_restart,cli.lock_timeout,cli.stale_running_pruned,cli.credential_load_failed.Invariants preserved (per CLAUDE.md / PR #1620)
recordActivityEventis sync — neverawait-ed.data.errorMessage, notsummary(which is FTS-indexed and not credential-sanitized).process.exit.Test plan
pnpm typecheckpassespnpm --filter @aoagents/ao-core test(activity-events + lifecycle-manager tests pass; pre-existing OpenCode subprocess flakes are unrelated)pnpm --filter @aoagents/ao-cli test(615/616 passing, 1 pre-existing flake inao-doctor.shshell script that passes in isolation)🤖 Generated with Claude Code