Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

feat(cli): wire activity events into CLI commands and supervisor lifecycle#1698

Open
illegalcall wants to merge 1 commit intomainComposioHQ/agent-orchestrator:mainfrom
feat/issue-1654ComposioHQ/agent-orchestrator:feat/issue-1654Copy head branch name to clipboard
Open

feat(cli): wire activity events into CLI commands and supervisor lifecycle#1698
illegalcall wants to merge 1 commit intomainComposioHQ/agent-orchestrator:mainfrom
feat/issue-1654ComposioHQ/agent-orchestrator:feat/issue-1654Copy head branch name to clipboard

Conversation

@illegalcall
Copy link
Copy Markdown
Collaborator

Closes #1654

Summary

  • Adds "cli" to the ActivityEventSource union and emits ~30 activity events across ao start, ao stop, ao spawn, ao update, ao setup, ao migrate-storage, plus shared CLI helpers (shutdown, resolve-project, running-state, credential-resolver, project).
  • ao events list --source cli --since 1h now answers RCA questions like "did AO start cleanly?", "was AO killed gracefully or did it crash?", and "did ao spawn/ao stop fail and why?"
  • 16 new instrumentation tests covering MUST emits.

Event coverage

Startup / shutdown / restore (start.ts, shutdown.ts):
cli.start_invoked, cli.start_failed (with reason: orchestrator_setup | supervisor_start | outer), cli.restore_started, cli.restore_session_failed, cli.restore_completed, cli.last_stop_read_failed, cli.shutdown_signal, cli.shutdown_force_exit, cli.shutdown_session_kill_failed, cli.shutdown_completed, cli.shutdown_failed.

ao stop (start.ts):
cli.stop_invoked, cli.stop_session_failed, cli.last_stop_written, cli.daemon_killed, cli.stop_failed.

Other commands:
cli.spawn_invoked, cli.spawn_failed (incl. batch-spawn preflight), cli.update_invoked, cli.update_failed (git + npm), cli.setup_failed, cli.migration_completed, cli.migration_failed, cli.project_register_failed.

Project resolution / config recovery (resolve-project.ts, running-state.ts, etc.):
cli.project_resolve_failed, cli.config_recovered, cli.config_recovery_failed, cli.config_migrated, cli.daemon_restart, cli.lock_timeout, cli.stale_running_pruned, cli.credential_load_failed.

Invariants preserved (per CLAUDE.md / PR #1620)

  • recordActivityEvent is sync — never await-ed.
  • Never wrapped in try/catch (it never throws).
  • Raw error text goes into data.errorMessage, not summary (which is FTS-indexed and not credential-sanitized).
  • Shutdown paths emit synchronously before process.exit.

Test plan

  • pnpm typecheck passes
  • pnpm --filter @aoagents/ao-core test (activity-events + lifecycle-manager tests pass; pre-existing OpenCode subprocess flakes are unrelated)
  • pnpm --filter @aoagents/ao-cli test (615/616 passing, 1 pre-existing flake in ao-doctor.sh shell script that passes in isolation)
  • All 16 new instrumentation tests pass
  • No regressions in existing CLI command tests (start/stop/spawn/update/setup/project/running-state)

🤖 Generated with Claude Code

…cycle (#1654)

Adds "cli" to ActivityEventSource and emits ~30 activity events across the CLI
surface so `ao events list --source cli` can answer RCA questions like:

- Did AO start cleanly? When? (cli.start_invoked / cli.start_failed)
- Was AO shut down gracefully or did it crash? (cli.shutdown_signal /
  cli.shutdown_completed / cli.shutdown_force_exit / cli.stale_running_pruned)
- Did ao spawn / ao update / ao stop / ao migrate-storage fail and why?
- Did the auto-restore prompt actually restore sessions?

Instrumented files:
- packages/core/src/activity-events.ts (cli source)
- packages/cli/src/lib/shutdown.ts (signal/completed/failed/force_exit/session_kill_failed)
- packages/cli/src/commands/start.ts (start_invoked/start_failed/restore_*/stop_*/daemon_*/last_stop_* /config_migrated)
- packages/cli/src/commands/spawn.ts (spawn_invoked/spawn_failed)
- packages/cli/src/commands/update.ts (update_invoked/update_failed)
- packages/cli/src/commands/setup.ts (setup_failed)
- packages/cli/src/commands/migrate-storage.ts (migration_completed/migration_failed)
- packages/cli/src/commands/project.ts (project_register_failed)
- packages/cli/src/lib/resolve-project.ts (project_resolve_failed/config_recovered/config_recovery_failed)
- packages/cli/src/lib/running-state.ts (lock_timeout/stale_running_pruned)
- packages/cli/src/lib/credential-resolver.ts (credential_load_failed)

All emits are sync, never wrapped in try/catch (recordActivityEvent never
throws), and put raw error text in data.errorMessage (not summary, which is
FTS-indexed and not credential-sanitized).

Tests:
- 16 new instrumentation tests across shutdown, migrate-storage, update,
  resolve-project, and start/stop action paths covering MUST emits.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR wires ~30 recordActivityEvent call sites across ao start, ao stop, ao spawn, ao update, ao setup, ao migrate-storage, and shared CLI helpers, and adds "cli" to the ActivityEventSource union in ao-core. The intent is to make ao events list --source cli useful for root-cause analysis of startup/shutdown failures.

  • Startup, graceful shutdown, session restore, project resolution, config recovery, lock timeouts, and stale-state pruning are all instrumented with correctly named, sync-safe events; error text is correctly routed to data.errorMessage, not summary.
  • Sixteen new tests verify MUST-emit sites, and all events follow the never-throw, never-await invariants.
  • Three issues: (1) cli.setup_failed fires on a partial-success path in ao setup openclaw, creating a false-positive failure event while the command prints "Setup complete!"; (2) cli.start_invoked is emitted after successful startup completes, not at invocation; (3) single-spawn preflight failures have no event coverage, unlike batch-spawn.

Confidence Score: 4/5

The change is purely additive instrumentation and does not alter session, config, or process control logic; safe to merge after addressing the setup.ts false-positive.

The cli.setup_failed false-positive in setup.ts is the sharpest issue: ao setup openclaw prints "Setup complete!" while simultaneously writing a failure event, so automated monitoring on that event kind would alert on successful runs. The cli.start_invoked naming and single-spawn preflight gap are observability imprecisions that do not break command behavior but make the event stream harder to interpret.

setup.ts (false-positive failure event), start.ts (event kind vs. placement mismatch), spawn.ts (single-spawn preflight gap), shutdown.ts (unregister skip if writeLastStop throws)

Important Files Changed

Filename Overview
packages/cli/src/commands/setup.ts Adds cli.setup_failed events. One emission fires on a partial-success path (openclaw.json write fallback) while the command continues and prints "Setup complete!", creating a false-positive failure event.
packages/cli/src/commands/start.ts Adds cli.start_invoked, cli.start_failed (with reason), cli.restore_, cli.last_stop_read_failed, cli.stop_ events. The cli.start_invoked name is misleading — it fires on successful completion, not at invocation.
packages/cli/src/commands/spawn.ts Adds cli.spawn_invoked and cli.spawn_failed. Single-spawn preflight failures have no coverage — inconsistent with batch-spawn which emits cli.spawn_failed on preflight errors.
packages/cli/src/lib/shutdown.ts Adds shutdown signal, force-exit, session-kill-failed, completed, and failed events. If writeLastStop throws, unregister() is skipped and running.json is left stale.
packages/core/src/activity-events.ts Adds "cli" to ActivityEventSource union. Minimal, correct change.
packages/cli/src/lib/running-state.ts Adds cli.lock_timeout and cli.stale_running_pruned. Both are sync-safe and follow the never-throws invariant.
packages/cli/src/lib/resolve-project.ts Adds cli.project_resolve_failed and cli.config_recovered / cli.config_recovery_failed events. All follow the invariants.
packages/cli/src/lib/credential-resolver.ts Adds cli.credential_load_failed in the catch block with error in data.errorMessage. Correct.
packages/cli/src/commands/update.ts Adds cli.update_invoked and cli.update_failed for git and npm paths. Covers all non-zero exit paths.
packages/cli/src/commands/migrate-storage.ts Adds cli.migration_completed and cli.migration_failed for rollback and forward migration. Correct.
packages/cli/src/commands/project.ts Adds cli.project_register_failed for three distinct failure reasons. Well-structured.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ao start invoked] --> B{daemon already running?}
    B -- yes, no arg --> C[user chooses quit/open/add/restart/new]
    C -- quit/open/add --> D[process.exit — NO event emitted]
    C -- restart --> E[cli.daemon_restart → kill daemon → continue]
    B -- no --> F[resolveOrCreateProject]
    F -- URL clone fails --> G[cli.project_resolve_failed]
    F -- no config → auto-create --> H[cli.config_recovered]
    F -- registerFlatConfig null --> I[cli.config_recovery_failed]
    F -- success --> J[runStartup]
    J -- orchestrator setup fails --> K[cli.start_failed reason:orchestrator_setup]
    J -- supervisor start fails --> L[cli.start_failed reason:supervisor_start]
    J -- restore sessions → per-session fail --> M[cli.restore_session_failed]
    J -- success → register running.json --> P[cli.start_invoked]
    A -- outer catch --> Q[cli.start_failed reason:outer]
    R[SIGINT/SIGTERM] --> S[cli.shutdown_signal]
    S --> T{sessions killed}
    T -- writeLastStop throws --> V[cli.shutdown_failed — unregister skipped]
    T -- success → unregister --> W[cli.shutdown_completed]
    S -- 10s timeout --> X[cli.shutdown_force_exit → process.exit]
Loading

Comments Outside Diff (2)

  1. packages/cli/src/commands/spawn.ts, line 363-373 (link)

    P2 Single-spawn preflight failures emit no activity event

    In registerSpawn, if runSpawnPreflight throws the outer catch block logs the error and calls process.exit(1) without emitting any activity event. cli.spawn_invoked is only emitted inside spawnSession (never reached on preflight failure), so preflight errors are invisible in the event stream. In contrast, registerBatchSpawn does emit cli.spawn_failed when preflight fails for a group. The inconsistency means ao events list --source cli is silent for single-spawn preflight failures.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: packages/cli/src/commands/spawn.ts
    Line: 363-373
    
    Comment:
    **Single-spawn preflight failures emit no activity event**
    
    In `registerSpawn`, if `runSpawnPreflight` throws the outer `catch` block logs the error and calls `process.exit(1)` without emitting any activity event. `cli.spawn_invoked` is only emitted inside `spawnSession` (never reached on preflight failure), so preflight errors are invisible in the event stream. In contrast, `registerBatchSpawn` does emit `cli.spawn_failed` when preflight fails for a group. The inconsistency means `ao events list --source cli` is silent for single-spawn preflight failures.
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. packages/cli/src/lib/shutdown.ts, line 127-135 (link)

    P2 unregister() is skipped if writeLastStop throws

    writeLastStop and unregister() share the same outer try block. If writeLastStop fails — for example because the last-stop lock times out while ao stop is concurrently holding it — the exception jumps to the catch, emits cli.shutdown_failed, and unregister() is never called. The running.json is left stale until the next ao start auto-prunes it. Wrapping writeLastStop in its own try/catch (or using try/finally around unregister()) would ensure cleanup always runs.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: packages/cli/src/lib/shutdown.ts
    Line: 127-135
    
    Comment:
    **`unregister()` is skipped if `writeLastStop` throws**
    
    `writeLastStop` and `unregister()` share the same outer `try` block. If `writeLastStop` fails — for example because the last-stop lock times out while `ao stop` is concurrently holding it — the exception jumps to the `catch`, emits `cli.shutdown_failed`, and `unregister()` is never called. The `running.json` is left stale until the next `ao start` auto-prunes it. Wrapping `writeLastStop` in its own `try/catch` (or using `try/finally` around `unregister()`) would ensure cleanup always runs.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 4
packages/cli/src/commands/setup.ts:592-600
**`cli.setup_failed` fires on a successful-completion path**

When `writeOpenClawJsonConfig` returns `false`, the code emits `cli.setup_failed` — but then continues to write the shell export, print instructions, and print "Setup complete!". The user sees a success message while the event stream records a failure. Any `ao events list --source cli` query looking for `cli.setup_failed` will flag successful `ao setup openclaw` runs where only the OpenClaw-side JSON write was skipped (a recoverable, documented fallback). A distinct kind such as `cli.setup_partial` or adjusting to a `warn`-level sub-event would prevent the false positive.

### Issue 2 of 4
packages/cli/src/commands/start.ts:1635-1646
**`cli.start_invoked` semantics describe completion, not invocation**

The event is emitted *after* `runStartup()` succeeds and `register()` writes running.json — its own summary says `"ao start completed"`. As a result it only ever fires on the happy path and never appears on early exits or any failure path. A developer querying `ao events list --source cli` expecting a start-of-command marker paired with `cli.start_failed` will find no invocation marker before potential failures, only a completion record after success. Renaming to `cli.start_completed` would align the kind with the actual semantics.

### Issue 3 of 4
packages/cli/src/commands/spawn.ts:363-373
**Single-spawn preflight failures emit no activity event**

In `registerSpawn`, if `runSpawnPreflight` throws the outer `catch` block logs the error and calls `process.exit(1)` without emitting any activity event. `cli.spawn_invoked` is only emitted inside `spawnSession` (never reached on preflight failure), so preflight errors are invisible in the event stream. In contrast, `registerBatchSpawn` does emit `cli.spawn_failed` when preflight fails for a group. The inconsistency means `ao events list --source cli` is silent for single-spawn preflight failures.

### Issue 4 of 4
packages/cli/src/lib/shutdown.ts:127-135
**`unregister()` is skipped if `writeLastStop` throws**

`writeLastStop` and `unregister()` share the same outer `try` block. If `writeLastStop` fails — for example because the last-stop lock times out while `ao stop` is concurrently holding it — the exception jumps to the `catch`, emits `cli.shutdown_failed`, and `unregister()` is never called. The `running.json` is left stale until the next `ao start` auto-prunes it. Wrapping `writeLastStop` in its own `try/catch` (or using `try/finally` around `unregister()`) would ensure cleanup always runs.

Reviews (1): Last reviewed commit: "feat(cli): wire activity events into CLI..." | Re-trigger Greptile

Comment on lines +592 to 600
} else if (!openclawConfigWritten) {
recordActivityEvent({
source: "cli",
kind: "cli.setup_failed",
level: "warn",
summary: `failed to write ~/.openclaw/openclaw.json`,
data: { reason: "openclaw_json_write_failed" },
});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 cli.setup_failed fires on a successful-completion path

When writeOpenClawJsonConfig returns false, the code emits cli.setup_failed — but then continues to write the shell export, print instructions, and print "Setup complete!". The user sees a success message while the event stream records a failure. Any ao events list --source cli query looking for cli.setup_failed will flag successful ao setup openclaw runs where only the OpenClaw-side JSON write was skipped (a recoverable, documented fallback). A distinct kind such as cli.setup_partial or adjusting to a warn-level sub-event would prevent the false positive.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/cli/src/commands/setup.ts
Line: 592-600

Comment:
**`cli.setup_failed` fires on a successful-completion path**

When `writeOpenClawJsonConfig` returns `false`, the code emits `cli.setup_failed` — but then continues to write the shell export, print instructions, and print "Setup complete!". The user sees a success message while the event stream records a failure. Any `ao events list --source cli` query looking for `cli.setup_failed` will flag successful `ao setup openclaw` runs where only the OpenClaw-side JSON write was skipped (a recoverable, documented fallback). A distinct kind such as `cli.setup_partial` or adjusting to a `warn`-level sub-event would prevent the false positive.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1635 to +1646
recordActivityEvent({
projectId,
source: "cli",
kind: "cli.start_invoked",
level: "info",
summary: `ao start completed: registered on port ${actualPort}`,
data: {
pid: process.pid,
port: actualPort,
projects: listLifecycleWorkers(),
},
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 cli.start_invoked semantics describe completion, not invocation

The event is emitted after runStartup() succeeds and register() writes running.json — its own summary says "ao start completed". As a result it only ever fires on the happy path and never appears on early exits or any failure path. A developer querying ao events list --source cli expecting a start-of-command marker paired with cli.start_failed will find no invocation marker before potential failures, only a completion record after success. Renaming to cli.start_completed would align the kind with the actual semantics.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/cli/src/commands/start.ts
Line: 1635-1646

Comment:
**`cli.start_invoked` semantics describe completion, not invocation**

The event is emitted *after* `runStartup()` succeeds and `register()` writes running.json — its own summary says `"ao start completed"`. As a result it only ever fires on the happy path and never appears on early exits or any failure path. A developer querying `ao events list --source cli` expecting a start-of-command marker paired with `cli.start_failed` will find no invocation marker before potential failures, only a completion record after success. Renaming to `cli.start_completed` would align the kind with the actual semantics.

How can I resolve this? If you propose a fix, please make it concise.

@illegalcall
Copy link
Copy Markdown
Collaborator Author

@harshitsinghbhandari — flagging you as the clear owner here based on git blame of the CLI files this PR touches:

Note: setup.ts and credential-resolver.ts are dominated by @illegalcall (Dhruv) — those are stable, not the focus of this PR. Could you take this one over for review?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Activity events: CLI commands and supervisor lifecycle (sub-issue of #1511)

1 participant

Morty Proxy This is a proxified and sanitized view of the page, visit original site.