DEVOP-560: add org-wide daily Shai-Hulud IOC sweep workflow #8

Open
srt0422 wants to merge 10 commits into allora-network/.github:main from allora-network/.github:scott/devop-560-shai-hulud-sweep

srt0422 commented May 25, 2026

Summary

Adds the first workflow under .github/workflows/ in this repo — a scheduled
daily sweep for Shai-Hulud indicators of compromise across every repo in the
allora-network org, plus a rolling GitHub Issue and Slack page on findings.

Closes DEVOP-560.

What ships

  • .github/workflows/shai-hulud-sweep.yml, with schedule: '7 4 * * *'
    (04:07 UTC daily, off-peak + off-minute) plus workflow_dispatch for manual
    runs. Permissions limited to contents: read + issues: write. Pinned SHAs
    for actions/checkout@v4.2.2 and actions/upload-artifact@v4.4.3, matching
    the convention in allora-network/ci-workflows-private.
  • scripts/shai-hulud-ioc-sweep.sh — canonical detection logic, vendored
    verbatim from allora-network/skills@71aeefb (skills/shai-hulud-defense/scripts/).
    See file header for refresh procedure / pinned commit.
  • docs/plans/2026-05-25-devop-560-shai-hulud-sweep.md — design notes for
    why we vendor the script, how the rolling issue is maintained, when Slack
    fires, and the GH_ORG_READ_TOKEN follow-up.

Detection coverage (per the vendored script)

  • Lockfile entries (npm/pip/Go) matching .github/security/ioc-packages.txt.
  • Any .js/.cjs/.mjs file ≤ 2 MB whose SHA-256 matches
    .github/security/ioc-hashes.txt (filename-agnostic — bundle.js rename
    doesn't bypass).
  • Persistence: */.github/workflows/shai-hulud*.{yml,yaml} at repo root.
  • npm install/postinstall/preinstall lifecycle scripts matching
    node …bundle.js, curl|sh, wget|sh, base64 -d|--decode|-D,
    eval $(…), or npx … bundle.
  • Go replace directives: untrusted-host RHS, absolute-path RHS,
    top-level-path mismatch (Scenario C in-org redirect), and local replacements
    (./ / ../) flagged for human review.
  • Go workflow env settings (GOSUMDB=off, GONOSUMCHECK, GOINSECURE,
    GOFLAGS=*-insecure) — direct and indirect (vars/secrets/env/inputs).
  • Public exfil repos matching ^[Ss]hai-[Hh]ulud under org:allora-network
    AND under each org member (rate-limited).
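
The filename-agnostic hash check in the list above can be pictured with a small sketch. It is written in Python rather than the vendored script's shell, and it is not the script's actual code: the function name `find_hash_matches`, the constant names, and reading "2 MB" as 2 * 1024 * 1024 bytes are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Assumption: "2 MB" is interpreted as 2 * 1024 * 1024 bytes.
MAX_BYTES = 2 * 1024 * 1024
JS_SUFFIXES = {".js", ".cjs", ".mjs"}


def find_hash_matches(repo_root: str, ioc_hashes: set[str]) -> list[str]:
    """Return relative paths of small JS files whose SHA-256 is in ioc_hashes.

    Only the file content is hashed, so renaming a payload does not
    change whether it matches.
    """
    root = Path(repo_root)
    matches = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in JS_SUFFIXES:
            continue
        if path.stat().st_size > MAX_BYTES:
            continue  # larger files are skipped, as in the coverage list
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in ioc_hashes:
            matches.append(path.relative_to(root).as_posix())
    return matches
```

Because the match is on content, a payload saved under any `.js`, `.cjs`, or `.mjs` name is still reported, while the same bytes in a file with another extension are outside this check's scope.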

Outputs

| Sweep result | Script exit | Rolling issue | Slack |
| --- | --- | --- | --- |
| Clean (no findings) | 0 | no-op | no-op |
| Operational (clone_failed / check_skipped / go_local_replace) | 2 | comment appended (or issue opened with label shai-hulud-sweep) | no-op |
| IOC-grade | 1 | comment appended (or issue opened) | paged via ${{ secrets.SLACK_SECURITY_WEBHOOK }} |
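
The exit-code contract in the table can be read as a small routing function. The sketch below is a Python illustration only (the workflow expresses this with step conditions); the names `Actions` and `route` are hypothetical, and treating any other exit code as an error is an assumption the table does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actions:
    update_issue: bool  # append a comment or open the rolling issue
    page_slack: bool    # post to the Slack security webhook


def route(exit_code: int) -> Actions:
    """Map the sweep script's exit code to the follow-up actions in the table."""
    if exit_code == 0:  # clean: nothing to report
        return Actions(update_issue=False, page_slack=False)
    if exit_code == 2:  # operational findings: issue only, no page
        return Actions(update_issue=True, page_slack=False)
    if exit_code == 1:  # IOC-grade findings: issue and page
        return Actions(update_issue=True, page_slack=True)
    # Assumption: codes outside 0/1/2 are not described in the table.
    raise ValueError(f"unexpected sweep exit code: {exit_code}")
```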

Forensic evidence (clones of repos that produced IOC findings) is uploaded as
a workflow artifact for 30 days so humans can inspect the matched file without
re-cloning point-in-time evidence.

The workflow never auto-closes the rolling issue; humans drive close/reopen so
triage state survives across daily runs.

Secrets used

  • SLACK_SECURITY_WEBHOOK — org secret; payload only delivered on IOC-grade
    findings. No-ops gracefully if unset (warning, not failure).
  • GH_ORG_READ_TOKEN (optional org secret). When present, preferred over
    the default GITHUB_TOKEN for org-wide enumeration so private repos and the
    org-members exfil search are covered. When absent, member enumeration emits
    check_skipped operational findings — visible partial coverage, never a
    silent false-clean.

Verification

  • actionlint .github/workflows/shai-hulud-sweep.yml — clean (no findings).
  • python3 -c 'yaml.safe_load(...)' — parses.
  • Manual workflow_dispatch recommended after merge to verify
    gh issue and Slack paths end-to-end against the live org.

Followups (intentionally out-of-scope for this PR)

  • Provision GH_ORG_READ_TOKEN (fine-grained PAT or GitHub App token with
    read:org + repo:read) once org-admin signs off — the workflow already
    prefers it when present.
  • Quarterly review of the trusted Go module-path allowlist (GO_TRUSTED_HOSTS_RE
    in the script) to keep go_suspicious_replace false-positive rate low.

Made with Cursor


Summary by cubic

Adds a daily org‑wide Shai‑Hulud IOC sweep that scans all allora-network repos, updates a rolling issue, and pages Slack on incident‑grade findings with alert‑dedup and a weekly re‑page. Meets DEVOP-560 requirements: scheduled workflow at .github/workflows/shai-hulud-sweep.yml, org repo iteration, IOC list checks, member exfil search, rolling issue updates, and Slack notifications with minimal perms.

  • New Features

    • Added .github/workflows/shai-hulud-sweep.yml (cron 7 4 * * * + workflow_dispatch; minimal perms; serialized concurrency). Pins actions/checkout@v4.2.2 and actions/upload-artifact@v4.4.3.
    • Vendored scripts/shai-hulud-ioc-sweep.sh with SHA‑256 sidecar verification; reads .github/security/ioc-packages.txt and .github/security/ioc-hashes.txt (# schema:v1). Detection covers lockfiles (incl. structured package-lock.json), sub‑2MB JS SHA‑256 hashes, exact Shai‑Hulud persistence workflow filenames, suspicious npm lifecycle scripts, Go replace/path‑mismatch/unsafe‑env, and org/member public exfil search.
    • Outputs: maintains a rolling issue labeled shai-hulud-sweep; Slack via SLACK_SECURITY_WEBHOOK on IOC with dedup gating; prefers GH_ORG_READ_TOKEN, else falls back to GITHUB_TOKEN and emits check_skipped; uploads only findings.json, summary.md, repos.txt for 30 days.
    • Added .github/CODEOWNERS to require @allora-network/security (and @allora-network/devops for the workflow). Added plan doc with action SHA‑pin rotation and follow‑ups.
  • Bug Fixes

    • Slack: IOC alert dedup by IOC hash‑stamp (first‑seen/changed/≥7‑day re‑page); filter dedup markers to github-actions[bot]; write the paged-at marker only after a successful Slack send; fail‑open if the stamp can’t be computed; 3‑attempt retry with backoff and Retry‑After honoring; fixed HTTP code capture.
    • Rolling issue: new shared “Find rolling issue” step (oldest‑open via --search sort:created-asc) reused by dedup, updates, and paged‑marker steps; IOC comments include visible page‑decision plus hidden stamp markers; Slack still fires on IOC even if the issue update fails.
    • Safety: sanitize untrusted strings in Slack and issue bodies; verify the vendored script via a locked‑path SHA‑256 sidecar before execution.
    • Artifacts/robustness: restrict uploads to structured outputs; suffix artifact name with ${{ github.run_attempt }}; placeholder summary on pre‑aggregation failure; final run summary surfaces the Slack‑dedup tri‑state explicitly.
    • Detection tuning: narrow persistence detection to exact IOC filenames (avoids self‑alerts); add # schema:v1 headers and assertions; extend GO_TRUSTED_HOSTS_RE to include cometbft; refresh the .sha256 sidecar.

Written for commit c10d0dc. Summary will update on new commits. Review in cubic

Adds the first workflow under .github/workflows/ for the allora-network/.github
repo: a scheduled daily sweep (04:07 UTC) plus workflow_dispatch that scans
every repo in the org for Shai-Hulud indicators of compromise, maintains a
rolling GitHub issue labelled `shai-hulud-sweep`, and pages Slack via the
SLACK_SECURITY_WEBHOOK secret on incident-grade findings.

Detection logic lives in scripts/shai-hulud-ioc-sweep.sh, vendored verbatim
from allora-network/skills@71aeefb (skills/shai-hulud-defense). The script is
vendored rather than cloned at workflow time because that repo is private and
the workflow's default GITHUB_TOKEN cannot read it; vendoring also keeps the
daily sweep working through upstream rename/outage. See the script header for
the refresh procedure.

Key design choices documented in docs/plans/2026-05-25-devop-560-shai-hulud-sweep.md:
- IOC inputs read from .github/security/ioc-packages.txt + ioc-hashes.txt
  (DEVOP-561, merged in PR #2). Script validates the `# schema:v1` header
  before running so a silent seed-list format change fails closed.
- Rolling issue: workflow finds an existing open issue with the label and
  appends a comment, else opens a new one. Humans drive close/reopen so
  triage state survives across daily runs.
- Slack page only fires on IOC-grade findings (script exit 1). Operational
  findings (exit 2 — clone_failed / check_skipped / go_local_replace) update
  the issue but do not page after-hours.
- Permissions: `contents: read` + `issues: write` only.
- Pinned SHAs for actions/checkout (v4.2.2) and actions/upload-artifact
  (v4.4.3) match the convention in allora-network/ci-workflows-private.
- Prefers a `GH_ORG_READ_TOKEN` secret if present (private-repo + member
  enumeration); falls back to GITHUB_TOKEN. In the fallback path, member
  enumeration emits `check_skipped` operational findings so the partial
  coverage is visible in the rolling issue.
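
The fail-closed schema check on the IOC seed lists can be sketched as follows. This is a Python illustration, not the vendored shell code; the function name `load_ioc_list`, the requirement that the header is the first line, and the handling of blank and `#` comment lines are assumptions.

```python
def load_ioc_list(text: str, expected_header: str = "# schema:v1") -> list[str]:
    """Parse a seed list, refusing to run if the schema header is missing.

    Raising here (instead of returning an empty list) is what makes a
    silent format change fail closed rather than match nothing.
    """
    lines = text.splitlines()
    # Assumption: the header must be the very first line of the file.
    if not lines or lines[0].strip() != expected_header:
        raise ValueError(f"missing or unexpected schema header, want {expected_header!r}")
    entries = []
    for line in lines[1:]:
        stripped = line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if stripped and not stripped.startswith("#"):
            entries.append(stripped)
    return entries
```

An empty result from a correctly versioned file is still valid; only a missing or changed header stops the sweep.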

Linear: https://linear.app/alloralabs/issue/DEVOP-560
Co-authored-by: Cursor <cursoragent@cursor.com>
srt0422 added the shai-hulud (Shai-Hulud supply-chain defense work) and needs-human-review labels May 25, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

cubic analysis

1 issue found

Linked issue analysis

Linked issue: DEVOP-560: Create org-wide daily IOC sweep workflow in .github repo

| Status | Acceptance criteria | Notes |
| --- | --- | --- |
|  | .github/workflows/shai-hulud-sweep.yml exists in this repo | PR adds the workflow file at .github/workflows/shai-hulud-sweep.yml. |
|  | Cron: daily at an off-peak local time, off-minute (e.g. '7 4 * * *') | Workflow schedule is set to '7 4 * * *' and includes workflow_dispatch for manual runs. |
|  | Iterate all org repos via gh api orgs/allora-network/repos --paginate | The vendored script enumerates repos using gh api --paginate /orgs/$ORG/repos and the workflow runs that script. |
|  | For each repo: shallow clone, run sweep checks, report findings | Script uses git clone --depth 1 to shallow-clone each repo, runs the collection/checks, and aggregates findings; workflow uploads artifacts and updates the rolling issue. |
|  | Compare against IOC lists at .github/security/ioc-packages.txt and .github/security/ioc-hashes.txt | Workflow passes those two paths to the script; the script validates the schema header and uses both lists in its matching logic. |
| ⚠️ | Search the GitHub API for public repos under org members named ^[Ss]hai-[Hh]ulud | Script implements org-scoped and per-member searches and records findings, but per-member enumeration can be limited without a provisioned GH_ORG_READ_TOKEN and will emit check_skipped; member-side search is rate-limited with sleeps. Implementation exists but effective coverage depends on token provisioning and rate limits. |
|  | Maintain a single rolling GitHub Issue in .github repo; append new findings | Workflow finds an open issue with label shai-hulud-sweep (oldest first), appends a comment if present, or creates a labeled rolling issue otherwise. |
| ⚠️ | If any new finding fires, post to a Slack incoming webhook (SLACK_SECURITY_WEBHOOK org secret) | Workflow posts to the Slack webhook, but only for IOC-grade runs (script exit code 1). Operational findings (exit 2) update the rolling issue but do not page Slack. The step also no-ops if SLACK_SECURITY_WEBHOOK is unset. |
|  | Sweep checks - Lockfile entries matching ioc-packages.txt (name@version) | Script builds per-ecosystem needle lists and performs structured and substring matching against npm/pip/Go lockfiles. |
|  | Sweep checks - bundle.js files anywhere with SHA-256 matching ioc-hashes.txt | Script collects .js/.cjs/.mjs files ≤ 2 MB and compares SHA-256 against the provided hashes list, emitting ioc_bundle_hash findings. |
|  | Sweep checks - .github/workflows/shai-hulud*.yaml / shai-hulud-workflow.yml persistence detection | Script scopes workflow-file scanning to the repo-root .github/workflows directory and flags shai-hulud*.{yml,yaml} as persistence_workflow findings. |
|  | Sweep checks - Postinstall patterns: node bundle.js, curl\|sh, wget\|sh, base64-decode chains in package.json scripts | Script inspects package.json scripts for install/postinstall/preinstall and matches a broad regex covering node …bundle.js, curl\|sh, wget\|sh, base64 -d/--decode/-D, eval $(…), npx … bundle patterns. |
|  | Sweep checks - The webhook.site exfil URL substring | The acceptance criteria list webhook.site substring matching, but I cannot find a specific check for the webhook.site URL substring in the vendored script or workflow; other exfil detection (public-exfil repo name, suspicious curl targets) exists, but no explicit webhook.site pattern match is present. |
|  | Workflow uses minimal permissions: contents: read, issues: write | Workflow permissions block sets contents: read and issues: write and no broader permissions are granted. |
|  | PR merged (closure of DEVOP-560) | Acceptance requires merging the PR; the PR is open (this is the review), so the 'merged' criterion is not yet satisfied by the current state. |

Architecture diagram
sequenceDiagram
    participant Sched as Cron Schedule (04:07 UTC)
    participant Action as GitHub Actions Workflow
    participant Script as shai-hulud-ioc-sweep.sh
    participant GHAPI as GitHub API (REST/GraphQL)
    participant Repos as Org Repos (clone targets)
    participant Issue as Rolling Issue (.github repo)
    participant Slack as Slack Security Webhook
    participant Artifact as Uploaded Artifact

    Note over Sched,Artifact: NEW: Daily org-wide IOC sweep

    alt Scheduled trigger (cron '7 4 * * *')
        Sched->>Action: Trigger workflow
    else Manual trigger (workflow_dispatch)
        Action->>Action: Manual run started
    end

    Action->>Action: concurrency serialization (no cancel-in-progress)
    Action->>Action: Set GH_TOKEN (GH_ORG_READ_TOKEN || GITHUB_TOKEN)
    Action->>Action: Checkout .github repo (IOC lists + script)
    Action->>Script: Run with ORG, IOC package/hash files

    Note over Script: Detection logic (vendored from allora-network/skills)

    Script->>GHAPI: List all repos in org (public + private if token allows)
    GHAPI-->>Script: Repo list

    Script->>GHAPI: List org members (for exfil repo search)
    alt GH_TOKEN has read:org
        GHAPI-->>Script: Member list
    else GH_TOKEN lacks read:org
        Script->>Script: Emit check_skipped operational finding
    end

    loop Per repo in org
        Script->>Repos: git clone (via gh auth credential helper)
        alt Clone succeeds
            Script->>Script: Scan for IOC patterns
            alt IOC package match found
                Script->>Script: finding() - ioc_package_match
            end
            alt IOC hash match found (.js/.cjs/.mjs <= 2MB)
                Script->>Script: finding() - ioc_bundle_hash
            end
            alt Suspicious workflow files found
                Script->>Script: finding() - persistence_workflow
            end
            alt Suspicious npm lifecycle scripts found
                Script->>Script: finding() - suspicious_lifecycle_script
            end
            alt Go replace directives anomalies found
                Script->>Script: finding() - go_suspicious_replace
            end
            alt Go unsafe CI env vars found
                Script->>Script: finding() - go_unsafe_env
            end
            opt Finding detected (non-operational)
                Script->>Script: Preserve clone as forensic evidence
                Script->>Script: Append to dirty-repos list
            end
        else Clone fails
            Script->>Script: finding() - clone_failed (operational)
        end
    end

    loop Check public exfil repos (org scope)
        Script->>GHAPI: Search repos matching ^[Ss]hai-[Hh]ulud under org
        GHAPI-->>Script: Exfil repo list
        alt Exfil repos found
            Script->>Script: finding() - public_exfil_repo
        end
    end

    loop Check public exfil repos (member scope)
        alt GH_TOKEN supports member search
            Script->>GHAPI: Search repos per member
            GHAPI-->>Script: Exfil repo list per member
            alt Exfil repos found
                Script->>Script: finding() - public_exfil_repo_member
            end
        else Token lacks permission
            Script->>Script: finding() - check_skipped (operational)
        end
    end

    Script->>Script: Aggregate findings to findings.ndjson + summary.md
    alt Exit code 0 (clean)
        Script-->>Action: rc=0
    else Exit code 1 (IOC findings)
        Script-->>Action: rc=1
    else Exit code 2 (operational only)
        Script-->>Action: rc=2
    end

    Action->>Artifact: Upload sweep output + forensic evidence clones (30-day retention)

    alt rc == 1 or rc == 2 (findings exist)
        Action->>Issue: Find open issue with label "shai-hulud-sweep"
        alt Existing issue found
            Action->>Issue: Append comment with run summary
        else No existing issue
            Action->>Issue: Create new issue with label + summary
        end
    end

    alt rc == 1 (IOC-grade findings only)
        alt SLACK_SECURITY_WEBHOOK is set
            Action->>Slack: POST run summary (capped at ~2.8 KB)
            Slack-->>Action: 200 OK
        else Webhook not set
            Action->>Action: Log warning, skip (no failure)
        end
    end

Comment thread .github/workflows/shai-hulud-sweep.yml Outdated
srt0422 and others added 4 commits May 25, 2026 00:50
Four mechanical fixes flagged as safe_auto by the ce-code-review headless
pass on PR #8:

- Add `--connect-timeout 5 --max-time 15` to the Slack webhook curl so a
  hung incoming-webhook endpoint cannot stall the workflow up to the
  60-minute job timeout.
- Gate the Page-Slack step with `always() && rc == '1'` so an IOC-grade
  finding still pages Slack when the preceding Update-rolling-issue step
  failed (the rolling issue is a redundant channel — Slack is the primary
  page).
- Suffix the upload-artifact name with `${{ github.run_attempt }}` so a
  re-run does not 409 on actions/upload-artifact@v4's unique-name rule.
- Hoist the `output_dir` GITHUB_OUTPUT write to immediately after
  `mkdir -p "$OUTPUT_DIR"` so the upload-artifact step's `always() &&
  output_dir != ''` gate survives a mid-sweep crash (script abort would
  otherwise leave output_dir unset and skip evidence upload).

Re-validated with `actionlint`. No behavior change beyond the four
hardening fixes; the rc-based control flow and rolling-issue / Slack
semantics are unchanged.

Linear: https://linear.app/alloralabs/issue/DEVOP-560
Co-authored-by: Cursor <cursoragent@cursor.com>
… and silent failures (DEVOP-560)

Apply six ce-code-review findings to the daily Shai-Hulud IOC sweep:

- Finding B (P1): rolling-issue lookup now uses --search with
  sort:created-asc, so a long-running incident issue stays canonical
  even when a duplicate is filed later (`gh issue list` defaults to
  newest-first and exposes no --sort flag).
- Finding C (P1): wrap the Slack webhook POST in a 3-attempt retry
  loop with 5s/15s/45s backoff. Retries on 408/429/5xx + curl-level
  failures; honors Retry-After on 429; bails on terminal 4xx.
- Finding E (P1): strip backtick/<>|*_ from the attacker-controllable
  IOC `detail` field before wrapping it in a Slack code fence or
  inlining it in the GitHub issue body. The uploaded findings.json
  remains the canonical un-sanitized source for forensic review.
- Finding D (P1): restrict the uploaded artifact path to the
  structured outputs (findings.json, summary.md, repos.txt) instead
  of the entire output_dir. Anyone with actions:read on this repo
  can download artifacts, and the previous wildcard included raw
  clones / preserved evidence trees of private org repos.
- Finding H (P2): when the script exits rc != 0 without producing
  summary.md (precondition failure / pre-aggregation crash), emit a
  minimal placeholder so the rc != 0 -> rolling-issue contract holds
  and the failure surfaces in triage instead of being silently
  dropped.
- Finding G (P2): commit scripts/shai-hulud-ioc-sweep.sh.sha256 and
  verify it as the first action of the sweep step. The vendored
  script's canonical source lives upstream in allora-network/skills;
  the sidecar is the in-repo integrity gate. A PR that modifies the
  script body without refreshing the sidecar fails this step loudly
  instead of executing a tampered detector.
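
The retry policy described under Finding C can be sketched in Python as below. It is an illustration of the policy, not the workflow's curl loop: the names `is_transient` and `post_with_retry`, the injected `send` callable, and using status 0 for a curl-level failure (curl reports 000) are assumptions. The commit does not say whether the loop sleeps after the final attempt; this sketch does not, so the 45 s value is never reached with three attempts.

```python
import time
from typing import Callable, Optional, Tuple

BACKOFFS = (5, 15, 45)  # seconds, as listed in the commit message


def is_transient(status: int) -> bool:
    """True for failures worth retrying: no response (0), 408, 429, or 5xx."""
    return status in (0, 408, 429) or 500 <= status <= 599


def post_with_retry(
    send: Callable[[], Tuple[int, Optional[int]]],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() up to three times; send returns (status, retry_after_seconds)."""
    for attempt, backoff in enumerate(BACKOFFS, start=1):
        status, retry_after = send()
        if 200 <= status <= 299:
            return True
        if not is_transient(status):
            return False  # terminal 4xx: retrying will not help
        if attempt < len(BACKOFFS):
            # Honor Retry-After on 429 when the server supplied one.
            honor = status == 429 and retry_after is not None
            sleep(retry_after if honor else backoff)
    return False
```

Passing `sleep` as a parameter keeps the policy testable without real delays.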

Workflow validated with `actionlint` and `python3 -c "import yaml;
yaml.safe_load(...)"` after each edit. 346 lines (under the 350 cap).
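
The sanitization from Finding E amounts to deleting a fixed set of markup characters before the `detail` string is embedded in Slack or issue text. A minimal Python sketch, assuming the character set is exactly the one named in the commit message and using the hypothetical name `sanitize_detail`:

```python
# Characters named in Finding E: backtick, <, >, |, *, _
_STRIP_TABLE = str.maketrans("", "", "`<>|*_")


def sanitize_detail(detail: str) -> str:
    """Remove characters that could break out of a code fence or add markup."""
    return detail.translate(_STRIP_TABLE)
```

For example, `sanitize_detail("<@channel> *urgent*")` returns `"@channel urgent"`, so the text can no longer form a Slack mention or bold span.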

Linear: https://linear.app/alloralabs/issue/DEVOP-560
Co-authored-by: Cursor <cursoragent@cursor.com>
…EVOP-560)

A single PR that modifies the daily Shai-Hulud sweep workflow, the
vendored detector script, the SHA-256 integrity sidecar, or the IOC seed
lists can silently disable detection if no human review is enforced.
This adds an in-repo CODEOWNERS rule requiring `@allora-network/security`
approval on those paths (with `@allora-network/devops` co-owning the
workflow file for routine operational tweaks). CODEOWNERS itself is
self-owned so a single PR cannot rewrite the rules + disable detection
in lockstep.

Team slugs were verified via `gh api orgs/allora-network/teams/security`
and `.../devops` on 2026-05-25.

The complementary "Require review from Code Owners" branch-protection
rule is an org-admin task and is documented as a follow-up in the plan
doc; this commit only handles the in-repo half.

Linear: https://linear.app/alloralabs/issue/DEVOP-560
Co-authored-by: Cursor <cursoragent@cursor.com>
Two ce-code-review findings against the plan doc:

- Finding I (P2): document the rotation procedure for the third-party
  action SHA pins (actions/checkout, actions/upload-artifact). Names
  owner (@allora-network/devops, with security co-review via the
  workflow's CODEOWNERS rule), cadence (quarterly + on CVE), canonical
  source for the latest release SHA per action, and a 4-step rotation
  procedure. Notes `.github/dependabot.yml` as the automation follow-up.
- Finding F (P1, deferred): document missed-daily-run / cron-disabled
  observability as out-of-scope for this PR. The fix is materially
  additive (separate watchdog workflow or external healthcheck) and
  doesn't belong inline with the initial sweep ship.

Also adds the branch-protection "Require review from Code Owners"
follow-up surfaced by Finding A — the in-repo CODEOWNERS rule was
landed in the prior commit but the actual blocking gate is org-admin
territory.

Linear: https://linear.app/alloralabs/issue/DEVOP-560
Co-authored-by: Cursor <cursoragent@cursor.com>
srt0422 and others added 5 commits May 26, 2026 10:07
Co-authored-by: Cursor <cursoragent@cursor.com>
- P0 (script): Narrow persistence_workflow glob to exact known IOC
  filenames (shai-hulud.yml / shai-hulud.yaml / shai-hulud-workflow.yml /
  shai-hulud-workflow.yaml) so the legitimate defense workflow
  .github/workflows/shai-hulud-sweep.yml no longer self-detects as
  an IOC on every daily sweep — guaranteed false page → alert fatigue.

- P1 (seed files): Add '# schema:v1' header to ioc-packages.txt and
  ioc-hashes.txt. Without the packages header the new schema-version
  assertion in the detector exits 2 at startup every run, leaving the
  sweep structurally inert.

- P2 (script): Add parallel '# schema:v1' assertion against
  HASHES_FILE — mirrors the packages-file gate so a future reformat of
  the hashes seed list fails loud instead of silently zero-matching.

- P2 (script): Add cometbft to default GO_TRUSTED_HOSTS_RE so Cosmos/
  CometBFT same-path version pins (replace github.com/cometbft/cometbft
  => github.com/cometbft/cometbft <version>) no longer trip
  go_suspicious_replace once the sweep is unblocked from the schema:v1
  gate above.

- Regenerate scripts/shai-hulud-ioc-sweep.sh.sha256 in lockstep with
  the detector edits so the workflow's integrity-gate passes.
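
The P0 fix above is easiest to see by comparing a glob with an exact-name check. The sketch below is Python for illustration only; `old_glob_match` is a hypothetical stand-in for the earlier `shai-hulud*.{yml,yaml}` pattern, and `is_persistence_workflow` for the narrowed check.

```python
from fnmatch import fnmatch

# Exact IOC filenames listed in the commit message.
IOC_WORKFLOW_NAMES = frozenset({
    "shai-hulud.yml",
    "shai-hulud.yaml",
    "shai-hulud-workflow.yml",
    "shai-hulud-workflow.yaml",
})


def old_glob_match(filename: str) -> bool:
    """Hypothetical stand-in for the earlier glob-based detection."""
    return fnmatch(filename, "shai-hulud*.yml") or fnmatch(filename, "shai-hulud*.yaml")


def is_persistence_workflow(filename: str) -> bool:
    """Narrowed check: only the exact known IOC filenames match."""
    return filename in IOC_WORKFLOW_NAMES
```

With the glob, `shai-hulud-sweep.yml` matches and the defense workflow flags itself; with the exact-name set it does not, while the four known IOC filenames still do.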

Co-authored-by: Cursor <cursoragent@cursor.com>
…-560)

Without this gate the bare `if: steps.sweep.outputs.rc == '1'` Slack step
pages on every IOC-grade run, so a standing unresolved IOC pages the
channel daily and conditions responders to mute it — classic alert
fatigue. Raised by cubic at PRRT_kwDOLZ5Xss6Ee5gN and independently by
four ce-code-review reviewers (P1, anchor 100).

Implementation:

- New `ioc-dedup` step (rc=1 only) computes a stable IOC stamp as the
  sha256 of the sorted `{repo,rule,path,detail}` TSV of IOC-grade rows
  in findings.json (`ts` deliberately excluded so an identical IOC set
  produces an identical stamp across daily runs).
- Looks up the rolling issue's full comment history (cross-page sort)
  for the most recent `<!-- shai-hulud-ioc-stamp: ... -->` and
  `<!-- shai-hulud-paged-at: ... -->` markers.
- Decides `should_page`:
    * first IOC-grade run after clean (no prior stamp)        → page
    * IOC set differs from previous stamp                     → page
    * same IOC set but >= 7d since last Slack page            → page
    * otherwise                                               → skip
  Fail-open when findings.json is missing/empty on rc=1: page so an
  unknown-state run surfaces visibly rather than dedup-silencing.
- Rolling-issue update step now embeds the stamp marker on every rc=1
  comment and the paged-at marker only when Slack actually fires, so a
  deduped comment carries forward the older real paged-at timestamp and
  the weekly re-page window stays honest.
- Slack step gated on `should_page == 'true'`. A new `Slack page
  suppressed by IOC dedup` step emits a workflow notice for visibility,
  and the final-run-summary step surfaces the dedup decision too.
- Visible `- **Slack page:** yes|suppressed (reason: ...)` footer in the
  rolling-issue comment body makes the decision obvious to humans
  scanning the issue, alongside the hidden HTML markers used by the
  next run's dedup lookup.
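
The stamp and paging decision described above can be sketched as two small functions. This is a Python illustration under stated assumptions, not the workflow's `jq | sha256sum` pipeline: the function names are hypothetical, the exact byte layout of the TSV (and therefore the digest value) will differ from the real stamp, and paging when no paged-at marker exists is assumed from the fail-open intent.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

REPAGE_AFTER = timedelta(days=7)
STAMP_FIELDS = ("repo", "rule", "path", "detail")  # 'ts' deliberately excluded


def ioc_stamp(rows: list[dict]) -> str:
    """Hash the sorted tab-separated IOC rows so identical sets give one stamp."""
    lines = sorted("\t".join(str(row[k]) for k in STAMP_FIELDS) for row in rows)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def should_page(
    current: Optional[str],
    previous: Optional[str],
    last_paged_at: Optional[datetime],
    now: datetime,
) -> Tuple[bool, str]:
    """Return (page?, reason) following the decision list above."""
    if current is None:
        return True, "fail-open: stamp could not be computed"
    if previous is None:
        return True, "first IOC-grade run after clean"
    if current != previous:
        return True, "IOC set changed"
    # Assumption: no recorded page is treated the same as a stale one.
    if last_paged_at is None or now - last_paged_at >= REPAGE_AFTER:
        return True, "same IOC set, weekly re-page"
    return False, "same IOC set, paged within 7 days"
```

Leaving `ts` out of the stamp is what lets an unchanged finding set compare equal from one daily run to the next.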

Plan doc: the Slack-alert-path decision now spells out the dedup +
weekly-repage policy and warns explicitly against regressing to a bare
`rc == '1'` gate, so the next reviewer doesn't reintroduce the
alert-fatigue regression. IOC_RULES_RE drift between workflow and
script is called out as a coupling that must stay in sync.

Refs: DEVOP-560, PRRT_kwDOLZ5Xss6Ee5gN (cubic), ce-code-review anchor 100
Co-authored-by: Cursor <cursoragent@cursor.com>
- (P2) Append `|| true` to the ioc-dedup current_stamp `jq | sha256sum
  | awk` pipeline so a malformed findings.json (or mid-run mutation)
  routes through the documented fail-open guard at lines 230-244 instead
  of aborting the step under `set -euo pipefail` and silently
  fail-CLOSING the Slack page. Mirrors the `|| true` already present
  on the four sibling pipelines in the same step.

- (P2) Fix Slack curl http_code capture: replace
  `... || echo 000)` with `... || true)` followed by
  `: "${http_code:=000}"`. The prior form appended an extra '000'
  to curl's own '%{http_code}' output, producing the literal '000000'
  which fell through the `000|408|429|5*` transient-classification
  case to terminal=0 and disabled the curl-level retry path the loop
  exists for.

- (P3) Replace the two-branch `if [ "${SHOULD_PAGE:-true}" = "true" ]`
  in the Final run summary with an explicit three-way `case`
  (true / false / *) so the unknown-state branch emits an
  `::error::` rather than defaulting to a false "Slack paged" claim
  when the ioc-dedup step crashed before writing $GITHUB_OUTPUT.
  Resolves the three-way contradiction between the Slack gate
  (strict ==true), the suppression-notice gate (!=true), and this
  summary.

Co-authored-by: Cursor <cursoragent@cursor.com>
- (P2 #5) Extract a new `Find rolling issue` step (gated on
  `rc=='1' || rc=='2'`) that resolves the rolling-issue number ONCE
  per run via the canonical `gh issue list ... sort:created-asc`
  query and exposes it as `steps.find-rolling-issue.outputs.issue_num`.
  Replace the duplicated inline `gh issue list` calls in the ioc-dedup
  and rolling-issue-update steps with the shared output. Removes the
  drift-hazard `# same query as the update step below — keep in sync`
  coupling and closes the TOCTOU window where a human could close the
  rolling issue between the two independent lookups.

- (P1 #1) Filter the ioc-dedup comment scan to `github-actions[bot]`
  authorship. Previously the `gh api ... --jq '.[] | {body, created_at}'`
  projection accepted markers from ANY commenter, so anyone with
  `issues: write` (or anyone able to social-engineer a maintainer into
  pasting attacker-supplied marker text) could forge
  `<!-- shai-hulud-ioc-stamp: <sha256> -->` or
  `<!-- shai-hulud-paged-at: <iso8601> -->` into the rolling issue and
  silently suppress real Slack pages by poisoning the dedup chain.
  Only this workflow (running as GITHUB_TOKEN) emits canonical markers,
  and its comments are attributed to `github-actions[bot]` — restrict
  the source set accordingly. Defense-in-depth follow-up (binding
  markers to the emitting run_id and verifying via gh api) deferred.

- (P1 #2) Move paged-at marker emission to a dedicated post-Slack step
  (`Persist Slack-paged marker`) gated on
  `success() && rc=='1' && should_page=='true'` so a failed Slack
  delivery never writes a paged-at timestamp. The rolling-issue update
  step keeps writing the IOC stamp marker (which represents the dedup
  decision input, NOT the Slack-delivery outcome — that's correct
  gating). The dedup reader already scans the most-recent paged-at
  marker across ALL bot-authored comments, so splitting the markers
  across two comments composes correctly with no parser change.
  Previously the paged-at marker was committed BEFORE the Slack page
  ran, so a failed Slack send would still record a paged-at timestamp
  and silently corrupt the dedup chain for up to 7 days (next
  IOC-grade run would believe Slack had paged, suppress its own page,
  and the standing IOC would stop alerting until the weekly re-page
  window expired).

  The new step has a `gh issue list` fallback for the rare case where
  the update step created a fresh rolling issue this run (so
  find-rolling-issue's output was empty); fail-OPEN warning if no
  issue is resolvable at all so a missing paged-at marker just forces
  the next run to page conservatively.
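
The author filter and the split markers described above can be sketched together. The Python below is an illustration, not the workflow's `gh api` projection: the comment dictionary shape (`author`, `created_at`, `body`), the function name `latest_markers`, and the assumption that `created_at` values are ISO 8601 UTC strings in one consistent format (so they sort as text) are all hypothetical.

```python
import re
from typing import Optional, Tuple

BOT_LOGIN = "github-actions[bot]"
STAMP_RE = re.compile(r"<!-- shai-hulud-ioc-stamp: ([0-9a-f]{64}) -->")
PAGED_RE = re.compile(r"<!-- shai-hulud-paged-at: (\S+) -->")


def latest_markers(comments: list[dict]) -> Tuple[Optional[str], Optional[str]]:
    """Return the most recent (stamp, paged_at) found in bot-authored comments.

    Markers pasted by any other author are ignored, and the two markers
    may live in different comments.
    """
    stamp: Optional[str] = None
    paged_at: Optional[str] = None
    for comment in sorted(comments, key=lambda c: c["created_at"]):
        if comment["author"] != BOT_LOGIN:
            continue  # forged or human-pasted markers never reach the dedup chain
        for match in STAMP_RE.finditer(comment["body"]):
            stamp = match.group(1)
        for match in PAGED_RE.finditer(comment["body"]):
            paged_at = match.group(1)
    return stamp, paged_at
```

Because each marker is tracked independently, writing the stamp in the update comment and the paged-at timestamp in a later comment needs no parser change, which matches the behaviour the commit describes.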

Verification: actionlint clean; YAML parses (11 steps in canonical
order: checkout → verify-tools → sweep → upload → find-rolling-issue
→ ioc-dedup → update-rolling-issue → slack-page → persist-paged-at →
slack-suppressed-notice → final-summary).

Refs: DEVOP-560, ce-code-review run 20260526-101810-4793bf13
findings #1 (anchor 100, security+adversarial), #2 (anchor 100,
correctness+adversarial+reliability), #5 (anchor 75, maintainability).
Co-authored-by: Cursor <cursoragent@cursor.com>