fix(ci): migrate benchmarks to benchmarking-platform trigger#607
Conversation
This comment has been minimized.
This comment has been minimized.
…ost-benchmark jobs
CI Test ResultsRun: #28403263029 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-06-29 21:33:48 UTC |
Benchmark ResultsPipeline: https://gitlab.ddbuild.io/DataDog/apm-reliability/benchmarking-platform/-/pipelines/121736231 Commit:
|
| Benchmark | JDK | Latest | Dev | Δ (dev vs latest) | Issues L/D |
|---|---|---|---|---|---|
| akka-uct | 21 | ✅ 10445 ms (7 iters) | ✅ 10343 ms (7 iters) | ≈ -1% (±23%) | — / — |
| akka-uct | 25 | ✅ 8835 ms (8 iters) | ✅ 8997 ms (8 iters) | ≈ +1.8% (±18.3%) | — / — |
| finagle-chirper | 21 | ✅ 5956 ms (11 iters) | ✅ 6047 ms (11 iters) | ≈ +1.5% (±46.8%) | |
| finagle-chirper | 25 | ✅ 5460 ms (12 iters) | ✅ 5444 ms (12 iters) | ≈ -0.3% (±42%) | |
| fj-kmeans | 21 | ✅ 2867 ms (22 iters) | ✅ 2849 ms (22 iters) | ≈ -0.6% (±4.2%) | — / — |
| fj-kmeans | 25 | ✅ 2863 ms (22 iters) | ✅ 2771 ms (23 iters) | ≈ -3.2% (±4.4%) | — / — |
| future-genetic | 21 | ✅ 2054 ms (30 iters) | ✅ 2194 ms (28 iters) | 🔴 +6.8% | — / — |
| future-genetic | 25 | ✅ 2069 ms (30 iters) | ✅ 2121 ms (29 iters) | ≈ +2.5% (±4.5%) | — / — |
| naive-bayes | 21 | ✅ 1265 ms (45 iters) | ✅ 1219 ms (47 iters) | ≈ -3.6% (±55.9%) | — / — |
| naive-bayes | 25 | ✅ 1014 ms (56 iters) | ✅ 1001 ms (57 iters) | ≈ -1.3% (±55.5%) | — / — |
| reactors | 21 | ✅ 17727 ms (5 iters) | ✅ 16471 ms (5 iters) | ≈ -7.1% (±15.8%) | — / — |
| reactors | 25 | ✅ 18140 ms (5 iters) | ✅ 18396 ms (5 iters) | ≈ +1.4% (±4.2%) | — / — |
Internal counter details (ddprof)
ddprof internal counters, latest / dev (✅ = 0, · = unavailable):
| Benchmark | JDK | Dropped rec | Dropped jvmti | Dropped trace | Skipped WC | AGCT fail | Unwind fail |
|---|---|---|---|---|---|---|---|
| akka-uct | 21 | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ |
| akka-uct | 25 | ✅ / ✅ | ✅ / ✅ | 2 / 1 | 2094 / 2091 | ✅ / ✅ | ✅ / ✅ |
| finagle-chirper | 21 | ✅ / ✅ | ✅ / ✅ | 7 / ✅ | 8792 / ✅ | ✅ / ✅ | ✅ / ✅ |
| finagle-chirper | 25 | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | 8693 / 8009 | ✅ / ✅ | ✅ / ✅ |
| fj-kmeans | 21 | ✅ / ✅ | ✅ / ✅ | ✅ / 1 | ✅ / 1267 | ✅ / ✅ | ✅ / ✅ |
| fj-kmeans | 25 | ✅ / ✅ | ✅ / ✅ | 3 / 4 | 1293 / 1285 | ✅ / ✅ | ✅ / ✅ |
| future-genetic | 21 | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / 3072 | ✅ / ✅ | ✅ / ✅ |
| future-genetic | 25 | ✅ / ✅ | ✅ / ✅ | 1 / 1 | 2931 / 2881 | ✅ / ✅ | ✅ / ✅ |
| naive-bayes | 21 | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ |
| naive-bayes | 25 | ✅ / ✅ | ✅ / ✅ | 3 / 3 | 3489 / 3502 | ✅ / ✅ | ✅ / ✅ |
| reactors | 21 | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ | ✅ / ✅ |
| reactors | 25 | ✅ / ✅ | ✅ / ✅ | 1 / 1 | 1819 / 1862 | ✅ / ✅ | ✅ / ✅ |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc5c810c9d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- benchmarks-trigger: CANCELLED rule, interruptible:false, UPSTREAM_BRANCH fix, remove PREPARE_IMAGE - publish-benchmark-gh-pages: mirror skip rules, use $DD_OCTO_STS_IMAGE - download-s3-reports.sh: require downstream pipeline status==success - report-dd-trace-results: add resource_group:gh-pages-publish - async-profiler-build.ci.sts.yaml: restrict to java-profiler; restore benchmarking-platform-reports.ci.sts.yaml - publish-gh-pages.sh: fix wrong URL in header comment
rkennke
left a comment
There was a problem hiding this comment.
Reviewed the benchmark migration — direction is clearly right and the GitLab bridge mechanics (strategy: depend, interruptible: false, moving the CANCELLED gate into rules: since bridges can't have before_script, and the project-wide resource_group: gh-pages-publish to serialize force-pushes) are all handled correctly and well-commented. Inline comments below are minor/cleanup. Two findings can't be anchored inline because they touch files outside this diff:
1. (verify before merge) mem_watch.sh deletion vs. reliability check. This PR deletes .gitlab/benchmarks/steps/mem_watch.sh, but .gitlab/reliability/memory_trend_check.sh:58 still calls sh ./benchmarks/steps/mem_watch.sh .... The path differs by a .gitlab/ prefix and is relative to the job CWD, so it may resolve to a checked-out benchmarking-platform copy rather than the deleted file — but please confirm. If it resolves to the deleted file, set +e at the top of that script means it fails silently and you lose memory-trend logging with no error.
2. (follow-up) Orphaned image config. images.yml is correctly left in place (still included by the reliability pipelines), but BENCHMARK_IMAGE_AMD64/ARM64 and the *-benchmarks entries in .gitlab/scripts/check-image-updates.sh and rebuild-images.sh now build/track images no benchmark job consumes. Not breaking; worth a cleanup pass.
Non-blocking overall — I'd approve after confirming #1.
- Rename download-s3-reports.sh → download-bp-reports.sh (uses GitLab CI API, not S3) - Fix useless cat pipe; use stdin redirect instead - Add pagination loop for BP jobs API (handles >100 jobs) - Document BP artifact layout contract at unzip call - Update PR description to remove stale S3 reference Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@rkennke Regarding mem_watch.sh deletion - it was indeed overzealous, the liveness detection failed. The script needs to still be there. Restored from backup. |
What does this PR do?:
Replaces the broken inline benchmark jobs with a benchmarking-platform bridge trigger. The old jobs ran the profiler inline via shell scripts and uploaded results to S3 themselves; the new approach delegates entirely to the BP pipeline at
DataDog/apm-reliability/benchmarking-platform@java-profiler.Motivation:
The existing benchmark CI was broken and unmaintained. The benchmarking-platform provides a reliable, standardised infra for running and tracking reliability benchmarks.
Additional Notes:
benchmarks-triggeris a GitLab bridge job — it cannot appear inneeds:of downstream jobs, sopost-benchmarks-pr-commentandpublish-benchmark-gh-pagesnow run in apost-benchmarksstage ordered afterbenchmarks.download-bp-reports.sh(renamed fromdownload-s3-reports.sh) fetches BP job artifacts via the GitLab CI API — no AWS CLI or S3 access required.images.ymlinclude is removed — the BP side manages its own Docker image build.jb/bench-memory-limit-fixis superseded by this PR (the memory-heavy aarch64 jobs are gone).DataDog/benchmarking-platformPR Run tests on musl aarch64 #190.How to test the change?:
Push to a branch and verify the GitLab pipeline shows a
benchmarks-triggerbridge job that fires the BP downstream pipeline. Thepost-benchmarks-pr-commentandpublish-benchmark-gh-pagesjobs should run in thepost-benchmarksstage after the trigger completes.For Datadog employees: