fix(ci): migrate benchmarks to benchmarking-platform trigger by jbachorik · Pull Request #607 · DataDog/java-profiler

jbachorik · Jun 18, 2026

What does this PR do?:
Replaces the broken inline benchmark jobs with a benchmarking-platform bridge trigger. The old jobs ran the profiler inline via shell scripts and uploaded results to S3 themselves; the new approach delegates entirely to the BP pipeline at DataDog/apm-reliability/benchmarking-platform@java-profiler.

Motivation:
The existing benchmark CI was broken and unmaintained. The benchmarking-platform provides a reliable, standardised infra for running and tracking reliability benchmarks.

Additional Notes:

benchmarks-trigger is a GitLab bridge job — it cannot appear in needs: of downstream jobs, so post-benchmarks-pr-comment and publish-benchmark-gh-pages now run in a post-benchmarks stage ordered after benchmarks.
download-bp-reports.sh (renamed from download-s3-reports.sh) fetches BP job artifacts via the GitLab CI API — no AWS CLI or S3 access required.
The images.yml include is removed — the BP side manages its own Docker image build.
jb/bench-memory-limit-fix is superseded by this PR (the memory-heavy aarch64 jobs are gone).
The BP side implementation is in DataDog/benchmarking-platform PR Run tests on musl aarch64 #190.

How to test the change?:
Push to a branch and verify the GitLab pipeline shows a benchmarks-trigger bridge job that fires the BP downstream pipeline. The post-benchmarks-pr-comment and publish-benchmark-gh-pages jobs should run in the post-benchmarks stage after the trigger completes.

For Datadog employees:

This PR doesn't touch any of that.
JIRA: [JIRA-XXXX]

…ost-benchmark jobs

dd-octo-sts · Jun 18, 2026

CI Test Results

Run: #28403263029 | Commit: 4ea1142 | Duration: 13m 24s (longest job)

✅ All 32 test jobs passed

Status Overview

JDK	glibc-aarch64/debug	glibc-amd64/debug	musl-aarch64/debug	musl-amd64/debug
8	-	✅	-	-
8-ibm	-	✅	-	-
8-j9	✅	✅	-	-
8-librca	-	-	✅	✅
8-orcl	-	✅	-	-
11	-	✅	-	-
11-j9	✅	✅	-	-
11-librca	-	-	✅	✅
17	✅	✅	-	-
17-graal	✅	✅	-	-
17-j9	✅	✅	-	-
17-librca	-	-	✅	✅
21	✅	✅	-	-
21-graal	✅	✅	-	-
21-librca	-	-	✅	✅
25	✅	✅	-	-
25-graal	✅	✅	-	-
25-librca	-	-	✅	✅

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0

Updated: 2026-06-29 21:33:48 UTC

…comment

…needs

dd-octo-sts · Jun 19, 2026

Benchmark Results

Pipeline: https://gitlab.ddbuild.io/DataDog/apm-reliability/benchmarking-platform/-/pipelines/121736231 Commit: 0a073bebc8827888d5db234656fd476574d1ba6a

⚠️ Significant outliers

🔴 future-genetic (JDK 21): runtime +6.8% (2054→2194 ms)
⚠️ fj-kmeans (JDK 21): wallclock unwinds skipped — latest 0 → dev 1267 (new)
⚠️ future-genetic (JDK 21): wallclock unwinds skipped — latest 0 → dev 3072 (new)

Runtime details (per benchmark × JDK)

Benchmark	JDK	Latest	Dev	Δ (dev vs latest)	Issues L/D
akka-uct	21	✅ 10445 ms (7 iters)	✅ 10343 ms (7 iters)	≈ -1% (±23%)	— / —
akka-uct	25	✅ 8835 ms (8 iters)	✅ 8997 ms (8 iters)	≈ +1.8% (±18.3%)	— / —
finagle-chirper	21	✅ 5956 ms (11 iters)	✅ 6047 ms (11 iters)	≈ +1.5% (±46.8%)	⚠️ W:1 / ⚠️ W:2
finagle-chirper	25	✅ 5460 ms (12 iters)	✅ 5444 ms (12 iters)	≈ -0.3% (±42%)	⚠️ W:1 / ⚠️ W:1
fj-kmeans	21	✅ 2867 ms (22 iters)	✅ 2849 ms (22 iters)	≈ -0.6% (±4.2%)	— / —
fj-kmeans	25	✅ 2863 ms (22 iters)	✅ 2771 ms (23 iters)	≈ -3.2% (±4.4%)	— / —
future-genetic	21	✅ 2054 ms (30 iters)	✅ 2194 ms (28 iters)	🔴 +6.8%	— / —
future-genetic	25	✅ 2069 ms (30 iters)	✅ 2121 ms (29 iters)	≈ +2.5% (±4.5%)	— / —
naive-bayes	21	✅ 1265 ms (45 iters)	✅ 1219 ms (47 iters)	≈ -3.6% (±55.9%)	— / —
naive-bayes	25	✅ 1014 ms (56 iters)	✅ 1001 ms (57 iters)	≈ -1.3% (±55.5%)	— / —
reactors	21	✅ 17727 ms (5 iters)	✅ 16471 ms (5 iters)	≈ -7.1% (±15.8%)	— / —
reactors	25	✅ 18140 ms (5 iters)	✅ 18396 ms (5 iters)	≈ +1.4% (±4.2%)	— / —

Internal counter details (ddprof)

ddprof internal counters, latest / dev (✅ = 0, · = unavailable):

Benchmark	JDK	Dropped rec	Dropped jvmti	Dropped trace	Skipped WC	AGCT fail	Unwind fail
akka-uct	21	✅ / ✅	✅ / ✅	✅ / ✅	✅ / ✅	✅ / ✅	✅ / ✅
akka-uct	25	✅ / ✅	✅ / ✅	2 / 1	2094 / 2091	✅ / ✅	✅ / ✅
finagle-chirper	21	✅ / ✅	✅ / ✅	7 / ✅	8792 / ✅	✅ / ✅	✅ / ✅
finagle-chirper	25	✅ / ✅	✅ / ✅	✅ / ✅	8693 / 8009	✅ / ✅	✅ / ✅
fj-kmeans	21	✅ / ✅	✅ / ✅	✅ / 1	✅ / 1267	✅ / ✅	✅ / ✅
fj-kmeans	25	✅ / ✅	✅ / ✅	3 / 4	1293 / 1285	✅ / ✅	✅ / ✅
future-genetic	21	✅ / ✅	✅ / ✅	✅ / ✅	✅ / 3072	✅ / ✅	✅ / ✅
future-genetic	25	✅ / ✅	✅ / ✅	1 / 1	2931 / 2881	✅ / ✅	✅ / ✅
naive-bayes	21	✅ / ✅	✅ / ✅	✅ / ✅	✅ / ✅	✅ / ✅	✅ / ✅
naive-bayes	25	✅ / ✅	✅ / ✅	3 / 3	3489 / 3502	✅ / ✅	✅ / ✅
reactors	21	✅ / ✅	✅ / ✅	✅ / ✅	✅ / ✅	✅ / ✅	✅ / ✅
reactors	25	✅ / ✅	✅ / ✅	1 / 1	1819 / 1862	✅ / ✅	✅ / ✅

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc5c810c9d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- benchmarks-trigger: CANCELLED rule, interruptible:false, UPSTREAM_BRANCH fix, remove PREPARE_IMAGE - publish-benchmark-gh-pages: mirror skip rules, use $DD_OCTO_STS_IMAGE - download-s3-reports.sh: require downstream pipeline status==success - report-dd-trace-results: add resource_group:gh-pages-publish - async-profiler-build.ci.sts.yaml: restrict to java-profiler; restore benchmarking-platform-reports.ci.sts.yaml - publish-gh-pages.sh: fix wrong URL in header comment

rkennke

Reviewed the benchmark migration — direction is clearly right and the GitLab bridge mechanics (strategy: depend, interruptible: false, moving the CANCELLED gate into rules: since bridges can't have before_script, and the project-wide resource_group: gh-pages-publish to serialize force-pushes) are all handled correctly and well-commented. Inline comments below are minor/cleanup. Two findings can't be anchored inline because they touch files outside this diff:

1. (verify before merge) mem_watch.sh deletion vs. reliability check. This PR deletes .gitlab/benchmarks/steps/mem_watch.sh, but .gitlab/reliability/memory_trend_check.sh:58 still calls sh ./benchmarks/steps/mem_watch.sh .... The path differs by a .gitlab/ prefix and is relative to the job CWD, so it may resolve to a checked-out benchmarking-platform copy rather than the deleted file — but please confirm. If it resolves to the deleted file, set +e at the top of that script means it fails silently and you lose memory-trend logging with no error.

2. (follow-up) Orphaned image config. images.yml is correctly left in place (still included by the reliability pipelines), but BENCHMARK_IMAGE_AMD64/ARM64 and the *-benchmarks entries in .gitlab/scripts/check-image-updates.sh and rebuild-images.sh now build/track images no benchmark job consumes. Not breaking; worth a cleanup pass.

Non-blocking overall — I'd approve after confirming #1.

- Rename download-s3-reports.sh → download-bp-reports.sh (uses GitLab CI API, not S3) - Fix useless cat pipe; use stdin redirect instead - Add pagination loop for BP jobs API (handles >100 jobs) - Document BP artifact layout contract at unzip call - Update PR description to remove stale S3 reference Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jbachorik · Jun 29, 2026

@rkennke Regarding mem_watch.sh deletion - it was indeed overzealous, the liveness detection failed. The script needs to still be there. Restored from backup.

fix(ci): migrate benchmarks to benchmarking-platform trigger

3fc25e8

jbachorik added the AI label Jun 18, 2026

This comment has been minimized.

Sign in to view

jbachorik added 5 commits June 18, 2026 21:50

feat(ci): add reliability table to benchmark PR comment

2be4d5a

feat(ci): add delta%, upload count, issues to reliability table

ba307c6

fix(ci): use boto3 for S3 download; add benchmarking-platform SA to p…

e527a2f

…ost-benchmark jobs

fix(ci): fetch BP artifacts via GitLab API instead of aws s3 sync

200d347

fix(ci): diagnose API errors explicitly, no silent set -e kill

5f658b0

jbachorik added 18 commits June 19, 2026 08:56

fix: pass MAVEN_REPOSITORY_PROXY to BP trigger

7580df2

fix: extend octo-sts trust to BP; wait for benchmarks-trigger before …

7d9f679

…comment

fix: remove post-benchmarks-pr-comment (BP handles it); fix gh-pages …

62f834f

…needs

chore: remove MAVEN_REPOSITORY_PROXY trigger var (not needed)

4a791f7

fix: restore MAVEN_REPOSITORY_PROXY trigger var

321cf9d

ci: retrigger to pick up BP curl fix

b64d1f1

fix: remove MAVEN_REPOSITORY_PROXY from trigger vars (BP owns it now)

c8e0b58

ci: trigger benchmark pipeline

05f1c4c

ci: trigger benchmark pipeline

2d51776

ci: trigger benchmark pipeline

2ca358c

ci: trigger benchmark pipeline

0da28e9

ci: trigger benchmark pipeline

8a9ea3a

ci: trigger benchmark pipeline

827f069

ci: trigger benchmark pipeline

a6983ee

ci: re-trigger BP renaissance run (result JSON corruption fix)

b62c08c

ci: trigger BP renaissance run

8f03d10

ci: trigger BP renaissance run (pr-commenter)

5be53e4

ci: trigger BP renaissance run (octo-sts pr:write)

81073a7

jbachorik added 2 commits June 20, 2026 08:50

ci: re-trigger BP renaissance run (fix iteration grep)

dc0947e

ci: re-trigger BP renaissance run (iters fix + JFR counters)

2876475

jbachorik added 2 commits June 20, 2026 10:39

ci: re-trigger BP run (noise-aware deltas, outliers summary)

710bd5e

ci: re-trigger BP run (counter ratio thresholds)

bc5c810

jbachorik marked this pull request as ready for review June 20, 2026 08:51

jbachorik requested a review from a team as a code owner June 20, 2026 08:51

chatgpt-codex-connector Bot reviewed Jun 20, 2026

View reviewed changes

Comment thread .gitlab/benchmarks/.gitlab-ci.yml Outdated

Comment thread .gitlab/benchmarks/.gitlab-ci.yml

jbachorik added 3 commits June 20, 2026 11:44

ci: remove local benchmark steps superseded by benchmarking-platform

1b03a1c

Merge branch 'main' into jb/bench-bp-trigger

9bcf8fa

rkennke reviewed Jun 29, 2026

View reviewed changes

Comment thread .gitlab/benchmarks/download-bp-reports.sh

Comment thread .gitlab/benchmarks/download-s3-reports.sh Outdated

Comment thread .gitlab/benchmarks/download-s3-reports.sh Outdated

Comment thread .gitlab/benchmarks/download-bp-reports.sh

jbachorik and others added 2 commits June 29, 2026 16:04

fix(ci): restore mem_watch.sh needed by reliability pipeline

bd1bbff

Merge branch 'main' into jb/bench-bp-trigger

0a073be

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(ci): migrate benchmarks to benchmarking-platform trigger#607

fix(ci): migrate benchmarks to benchmarking-platform trigger#607
jbachorik wants to merge 34 commits into
mainDataDog/java-profiler:mainfrom
jb/bench-bp-triggerDataDog/java-profiler:jb/bench-bp-triggerCopy head branch name to clipboard

jbachorik commented Jun 18, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

dd-octo-sts Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

dd-octo-sts Bot commented Jun 19, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

rkennke left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jbachorik commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Search code, repositories, users, issues, pull requests...

Uh oh!

Conversation

jbachorik commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

dd-octo-sts Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CI Test Results

Status Overview

Uh oh!

dd-octo-sts Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Results

⚠️ Significant outliers

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

rkennke left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jbachorik commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jbachorik commented Jun 18, 2026 •

edited

Loading

dd-octo-sts Bot commented Jun 18, 2026 •

edited

Loading

dd-octo-sts Bot commented Jun 19, 2026 •

edited

Loading