Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

fix(ci): migrate benchmarks to benchmarking-platform trigger#607

Open
jbachorik wants to merge 34 commits into
mainDataDog/java-profiler:mainfrom
jb/bench-bp-triggerDataDog/java-profiler:jb/bench-bp-triggerCopy head branch name to clipboard
Open

fix(ci): migrate benchmarks to benchmarking-platform trigger#607
jbachorik wants to merge 34 commits into
mainDataDog/java-profiler:mainfrom
jb/bench-bp-triggerDataDog/java-profiler:jb/bench-bp-triggerCopy head branch name to clipboard

Conversation

@jbachorik

@jbachorik jbachorik commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?:
Replaces the broken inline benchmark jobs with a benchmarking-platform bridge trigger. The old jobs ran the profiler inline via shell scripts and uploaded results to S3 themselves; the new approach delegates entirely to the BP pipeline at DataDog/apm-reliability/benchmarking-platform@java-profiler.

Motivation:
The existing benchmark CI was broken and unmaintained. The benchmarking-platform provides a reliable, standardised infra for running and tracking reliability benchmarks.

Additional Notes:

  • benchmarks-trigger is a GitLab bridge job — it cannot appear in needs: of downstream jobs, so post-benchmarks-pr-comment and publish-benchmark-gh-pages now run in a post-benchmarks stage ordered after benchmarks.
  • download-bp-reports.sh (renamed from download-s3-reports.sh) fetches BP job artifacts via the GitLab CI API — no AWS CLI or S3 access required.
  • The images.yml include is removed — the BP side manages its own Docker image build.
  • jb/bench-memory-limit-fix is superseded by this PR (the memory-heavy aarch64 jobs are gone).
  • The BP side implementation is in DataDog/benchmarking-platform PR Run tests on musl aarch64 #190.

How to test the change?:
Push to a branch and verify the GitLab pipeline shows a benchmarks-trigger bridge job that fires the BP downstream pipeline. The post-benchmarks-pr-comment and publish-benchmark-gh-pages jobs should run in the post-benchmarks stage after the trigger completes.

For Datadog employees:

  • This PR doesn't touch any of that.
  • JIRA: [JIRA-XXXX]

@jbachorik jbachorik added the AI label Jun 18, 2026
@datadog-datadog-prod-us1

This comment has been minimized.

@dd-octo-sts

dd-octo-sts Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #28403263029 | Commit: 4ea1142 | Duration: 13m 24s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-06-29 21:33:48 UTC

@dd-octo-sts

dd-octo-sts Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Benchmark Results

Pipeline: https://gitlab.ddbuild.io/DataDog/apm-reliability/benchmarking-platform/-/pipelines/121736231 Commit: 0a073bebc8827888d5db234656fd476574d1ba6a

⚠️ Significant outliers

  • 🔴 future-genetic (JDK 21): runtime +6.8% (2054→2194 ms)
  • ⚠️ fj-kmeans (JDK 21): wallclock unwinds skipped — latest 0 → dev 1267 (new)
  • ⚠️ future-genetic (JDK 21): wallclock unwinds skipped — latest 0 → dev 3072 (new)
Runtime details (per benchmark × JDK)
Benchmark JDK Latest Dev Δ (dev vs latest) Issues L/D
akka-uct 21 ✅ 10445 ms (7 iters) ✅ 10343 ms (7 iters) ≈ -1% (±23%) — / —
akka-uct 25 ✅ 8835 ms (8 iters) ✅ 8997 ms (8 iters) ≈ +1.8% (±18.3%) — / —
finagle-chirper 21 ✅ 5956 ms (11 iters) ✅ 6047 ms (11 iters) ≈ +1.5% (±46.8%) ⚠️ W:1 / ⚠️ W:2
finagle-chirper 25 ✅ 5460 ms (12 iters) ✅ 5444 ms (12 iters) ≈ -0.3% (±42%) ⚠️ W:1 / ⚠️ W:1
fj-kmeans 21 ✅ 2867 ms (22 iters) ✅ 2849 ms (22 iters) ≈ -0.6% (±4.2%) — / —
fj-kmeans 25 ✅ 2863 ms (22 iters) ✅ 2771 ms (23 iters) ≈ -3.2% (±4.4%) — / —
future-genetic 21 ✅ 2054 ms (30 iters) ✅ 2194 ms (28 iters) 🔴 +6.8% — / —
future-genetic 25 ✅ 2069 ms (30 iters) ✅ 2121 ms (29 iters) ≈ +2.5% (±4.5%) — / —
naive-bayes 21 ✅ 1265 ms (45 iters) ✅ 1219 ms (47 iters) ≈ -3.6% (±55.9%) — / —
naive-bayes 25 ✅ 1014 ms (56 iters) ✅ 1001 ms (57 iters) ≈ -1.3% (±55.5%) — / —
reactors 21 ✅ 17727 ms (5 iters) ✅ 16471 ms (5 iters) ≈ -7.1% (±15.8%) — / —
reactors 25 ✅ 18140 ms (5 iters) ✅ 18396 ms (5 iters) ≈ +1.4% (±4.2%) — / —
Internal counter details (ddprof)

ddprof internal counters, latest / dev (✅ = 0, · = unavailable):

Benchmark JDK Dropped rec Dropped jvmti Dropped trace Skipped WC AGCT fail Unwind fail
akka-uct 21 ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / ✅
akka-uct 25 ✅ / ✅ ✅ / ✅ 2 / 1 2094 / 2091 ✅ / ✅ ✅ / ✅
finagle-chirper 21 ✅ / ✅ ✅ / ✅ 7 / ✅ 8792 / ✅ ✅ / ✅ ✅ / ✅
finagle-chirper 25 ✅ / ✅ ✅ / ✅ ✅ / ✅ 8693 / 8009 ✅ / ✅ ✅ / ✅
fj-kmeans 21 ✅ / ✅ ✅ / ✅ ✅ / 1 ✅ / 1267 ✅ / ✅ ✅ / ✅
fj-kmeans 25 ✅ / ✅ ✅ / ✅ 3 / 4 1293 / 1285 ✅ / ✅ ✅ / ✅
future-genetic 21 ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / 3072 ✅ / ✅ ✅ / ✅
future-genetic 25 ✅ / ✅ ✅ / ✅ 1 / 1 2931 / 2881 ✅ / ✅ ✅ / ✅
naive-bayes 21 ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / ✅
naive-bayes 25 ✅ / ✅ ✅ / ✅ 3 / 3 3489 / 3502 ✅ / ✅ ✅ / ✅
reactors 21 ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / ✅ ✅ / ✅
reactors 25 ✅ / ✅ ✅ / ✅ 1 / 1 1819 / 1862 ✅ / ✅ ✅ / ✅

@jbachorik jbachorik marked this pull request as ready for review June 20, 2026 08:51
@jbachorik jbachorik requested a review from a team as a code owner June 20, 2026 08:51

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc5c810c9d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .gitlab/benchmarks/.gitlab-ci.yml Outdated
Comment thread .gitlab/benchmarks/.gitlab-ci.yml
  - benchmarks-trigger: CANCELLED rule, interruptible:false, UPSTREAM_BRANCH fix, remove PREPARE_IMAGE
  - publish-benchmark-gh-pages: mirror skip rules, use $DD_OCTO_STS_IMAGE
  - download-s3-reports.sh: require downstream pipeline status==success
  - report-dd-trace-results: add resource_group:gh-pages-publish
  - async-profiler-build.ci.sts.yaml: restrict to java-profiler; restore benchmarking-platform-reports.ci.sts.yaml
  - publish-gh-pages.sh: fix wrong URL in header comment

@rkennke rkennke left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the benchmark migration — direction is clearly right and the GitLab bridge mechanics (strategy: depend, interruptible: false, moving the CANCELLED gate into rules: since bridges can't have before_script, and the project-wide resource_group: gh-pages-publish to serialize force-pushes) are all handled correctly and well-commented. Inline comments below are minor/cleanup. Two findings can't be anchored inline because they touch files outside this diff:

1. (verify before merge) mem_watch.sh deletion vs. reliability check. This PR deletes .gitlab/benchmarks/steps/mem_watch.sh, but .gitlab/reliability/memory_trend_check.sh:58 still calls sh ./benchmarks/steps/mem_watch.sh .... The path differs by a .gitlab/ prefix and is relative to the job CWD, so it may resolve to a checked-out benchmarking-platform copy rather than the deleted file — but please confirm. If it resolves to the deleted file, set +e at the top of that script means it fails silently and you lose memory-trend logging with no error.

2. (follow-up) Orphaned image config. images.yml is correctly left in place (still included by the reliability pipelines), but BENCHMARK_IMAGE_AMD64/ARM64 and the *-benchmarks entries in .gitlab/scripts/check-image-updates.sh and rebuild-images.sh now build/track images no benchmark job consumes. Not breaking; worth a cleanup pass.

Non-blocking overall — I'd approve after confirming #1.

Comment thread .gitlab/benchmarks/download-bp-reports.sh
Comment thread .gitlab/benchmarks/download-s3-reports.sh Outdated
Comment thread .gitlab/benchmarks/download-s3-reports.sh Outdated
Comment thread .gitlab/benchmarks/download-bp-reports.sh
jbachorik and others added 2 commits June 29, 2026 16:04
- Rename download-s3-reports.sh → download-bp-reports.sh (uses GitLab CI API, not S3)
- Fix useless cat pipe; use stdin redirect instead
- Add pagination loop for BP jobs API (handles >100 jobs)
- Document BP artifact layout contract at unzip call
- Update PR description to remove stale S3 reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik

Copy link
Copy Markdown
Collaborator Author

@rkennke Regarding mem_watch.sh deletion - it was indeed overzealous, the liveness detection failed. The script needs to still be there. Restored from backup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.