Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

fix(profiler): guard JVMTI sample path against null calltrace buffer#536

Merged
jbachorik merged 7 commits into
mainDataDog/java-profiler:mainfrom
jb/prof-14679-jvmti-buffer-null-guardDataDog/java-profiler:jb/prof-14679-jvmti-buffer-null-guardCopy head branch name to clipboard
May 22, 2026
Merged

fix(profiler): guard JVMTI sample path against null calltrace buffer#536
jbachorik merged 7 commits into
mainDataDog/java-profiler:mainfrom
jb/prof-14679-jvmti-buffer-null-guardDataDog/java-profiler:jb/prof-14679-jvmti-buffer-null-guardCopy head branch name to clipboard

Conversation

@jbachorik

@jbachorik jbachorik commented May 20, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?:

Defensive hardening of the JVMTI sampled-object-alloc path in Profiler::recordJVMTISample:

  1. After the per-shard tryLock succeeds, load _calltrace_buffer[lock_index] into a local and drop the sample if it is nullptr instead of dereferencing it (ddprof-lib/src/main/cpp/profiler.cpp:457-466).
  2. In Profiler::start(), serialise buffer reallocation against signal-handler readers by allocating the replacement first, then swapping the slot under the matching _locks[i], and only freeing the old buffer once the lock is released (ddprof-lib/src/main/cpp/profiler.cpp:1115-1133). A calloc failure no longer leaves the slot pointing at freed memory.

Motivation:

Single observed SIGSEGV null-deref crash reported — Profiler::recordJVMTISample called from JvmtiExport::post_sampled_object_alloc, fault address 0x0, JDK 25.0.3 / amd64 / Linux, 1 event / 4 days / 1 service.

The data is too thin to confirm a single root cause — these are best-effort defensive measures, not a verified fix. They close the cheapest plausible window without changing hot-path behaviour or adding any allocations.

Triage notes posted on the JIRA ticket.

Triage summary

Walking the function for derefs reachable from the top frame, the closest match to a fault at 0x0 is _calltrace_buffer[lock_index]->_asgct_frames at profiler.cpp:461-462 (now 465-466): _asgct_frames is a union member at offset 0 of CallTraceBuffer, so a null _calltrace_buffer[lock_index] produces frames == nullptr, and the subsequent GetStackTrace(..., jvmti_frames, ...) write traps at 0x0.

For the slot to be observed null while ObjectSampler::_active = true, the most plausible window is Profiler::start() reallocation at profiler.cpp:1106-1114: free(_calltrace_buffer[i]) followed by calloc, with no per-shard lock held by start and no nulling between the two calls. A tryLock-based reader (the JVMTI sample path) does not synchronise against this writer, and a calloc failure path additionally leaves the slot pointing at freed memory.

VM::jvmti() and Profiler::instance() are statics that cannot become null at runtime; FlightRecorder::recordEvent already null-checks _rec under a shared lock. Inlining of Recording::recordAllocation writes into the same top frame remains a theoretical possibility but no obvious unguarded null was found there on inspection.

Additional Notes:

  • Both changes are local and allocation-free on the hot path. The reader path adds one extra load and one branch (predictable: the slot is non-null in steady state). The writer adds a per-shard lock acquisition during Profiler::start() only.
  • The reader uses a plain load: the writer publishes the pointer with the spin-lock's release barrier, and the reader's tryLock provides the matching acquire barrier.
  • The change does not touch the other two consumers of _calltrace_buffer[lock_index] (recordSample at line ~559, recordExternalSample at line ~642). The writer-side serialisation under _locks[i] benefits them as well, but their explicit null-check is intentionally out of scope here.
  • No new tests: the failure mode is a narrow start/stop/realloc race that cannot be exercised reliably from the existing gtests without introducing test-only hooks. Existing :ddprof-lib:gtestDebug suite still passes locally.

How to test the change?:

  • :ddprof-lib:compileRelease succeeds locally — verified
  • :ddprof-lib:gtestDebug passes locally — verified (8/8 in UtilsTest)
  • CI green on Linux x86_64 / aarch64 and musl variants
  • No regression on existing ObjectSampler JVMTI tests

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-14679

🤖 Generated with Claude Code

…(PROF-14679)

Defensive hardening for a single observed SIGSEGV null-deref in
Profiler::recordJVMTISample from JvmtiExport::post_sampled_object_alloc.

* Null-check _calltrace_buffer[lock_index] after acquiring the per-shard
  lock; drop the sample rather than dereferencing.
* Serialise buffer reallocation in Profiler::start() against signal-handler
  readers by swapping under the per-shard lock; allocate the replacement
  first so a calloc failure no longer leaves the slot pointing at freed
  memory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik added the AI label May 20, 2026
@jbachorik jbachorik requested a review from Copilot May 20, 2026 11:54
@jbachorik

Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Defensively hardens the native profiler’s JVMTI sampled-object-allocation path by preventing null dereferences of per-shard calltrace buffers and making calltrace buffer (re)allocation in Profiler::start() safe against concurrent readers.

Changes:

  • In Profiler::recordJVMTISample, load the per-shard _calltrace_buffer slot into a local and drop the sample if it is nullptr.
  • In Profiler::start(), allocate replacement calltrace buffers first, then swap them into _calltrace_buffer[i] under the corresponding shard lock, and only free the old buffer after releasing the lock.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ddprof-lib/src/main/cpp/profiler.cpp Outdated
@dd-octo-sts

dd-octo-sts Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #26279827602 | Commit: 89f95b5 | Duration: 32m 19s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-05-22 10:35:06 UTC

…iants

- ddprof-lib/src/main/cpp/profiler.cpp:1128 — replace misleading
  "no reader can be using it" claim with accurate two-path explanation:
  tryLock-based readers protected by shard lock; lock-free readers
  (recordExternalSample) safe because swap precedes enableEngines()

Co-Authored-By: muse <muse@noreply>
@jbachorik

Copy link
Copy Markdown
Collaborator Author

@copilot review

Copilot AI commented May 20, 2026

Copy link
Copy Markdown
Contributor

@copilot review

Reviewed the two commits. Here's my assessment:

recordJVMTISample null check (lines 461–466)
Correct. The buffer pointer is loaded after tryLock() succeeds, which provides the acquire barrier that pairs with the release from the writer's unlock(). The null check is reachable only if calloc fails inside start() — at which point start() returns an error and engines are never re-enabled, so the steady-state path never hits the guard.

Lock-protected swap in start() (lines 1118–1133)
Correct. The allocate-before-free order prevents the freed-pointer window from the original code. The spin lock's barriers (__sync_bool_compare_and_swap / __sync_fetch_and_sub are full barriers on all supported architectures) ensure that readers see the new pointer atomically after unlock().

recordSample at line 568 — no explicit null check
Implicitly safe: the tryLock succeeds before reaching line 568, so the writer's _locks[i].lock() in start() serializes against it. The null-on-calloc-failure case can't reach line 568 because start() returns an error before enableEngines() is called.

recordExternalSample pre-lock dereference at line 651
The safety claim in the comment ("swap completes before enableEngines() is called") is accurate: recordExternalSample is called only from the J9 wall-clock engine (j9WallClock.cpp), which is gated by enableEngines()/disableEngines(). No engine is active during the reallocation window.

Comment accuracy
The second commit's two-path explanation is precise and no longer misleading.

No correctness issues found. The scope is appropriately narrow — the two other callers of _calltrace_buffer[lock_index] remain consistent with the new writer-side serialisation.

@datadog-prod-us1-4

This comment has been minimized.

@jbachorik jbachorik marked this pull request as ready for review May 20, 2026 14:47
@jbachorik jbachorik requested a review from a team as a code owner May 20, 2026 14:47
@jbachorik jbachorik requested a review from rkennke May 20, 2026 14:47
@jbachorik jbachorik changed the title fix(profiler): guard JVMTI sample path against null calltrace buffer (PROF-14679) fix(profiler): guard JVMTI sample path against null calltrace buffer May 20, 2026

@kaahos kaahos left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!

@jbachorik jbachorik marked this pull request as draft May 21, 2026 15:36
Comment thread ddprof-lib/src/main/cpp/profiler.cpp Outdated
jbachorik and others added 2 commits May 22, 2026 08:45
- ddprof-lib/src/main/cpp/profiler.cpp:698 — restructure recordExternalSample to acquire tryLock before reading _calltrace_buffer, add null-guard matching recordJVMTISample pattern, update misleading 'lock-free readers' comment in Profiler::start()

Co-Authored-By: muse <muse@noreply>
- profiler_null_calltrace_buffer_ut: crash regression tests for recordJVMTISample
  and recordExternalSample against a pre-start (null buffer) singleton
- objectSampler_ut: Deallocate guard tests (T-01..T-05) with mock-JVMTI
  infrastructure; extract runAndExpect helper, drop thread_local/dead guard
- build.gradle.kts: failFast=true so crashing test binaries fail the build
- profiler.cpp: remove duplicate comment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik marked this pull request as ready for review May 22, 2026 07:56
jbachorik and others added 2 commits May 22, 2026 11:18
Signal dispositions are process-wide. Concurrent save/restore from
20 worker threads races: one thread restoring SIG_DFL while another
calls pthread_kill(self, SIGVTALRM) kills the process (exit 154).

Move signal setup to the test body (single-threaded, before workers
spawn), remove SigGuard from t05_worker. Mirrors the pattern T-09
already uses correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- flightRecorder: call flushIfNeeded before addThread in recordEventDelegated
- profiler.h: align _failures to cache line; clarify _sample_seq comment
- vmEntry.h: clarify _request_stack_trace vs _init_request_stack_trace comment
- wallClock: deduplicate inSyscall into BaseWallClock; handle EAGAIN in timerLoop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread ddprof-lib/src/main/cpp/profiler.cpp
@jbachorik jbachorik merged commit 980bc8d into main May 22, 2026
98 checks passed
@jbachorik jbachorik deleted the jb/prof-14679-jvmti-buffer-null-guard branch May 22, 2026 15:01
@github-actions github-actions Bot added this to the 1.43.0 milestone May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.