Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Implement option to use JVMTI/JFR stack-walker for CPU- and wall-clock-profiler#504

Merged
rkennke merged 1 commit into
mainDataDog/java-profiler:mainfrom
rkennke/PROF-14350DataDog/java-profiler:rkennke/PROF-14350Copy head branch name to clipboard
May 19, 2026
Merged

Implement option to use JVMTI/JFR stack-walker for CPU- and wall-clock-profiler#504
rkennke merged 1 commit into
mainDataDog/java-profiler:mainfrom
rkennke/PROF-14350DataDog/java-profiler:rkennke/PROF-14350Copy head branch name to clipboard

Conversation

@rkennke

@rkennke rkennke commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?:

We are working to contribute a JVMTI/JFR-based stack-walker into upstream JDK (see JEP Draft: Native Profiler Hook for Unbiased Stack Traces (Experimental)). This feature adds a JVMTI extension to trigger a stack-trace to be emitted via the JVM's JFR. This PR wires that feature to our CPU and wall-clock profilers. When the feature is enabled, and the JVMTI extension is detected, it will prefer that over the other stack-walkers.

Motivation:

I would like to be able to test the new JDK feature end-to-end. Eventually, when the feature is released in upstream JDK, it will be ready to go on our side.

Additional Notes:

The feature needs to be enabled by the jvmtistacks option.

The stack-traces will be emitted via the JVM's JFR stream. We merge the JVM JFR with the agent's JFR, the backend will receive the combined JFR. I'll post a separate PR in the backend which implements the necessary post-processing.

I also made a small unrelated addition to vmStructs to deal with removed uncompressed class-pointers support in JDK 27.

How to test the change?:

I manually verified that the CPU and wall-clock profiler produce the desired AsyncStackTrace events.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-14350

Unsure? Have a question? Request a review!

Comment thread ddprof-lib/src/main/cpp/vmEntry.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/hotspot/vmStructs.h
@rkennke rkennke marked this pull request as ready for review April 28, 2026 11:15
@rkennke rkennke requested a review from a team as a code owner April 28, 2026 11:15

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57ab35fed9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread ddprof-lib/src/main/cpp/vmEntry.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/itimer.cpp Outdated
@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:609

I think Error::OK is 0, so the condition is correct. But it might be good to use explicit comparison against Error::OK for posterity

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/vmEntry.h:804

Data Race Uninitialized Pointers

Plain non-atomic reads of _request_stack_trace and _request_stack_trace_initialized in canRequestStackTrace() race with initialization writes in probeJFRRequestStackTrace(). Signal handlers may see partial initialization on other CPUs due to lack of memory ordering. CWE-131 pattern: pointer_race_no_sync.

Use __atomic_load_n for both _request_stack_trace and _request_stack_trace_initialized with __ATOMIC_ACQUIRE semantics in canRequestStackTrace(). Initialize with __atomic_store_n in probeJFRRequestStackTrace().

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/profiler.h:99

False Sharing

_sample_seq is placed adjacent to _total_samples (both write-mostly atomics incremented on every sample), sharing a single cache line. This doubles coherence traffic on multi-socket/many-core hardware.

Separate with explicit padding (alignas(DEFAULT_CACHE_LINE_SIZE) on _sample_seq in profiler.h:99) or co-locate _sample_seq with a read-mostly field so the dirtied line is not also being read by other paths.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:597

Contended Atomic

_sample_seq is incremented with atomicInc (full barrier __sync_fetch_and_add) on every CPU/wall sample from every thread, creating a single globally-contended cache line in the sample hot path. Combined with adjacent _total_samples and adjacent write-mostly operations, this doubles coherence traffic on multi-socket/many-core hardware.

Use atomicIncRelaxed(_sample_seq) (or __atomic_fetch_add(..., __ATOMIC_RELAXED)) like _total_samples; the value is opaque to the JVM and only needs uniqueness, not synchronization. Separate with explicit padding (alignas(DEFAULT_CACHE_LINE_SIZE) on _sample_seq in profiler.h:99) to avoid false-sharing with _total_samples.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:597

Integer Overflow

Correlation ID generation via atomicInc(_sample_seq) + 1 can wrap at u64 boundaries; when _sample_seq reaches ULLONG_MAX, (ULLONG_MAX + 1) overflows to zero, violating the wire-format contract where zero means 'no correlation'. CWE-190. This becomes more severe when profiler restarts and resets _sample_seq to 0 (line 1067), potentially reusing correlation IDs that match stale AsyncStackTrace events from prior JFR recordings (replay attack vector).

Implement saturating increment: u64 seq = atomicIncRelaxed(_sample_seq); u64 corr = (seq == ULLONG_MAX) ? 1 : (seq + 1); to ensure correlation_id is never zero. Alternatively prevent zero by saturating/modular arithmetic on restart.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:1032

Misleading Comment

The new comment block above selectWallEngine's jvmtistacks branch states 'The engine-level _wallclock_sampler knob still takes precedence for users who explicitly request JVMTI/ASGCT', but the code returns wall_jvmti_engine before the switch(args._wallclock_sampler) is even reached, so jvmtistacks in fact OVERRIDES the knob. Documentation contradicts behavior.

Rewrite comment to: 'jvmtistacks overrides _wallclock_sampler when the HotSpot extension is available'.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/wallClock.cpp:878

Signal Safety Errno

WallClockJvmti::signalHandler does not save/restore errno across the body, unlike the sibling CTimerJvmti and ITimerJvmti handlers added in the same diff. Calls into ProfiledThread, OS::threadId, and recordSampleDelegated may set errno and clobber the interrupted thread's errno value.

Bracket the body with int saved_errno = errno; ...; errno = saved_errno; matching the pattern used in CTimerJvmti/ITimerJvmti.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/vmEntry.cpp:765

Null Pointer Dereference

probeJFRRequestStackTrace() reads args._jvmtistacks (line 763) and calls _init_request_stack_trace->() (line 765) without checking if the InitializeRequestStackTrace function pointer was actually set. If the extension is not found in GetExtensionFunctions, _init_request_stack_trace remains nullptr, causing a null deref.

Add an explicit null check before calling _init_request_stack_trace: 'if (_init_request_stack_trace == nullptr) { Log::warn(...); Counters::increment(JVMTI_STACKS_INIT_FAILED); return; }' after line 764.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/ctimer_linux.cpp:187

Duplication And Lsp

CTimerJvmti::start duplicates CTimer::start almost verbatim (max_timers/_timers calloc, registerThread loop) but skips OS::primeSignalOriginCheck() and origin validation that CTimer::start/signalHandler perform. Without origin validation, CTimerJvmti::signalHandler processes every SIGPROF without verifying it originates from setitimer(ITIMER_PROF); foreign SIGPROF (e.g. from raise/pthread_kill) will trigger unintended JVMTI stack-trace requests.

Either reuse CTimer::start (e.g. by parameterizing the signal handler) or add OS::primeSignalOriginCheck() call in CTimerJvmti::start() and OS::shouldProcessSignal guard in CTimerJvmti::signalHandler.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/ctimer_linux.cpp:226

Silent Fail

CTimerJvmti::start() returns Error::OK unconditionally, discarding thread-registration failures from lines 219-222. Swallows per-thread registerThread failures and reports success even when no threads were actually registered.

Change line 226 to return result; instead of return Error::OK;

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/vmEntry.cpp:760

Resource Leak

GetExtensionFunctions returns a table where each jvmtiExtensionFunctionInfo entry owns sub-allocations (id, short_description, params array, and each param's name); only the top-level table is Deallocated, leaking every per-entry allocation for the lifetime of the JVM.

Iterate ext_functions before deallocating and call _jvmti->Deallocate on each entry's id, short_description, params[j].name for j in [0,param_count), and params itself, then Deallocate(ext_functions).

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/arguments.cpp:364

Argument Parsing Fallthrough

jvmtistacks argument parser has a logic bug: case 'f' falls through to 'default' which sets true, but comments say 'false'. All non-matched characters implicitly set true.

Add explicit break after case 'f': case 'f': _jvmtistacks = false; break;. Replace default with explicit error handling for invalid values.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:597

Lock Contention

recordSampleDelegated attempts tryLock on three different lock indices (582-586) before falling back. If all three fail, the sample is dropped ('dangling AsyncStackTrace'). Under high lock contention, silently loses correlation between profiler events and JFR stack traces.

Add a counter (similar to JVMTI_STACKS_FAILED_OTHER) to track dropped samples due to lock contention, and surface this in the stop() diagnostic warnings.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:597

Return Value Semantics

recordSampleDelegated returns true even when the lock could not be acquired and the sample event was dropped (only the dangling JVM-side stack-trace request remains). The header comment says 'true if the request was accepted by the JVM ... and the sample event was recorded' — the sample event was NOT recorded in that branch, so callers cannot rely on true meaning what the contract says.

Either return false on the lock-skip path so callers can count it as a drop, or update the header doc to: 'true if the JVM accepted the stack-trace request; the local sample event may still have been dropped due to lock contention'.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:641

Documentation Accuracy

Error message at line 675 suggests JFR flag syntax '-XX:StartFlightRecording=...,+jdk.AsyncStackTrace#enabled=true' but standard JFR syntax uses '.enabled=true' not '#enabled=true'.

Replace '+jdk.AsyncStackTrace#enabled=true' with 'jdk.AsyncStackTrace.enabled=true'.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/itimer.cpp:131

Maybe add __atomic_store_n(&_enabled, false, __ATOMIC_RELEASE); at the top of ITimerJvmti::stop before disarming the timer to close the in-flight signal window.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/wallClock.cpp:843

NIT: Add a runtime check in WallClockJvmti::initialize() or add a comment stating that initialization must be guarded by the caller checking VM::canRequestStackTrace() first.

@jbachorik

Copy link
Copy Markdown
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:1045

Initialization

_sample_seq is reset to 0 on each profiler start/reset. No test verifies that the reset does not cause correlation ID collisions if profiler is stopped and restarted with existing JFR recordings.

Add integration test or comment explaining that stale AsyncStackTrace events from prior recording are expected.

@rkennke

rkennke commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

ddprof-lib/src/main/cpp/profiler.cpp:609

I think Error::OK is 0, so the condition is correct. But it might be good to use explicit comparison against Error::OK for posterity

It is more complicated than that. We cannot compare with another Error::OK, because the Error type does not have such a == operator. It does have a operator bool() and that is what those if statements are using. I would also prefer to use an == Error::OK pattern, but this requires to add such operator, and it would then be preferable to change this across the code-base.

@rkennke

rkennke commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

ddprof-lib/src/main/cpp/profiler.cpp:597

Integer Overflow

Correlation ID generation via atomicInc(_sample_seq) + 1 can wrap at u64 boundaries; when _sample_seq reaches ULLONG_MAX, (ULLONG_MAX + 1) overflows to zero, violating the wire-format contract where zero means 'no correlation'. CWE-190. This becomes more severe when profiler restarts and resets _sample_seq to 0 (line 1067), potentially reusing correlation IDs that match stale AsyncStackTrace events from prior JFR recordings (replay attack vector).

Implement saturating increment: u64 seq = atomicIncRelaxed(_sample_seq); u64 corr = (seq == ULLONG_MAX) ? 1 : (seq + 1); to ensure correlation_id is never zero. Alternatively prevent zero by saturating/modular arithmetic on restart.

correlationId=0 is a perfectly fine value, there is no wire-format where zero means no correlationId. When there is no deferred stack-trace, the correlationId attribute itself is absent. I'll change the misleading comment.

@rkennke

rkennke commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

ddprof-lib/src/main/cpp/ctimer_linux.cpp:226

Silent Fail

CTimerJvmti::start() returns Error::OK unconditionally, discarding thread-registration failures from lines 219-222. Swallows per-thread registerThread failures and reports success even when no threads were actually registered.

Change line 226 to return result; instead of return Error::OK;

This looks like a pre-existing bug in CTimer::start() as well. I'm going to do the fix here.

@rkennke

rkennke commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

ddprof-lib/src/main/cpp/itimer.cpp:131

Maybe add __atomic_store_n(&_enabled, false, __ATOMIC_RELEASE); at the top of ITimerJvmti::stop before disarming the timer to close the in-flight signal window.

I believe _enabled is already cleared by disableEngines() before calling stop(), so this is not needed.

@rkennke

rkennke commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

I addressed all review comments, please re-review. Thanks!

@dd-octo-sts

dd-octo-sts Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #26099328213 | Commit: d985d50 | Duration: 32m 11s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-05-19 13:45:34 UTC

@rkennke rkennke requested a review from jbachorik May 18, 2026 09:09
Comment thread ddprof-lib/src/main/cpp/vmEntry.h
Comment thread ddprof-lib/src/main/cpp/wallClock.cpp
Comment thread ddprof-lib/src/main/cpp/itimer.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/flightRecorder.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/profiler.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/profiler.cpp
@rkennke rkennke requested a review from jbachorik May 19, 2026 09:45
@datadog-datadog-prod-us1-2

This comment has been minimized.

@jbachorik jbachorik left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - pending the failing CI runs

…(PROF-14350)

Adds a new jvmtistacks profiler option that, when enabled, delegates CPU
and wall-clock stack-trace collection to HotSpot's JFR RequestStackTrace
JVMTI extension (available in JDK 27+). This allows the profiler to
leverage the JVM's own safepoint-based stack walker instead of ASGCT,
which improves reliability in cases where ASGCT cannot walk the stack.

Engine selection:
- CPU: prefers CTimerJvmti (per-thread timers) → ITimerJvmti (setitimer)
  → falls back to ASGCT engines with a warning if neither is available
- Wall-clock: uses WallClockJvmti when the extension is present

Signal safety and correctness fixes included:
- Guard against null fn pointer in VM::requestStackTrace()
- Restore errno on early return in WallClockJvmti::signalHandler
- Save/restore SIGPROF handler in ITimerJvmti::check() probe
- Align recordEventDelegated() with recordEvent() semantics (break not return)
- Remove no-op initializeRequestStackTrace() retry in Profiler::start()
- Treat per-thread CTimer registration failures as benign during start()
  (threads may exit between listThreads() and registerThread())

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rkennke rkennke force-pushed the rkennke/PROF-14350 branch from 93dd9df to 956e148 Compare May 19, 2026 13:10
@rkennke rkennke merged commit 93ab936 into main May 19, 2026
98 checks passed
@rkennke rkennke deleted the rkennke/PROF-14350 branch May 19, 2026 15:14
@github-actions github-actions Bot added this to the 1.43.0 milestone May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.