wip: Use TheadLocal to setup protection longjmp buffer#615
wip: Use TheadLocal to setup protection longjmp buffer#615zhengyu123 wants to merge 17 commits intomainDataDog/java-profiler:mainfrom zgu/objmonitor_deflationDataDog/java-profiler:zgu/objmonitor_deflationCopy head branch name to clipboard
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a HotSpot JDK 25.0.2+ crash involving MonitorDeflationThread by tightening crash-protection setup in the VM stack walker and hardening a TOCTOU-prone monitor dereference. It also adds regression tests in both ddprof-test (Java) and ddprof-lib (gtest) to cover the new behavior.
Changes:
- Gate
walkVM()crash-protection setup (ThreadShadow::_exception_file/VMThread::exception()) to only apply to “real” Java application threads viaVMThread::isJavaThread(). - Harden
VMKlass::fromOop()against concurrent monitor deflation by usingSafeAccess::safeFetch64()for the monitor pointer dereference. - Add Java + C++ regression/unit tests targeting the MonitorDeflationThread crash scenario and the new gating logic.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| ddprof-test/src/test/java/com/datadoghq/profiler/cpu/MonitorDeflationThreadSafetyTest.java | Adds a regression test intended to reproduce Monitor inflation/deflation while CPU profiling signals are active. |
| ddprof-lib/src/test/cpp/hotspot_crash_protection_ut.cpp | Adds gtest coverage for ProfiledThread classification and vtable-vote behavior used by VMThread::isJavaThread(). |
| ddprof-lib/src/main/cpp/hotspot/vmStructs.h | Uses safeFetch64 to avoid a crash when dereferencing an ObjectMonitor* that may be concurrently freed by deflation. |
| ddprof-lib/src/main/cpp/hotspot/hotspotSupport.cpp | Gates crash-protection activation to avoid writing _exception_file for JVM-internal threads; adds null-guard for vtable receiver klass lookup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
CI Test ResultsRun: #28384019189 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-06-29 15:55:18 UTC |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Reliability & Chaos Results✅ All reliability & chaos checks passed Pipeline: https://gitlab.ddbuild.io/DataDog/java-profiler/-/pipelines/121651273 |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…ofiler into zgu/objmonitor_deflation
| bool JVMThread::initialize() { | ||
| void* current_thread = currentThreadSlow(); | ||
| if (current_thread == nullptr) { | ||
| return false; | ||
| } | ||
| _jvm_thread.initialize(current_thread); | ||
| // _tid is side-effect of currentThreadSlow() | ||
| assert(_tid != nullptr); |
| if (vm_thread != NULL) { | ||
| _jmp_ctx.set(&crash_protection_ctx); | ||
| if (profiled_thread != nullptr) { | ||
| profiled_thread->setCrashProtectionActive(true); | ||
| } |
| bool protected_walk = (thrd != nullptr && thrd->isCrashProtectionActive()) | ||
| || sameStack(vm_thread->exception(), &vm_thread); | ||
| || sameStack(_jmp_ctx.get(), &vm_thread); | ||
| if (!protected_walk) { | ||
| return; | ||
| } |
| if (vm_thread != NULL) { | ||
| _jmp_ctx.set(&crash_protection_ctx); | ||
| if (profiled_thread != nullptr) { |
| void initialize(void* current_thread) { | ||
| // Called from known JavaThread, it should never be nullptr. | ||
| if (current_thread == nullptr) { | ||
| assert(false && "Should not reach here"); | ||
| } | ||
|
|
||
| long max_keys = sysconf(_SC_THREAD_KEYS_MAX); | ||
|
|
||
| for (long i = 0; i < max_keys; i++) { | ||
| if (pthread_getspecific((pthread_key_t)i) == current_thread) { | ||
| _key = pthread_key_t(i); | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| assert(isKeyValid() && "Invalid thread key"); | ||
| } |
| * Root cause: walkVM() unconditionally wrote a jmp_buf address into | ||
| * ThreadShadow::_exception_file for every non-null VMThread, including | ||
| * JVM-internal threads such as MonitorDeflationThread. In JDK 25, | ||
| * ObjectMonitorDeflationSafepointer reads thread state (including | ||
| * _exception_file) at safepoint boundaries; finding a stale jmp_buf | ||
| * address instead of NULL or a C-string caused a crash in | ||
| * deflate_monitor_list. | ||
| * | ||
| * Fix: gate the _exception_file write on VMThread::isJavaThread(), which | ||
| * uses vtable comparison to distinguish regular Java application threads | ||
| * from JVM-internal threads that are JavaThread subclasses in JDK 25+. |
| * Unit tests for the MonitorDeflationThread crash-protection gate. | ||
| * | ||
| * Root cause (JDK 25.0.2): walkVM() unconditionally wrote a jmp_buf address | ||
| * into ThreadShadow::_exception_file for every non-null VMThread, including | ||
| * JVM-internal threads like MonitorDeflationThread. In JDK 25, | ||
| * ObjectMonitorDeflationSafepointer reads _exception_file at safepoint | ||
| * boundaries; a stale jmp_buf address caused a crash in deflate_monitor_list. | ||
| * | ||
| * Fix: gate the write on `VMThread::isJavaThread(vm_thread)`. | ||
| * |
| // --------------------------------------------------------------------------- | ||
| // C. setup_crash_protection gate condition | ||
| // | ||
| // walkVM computes: setup_crash_protection = (vm_thread != NULL) && isJavaThread() | ||
| // Verify that the gate is false for TYPE_NOT_JAVA_THREAD threads (fast path). | ||
| // --------------------------------------------------------------------------- | ||
|
|
||
| class CrashProtectionGateTest : public ::testing::Test { | ||
| protected: | ||
| void SetUp() override { | ||
| ProfiledThread::initCurrentThread(); | ||
| _pt = ProfiledThread::currentSignalSafe(); | ||
| ASSERT_NE(nullptr, _pt); | ||
| } | ||
|
|
||
| void TearDown() override { | ||
| ProfiledThread::release(); | ||
| } | ||
|
|
||
| ProfiledThread* _pt = nullptr; | ||
|
|
||
| // Replicate the fast-path of isJavaThread() for gating purposes. | ||
| static bool fastPathIsJavaThread(ProfiledThread* pt) { | ||
| if (pt == nullptr) return false; | ||
| ProfiledThread::ThreadType type = pt->threadType(); | ||
| if (type != ProfiledThread::TYPE_UNKNOWN) { | ||
| return type == ProfiledThread::TYPE_JAVA_THREAD; | ||
| } | ||
| return false; // TYPE_UNKNOWN → slow vtable path (not tested here) | ||
| } | ||
| }; | ||
|
|
||
| // Gate is off for a JVM-internal thread (TYPE_NOT_JAVA_THREAD): | ||
| // _exception_file must NOT be written. | ||
| TEST_F(CrashProtectionGateTest, GateOffForNonJavaThread) { | ||
| _pt->setJavaThread(false); | ||
| // Simulate: vm_thread != NULL (non-null pointer means JVM thread struct exists) | ||
| bool vm_thread_non_null = true; | ||
| bool setup_crash_protection = vm_thread_non_null && fastPathIsJavaThread(_pt); | ||
| EXPECT_FALSE(setup_crash_protection); | ||
| } |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Scan-Build Report
Bug Summary
Reports
|
What does this PR do?:
This PR intents to resolve some of mysterious crashes seen in later releases, e.g. PROF-15206, although, it might not actually solve this particular problem, but I believe it should have positive impact - Cloude and I believe that overwriting JavaThread's exception field is dangerous, in fact, overwriting any JVM data structures are dangerous, because the writes can result in races and readers may see wrong values, that can lead to crashes.
Before introducing Thread Local Handshake in JVM, Thread object is usually only read by 'current` thread, so it is relative safe to alter, then restore values inside signal handler, but not any more - a thread may be able to read other thread's states, before it decides how to proceed the handshake, so it may read wrong values.
This PR introduces a thread local for storing longjmp buffer and eliminates the dependency on overwriting
JavaThread'sexceptionfield, solongjmpnow protects all threads, not justJavaThread- should improve stability.Motivation:
Improve profiler's stability.
Additional Notes:
In additional,
exceptionfield only exists inJavaThread, but detection code in Profiler is somewhat questionable, especially, for the subclasses ofJavaThreadHow to test the change?:
For Datadog employees:
credentials of any kind, I've requested a security review (run the
dd:platform-security-reviewskill, or file a request via the PSEC review form).
bewairealso runs automatically on every PR.Unsure? Have a question? Request a review!