
GC infrastructure#11

Open
graalvmbot wants to merge 7 commits into jdk25 from chaeubl/GR-70066

@graalvmbot

Provides infrastructure that simplifies the integration of HotSpot garbage collectors (like Shenandoah) into Native Image. Each commit in this PR can be reviewed on its own.

@christianhaeubl
Member

@simonis : this PR provides the C++ infrastructure that simplifies the integration of HotSpot garbage collectors (like Shenandoah) into Native Image. Note that the first commit ("Remove unnecessary files") can't be opened on GitHub (the diff is probably too large).

While I am working on the Java glue code, the interface between the C++ code and Native Image may continue to change a bit. So, I will probably force-push a few times to this branch.

@simonis
Contributor

simonis commented Oct 17, 2025

Hi @christianhaeubl ,

I'm starting to look into this more closely and have some questions:

  1. In the README.md you write: "mocks/: Contains autogenerated mocks for all deleted C/C++ headers". It currently also contains all the Shenandoah headers. Is my assumption right that once I start to move them over to share/gc/shenandoah/, I have to remove them from the mocks/ directory?
  2. Most of the header files under mocks/ are empty, but some of them contain include directives. It seems that these are for include files which are present either in their canonical location or under svm/. Is that correct? Is that also true the other way round, i.e., is every non-deleted header file still referenced from header files under mocks/ if it was originally referenced from those files?
  3. It looks like the files under svm/ still contain #ifdef SVM/#ifndef SVM blocks. I thought that the svm/ directory only contains SVM-specific code anyway, so why are those #ifdefs/#ifndefs needed? Is this just a historical leftover from a time when the goal was to keep those files easily mergeable with their original counterparts?
  4. Is my understanding correct that for files which are kept under their original HotSpot path, we try to keep changes to a minimum and rather use #ifdef/#ifndef SVM? What's the rule of thumb for when to leave a file under its original path and when to move it to svm/? Just personal judgement?
  5. Is my assumption correct that everything that is supposed to end up in the native GC library should be in the svm_gc namespace? What sense does the following code make:
#ifndef SVM
namespace svm_gc {

Is this just a pointless artifact that we can ignore?

More to follow :)

@christianhaeubl
Member

Regarding 1:
Yes, it is considered best practice to remove the corresponding mocks once you move the actual files into their location. This especially improves filename-based navigation in most IDEs. For the build, it shouldn't really matter because the mocks directory is last on the include path.

Regarding 2:
We keep the original include directives in the mocks to avoid unnecessary #ifdefs in the original files. Otherwise, we would for example see missing type errors if types are only included transitively. The includes in the mocks are resolved against either their standard location or the files under svm/. So, yes, this also means that non-deleted header files are referenced from mocks if the original file did the same.

Regarding 3:
Files under svm/ fall into two categories:

  • Files closely tracking OpenJDK code (e.g., nmethod.cpp): for such files, SVM needs to implement the same "interface" as the OpenJDK, so we sometimes use #ifdefs to make it clearer which parts are identical to the OpenJDK and which parts are SVM-specific. This primarily makes updating easier (i.e., diffing/merging with the OpenJDK sources).
  • Fully custom implementations (e.g., svmOopMap.hpp): in those files #ifdefs are unnecessary.

Regarding 4:
Yes, for files that remain in their original OpenJDK paths, we try to keep the modifications to a minimum and rather put all SVM-specific changes behind #ifdefs. This makes updating a lot easier as most of those #ifdefs will be merged without conflicts. Moving a file to svm/ (and subsequently simplifying the file to just the needed parts) is only used for situations where the diff to the OpenJDK is so large that complex merge conflicts would be pretty much guaranteed when updating.
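
As a rough illustration of that rule, the sketch below shows how an SVM-specific change can sit behind an #ifdef while the upstream code stays untouched. The class `BlobSketch` and the method `is_svm_build()` are hypothetical and do not exist in the OpenJDK or in this PR; only the `SVM` macro name is taken from the discussion above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a class in a file that stays at its OpenJDK path.
class BlobSketch {
  size_t _size;

public:
  explicit BlobSketch(size_t size) : _size(size) { assert(size > 0); }

  // Shared code: identical in both builds, so upstream changes merge cleanly.
  size_t size() const { return _size; }

#ifdef SVM
  // SVM-specific variant, isolated so that it rarely overlaps with upstream edits.
  bool is_svm_build() const { return true; }
#else
  // Upstream variant, left exactly as it would appear in the OpenJDK sources.
  bool is_svm_build() const { return false; }
#endif
};
```

Because the guarded region is small and self-contained, a later OpenJDK update that changes `size()` would typically not conflict with it.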

Regarding 5:
Yes, pretty much all code that is supposed to end up in the native GC library should be in the svm_gc namespace to avoid naming conflicts. The pattern #ifndef SVM namespace svm_gc { should only appear in files that contain types whose layout needs to be visible on the Java-side of SVM (the image build generates C code to determine the offsets of fields and the sizes of types).
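
A minimal sketch of that pattern follows. The struct `RegionInfoSketch` and the helper `region_top_offset()` are hypothetical; which builds actually define `SVM` is not stated here, so the snippet is written to compile in both configurations. The helper only illustrates the kind of offset and size query that generated C code could perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifndef SVM
namespace svm_gc {
#endif

// Hypothetical type whose field offsets and size the Java side would need.
struct RegionInfoSketch {
  uint8_t state;
  void*   bottom;
  void*   top;
};

#ifndef SVM
} // namespace svm_gc
// Makes the unqualified name usable below regardless of the configuration.
using svm_gc::RegionInfoSketch;
#endif

// Returns the byte offset of the 'top' field within the struct.
inline size_t region_top_offset() {
  size_t off = offsetof(RegionInfoSketch, top);
  assert(off + sizeof(void*) <= sizeof(RegionInfoSketch));
  return off;
}
```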

"Print extension of code cache") \
\
product(bool, ClassUnloading, true, \
product(bool, ClassUnloading, trueInSvm, \
Contributor

@christianhaeubl, why have you set ClassUnloading to trueInSvm and ClassUnloadingWithConcurrentMark below to falseInSvm? I thought there is currently no runtime class loading/unloading in SubstrateVM, so shouldn't ClassUnloading be falseInSvm as well?

Member

christianhaeubl commented Nov 3, 2025

I initially enabled that option because HotSpot’s code unloading logic is more aligned with what Native Image needs when class unloading is active. However, this doesn't seem to be relevant anymore (the code unloading logic is now a custom SVM-specific implementation), so we can and probably should set both ClassUnloading and ClassUnloadingWithConcurrentMark to falseInSvm.
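
For readers unfamiliar with the macros, the sketch below shows one way build-dependent defaults like these could be defined. The definitions of `trueInSvm` and `falseInSvm` are assumptions modeled on HotSpot's `trueInDebug`/`falseInDebug`; the actual definitions in this PR may differ. `FlagDefaultsSketch` is a hypothetical stand-in for the flag table and shows the proposed end state with both flags set to `falseInSvm`.

```cpp
#include <cassert>

// Assumed definitions (not taken from the PR): the value depends on whether
// the code is compiled for SVM.
#ifdef SVM
#define trueInSvm  true
#define falseInSvm false
#else
#define trueInSvm  false
#define falseInSvm true
#endif

// Under these assumed definitions the two macros are always opposites.
static_assert(trueInSvm != falseInSvm, "macros must be complementary");

// Hypothetical miniature of the flag defaults after the proposed change.
struct FlagDefaultsSketch {
  bool ClassUnloading = falseInSvm;
  bool ClassUnloadingWithConcurrentMark = falseInSvm;
};
```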

@simonis
Contributor

simonis commented Nov 19, 2025

Hi @christianhaeubl,

A quick update from my side. I can now run simple Java programs up to the point where they trigger a GC. You can find my version of the Shenandoah port at https://github.com/simonis/labs-openjdk/tree/simonis/GR-70066 (with an updated README.md file). It also requires an updated version of SubstrateVM from https://github.com/simonis/graal/tree/simonis/GR-70306.

I'll take a few days off now but I'll continue to work on the real GC part next week, when I return.

Best regards,
Volker

@simonis
Contributor

simonis commented Jan 22, 2026

Hi @christianhaeubl.

I've made some progress in my branch and I can now run pretty far in -XX:ShenandoahGCMode=passive mode. But if I run with many concurrent application threads, I still occasionally see errors which are caused by heap corruption.

I think the errors are caused by concurrent allocations while a full GC is in progress. I more or less verified this in the debugger where I see states like the following:

(gdb) info threads
  Id   Target Id                                             Frame 
  1    Thread 0x7ffff7f5a740 (LWP 1797673) "AsciiMandel.exe" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  2    Thread 0x7ffff73ff640 (LWP 1797682) "Shenandoah GC T" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  3    Thread 0x7ffff597f640 (LWP 1797684) "Shenandoah Cont" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  4    Thread 0x7ffff587e640 (LWP 1797685) "VM Periodic Tas" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  5    Thread 0x7ffff577d640 (LWP 1797686) "OperationThread" svm_gc::ShenandoahMarkBitMap::map (this=svm_gc::ShenandoahMarkBitMap = {...}, word=78848)
    at /priv/simonisv/Git/Graal/labs-openjdk/src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp:77
  6    Thread 0x7fffe9fff640 (LWP 1797687) "ference Handler" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  7    Thread 0x7fffe97fe640 (LWP 1797688) "gnal Dispatcher" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  8    Thread 0x7fffe8ffd640 (LWP 1797696) "pool-3-thread-1" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  9    Thread 0x7fffdbfff640 (LWP 1797697) "pool-3-thread-2" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  10   Thread 0x7fffdb7fe640 (LWP 1797698) "pool-3-thread-3" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
* 11   Thread 0x7fffdaffd640 (LWP 1797699) "pool-3-thread-4" svm_gc::ShenandoahControlThread::handle_requested_gc (this=svm_gc::ShenandoahControlThread = {...}, 
    cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahControlThread.cpp:388
  12   Thread 0x7fffda7fc640 (LWP 1797700) "pool-3-thread-5" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  13   Thread 0x7fffd9ffb640 (LWP 1797701) "pool-3-thread-6" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  14   Thread 0x7fffd97fa640 (LWP 1797702) "pool-3-thread-7" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  15   Thread 0x7fffd8ff9640 (LWP 1797703) "pool-3-thread-8" 0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6

Thread 3 is the Shenandoah Control Thread which has triggered a Stop-The-World full GC cycle and waits for its completion:

[Switching to thread 3 (Thread 0x7ffff597f640 (LWP 1797684))]
#0  0x00007ffff7491117 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) where
#0  0x00007ffff7491117 in Unknown Frame at 0x7ffff597e850 () at /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7493a41 in pthread_cond_wait(libc.so.6:0) () at /lib/x86_64-linux-gnu/libc.so.6
#2  0x000055555567f473 in com.oracle.svm.core.posix.headers.Pthread::pthread_cond_wait_no_transition ()
#3  0x0000555555683365 in com.oracle.svm.core.posix.pthread.PthreadVMCondition::blockNoTransitionUnspecifiedOwner(PthreadVMLockSupport.java:197) in AsciiMandel.exe
    (this=com.oracle.svm.core.posix.pthread.PthreadVMCondition = {...}) at com/oracle/svm/core/posix/pthread/PthreadVMLockSupport.java:197
#4  0x000055555563f94e in com.oracle.svm.core.gc.shared.NativeGCVMOperationSupport$NativeGCVMOperationWrapper::waitForStatusUpdate(NativeGCVMOperationSupport.java:278) in AsciiMandel.exe
    (data=NativeGCVMOperationWrapperData = {...}, minStatus=4) at com/oracle/svm/core/gc/shared/NativeGCVMOperationSupport.java:278
#5  0x000055555563fba5 in com.oracle.svm.core.gc.shared.NativeGCVMOperationSupport::waitForVMOperationExecutionStatus(NativeGCVMOperationSupport.java:160) in AsciiMandel.exe
    (isolate=graal_isolate_t = {...}, isolateThread=graal_isolatethread_t = {...}, data=NativeGCVMOperationWrapperData = {...}, minStatus=4)
    at com/oracle/svm/core/gc/shared/NativeGCVMOperationSupport.java:160
#6  0x000055555563969e in com.oracle.svm.core.code.IsolateEnterStub::NativeGCVMOperationSupport_waitForVMOperationExecutionStatus_hPrDtPwN538Oo7WnvGXvX5(IsolateEnterStub.java:1) in AsciiMandel.exe (__long0=<optimized out>, __long1=<optimized out>, __long2=<optimized out>, __int3=<optimized out>) at com/oracle/svm/core/code/IsolateEnterStub.java:1
#7  0x00007ffff7d4aa5d in svm_gc::VMThread::execute(vmThread.cpp:139) in libshenandoahgc-debug-ur.so (op=svm_gc::VM_Operation = {...}) at svm/share/runtime/vmThread.cpp:139
#8  0x00007ffff7c65041 in svm_gc::ShenandoahFullGC::vmop_entry_full(shenandoahFullGC.cpp:93) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahFullGC = {...}, cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahFullGC.cpp:93
#9  0x00007ffff7c64fbc in svm_gc::ShenandoahFullGC::collect(shenandoahFullGC.cpp:81) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahFullGC = {...}, cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahFullGC.cpp:81
#10 0x00007ffff7cd125b in svm_gc::ShenandoahControlThread::service_stw_full_cycle(shenandoahControlThread.cpp:369) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahControlThread = {...}, cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahControlThread.cpp:369
#11 0x00007ffff7cd0b79 in svm_gc::ShenandoahControlThread::run_service(shenandoahControlThread.cpp:165) in libshenandoahgc-debug-ur.so (this=svm_gc::ShenandoahControlThread = {...})
    at share/gc/shenandoah/shenandoahControlThread.cpp:165
#12 0x00007ffff7d162a8 in svm_gc::ConcurrentGCThread::run(concurrentGCThread.cpp:50) in libshenandoahgc-debug-ur.so (this=svm_gc::ConcurrentGCThread = {...})
    at share/gc/shared/concurrentGCThread.cpp:50
#13 0x00007ffff7c4a679 in svm_gc::Thread::call_run(thread.cpp:285) in libshenandoahgc-debug-ur.so (this=svm_gc::Thread = {...}) at share/runtime/thread.cpp:285
#14 0x00007ffff7c23908 in svm_gc::thread_native_entry(os_linux.cpp:901) in libshenandoahgc-debug-ur.so (thread=svm_gc::Thread = {...}) at os/linux/os_linux.cpp:901
#15 0x00007ffff7494ac3 in Unknown Frame at 0x7ffff597ee60 () at /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007ffff75268c0 in Unknown Frame at 0x7ffff597ef00 () at /lib/x86_64-linux-gnu/libc.so.6

Thread 5 is Substrate's VM thread which executes the STW-GC as a VM operation:

(gdb) thread 5
[Switching to thread 5 (Thread 0x7ffff577d640 (LWP 1797686))]
#0  svm_gc::ShenandoahMarkBitMap::map (this=svm_gc::ShenandoahMarkBitMap = {...}, word=78848)
    at /priv/simonisv/Git/Graal/labs-openjdk/src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp:77
77	  bm_word_t map(idx_t word) const { return _map[word]; }
(gdb) where
#0  svm_gc::ShenandoahMarkBitMap::map(shenandoahMarkBitMap.hpp:77) in libshenandoahgc-debug-ur.so (this=svm_gc::ShenandoahMarkBitMap = {...}, word=78848)
    at /priv/simonisv/Git/Graal/labs-openjdk/src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.hpp:77
#1  0x00007ffff7cd332e in svm_gc::ShenandoahMarkBitMap::get_next_bit_impl<0ul, false>(shenandoahMarkBitMap.inline.hpp:161) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahMarkBitMap = {...}, l_index=4980736, r_index=5242880)
    at /priv/simonisv/Git/Graal/labs-openjdk/src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.inline.hpp:161
#2  0x00007ffff7cd2f97 in svm_gc::ShenandoahMarkBitMap::get_next_one_offset(shenandoahMarkBitMap.inline.hpp:177) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahMarkBitMap = {...}, l_offset=4980736, r_offset=5242880)
    at /priv/simonisv/Git/Graal/labs-openjdk/src/hotspot/share/gc/shenandoah/shenandoahMarkBitMap.inline.hpp:177
#3  0x00007ffff7cd238a in svm_gc::ShenandoahMarkBitMap::is_bitmap_clear_range(shenandoahMarkBitMap.cpp:57) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahMarkBitMap = {...}, start=svm_gc::HeapWordImpl = {...}, end=svm_gc::HeapWordImpl = {...}) at share/gc/shenandoah/shenandoahMarkBitMap.cpp:57
#4  0x00007ffff7cad01d in svm_gc::ShenandoahMarkingContext::is_bitmap_range_within_region_clear(shenandoahMarkingContext.cpp:69) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahMarkingContext = {...}, start=svm_gc::HeapWordImpl = {...}, end=svm_gc::HeapWordImpl = {...}) at share/gc/shenandoah/shenandoahMarkingContext.cpp:69
#5  0x00007ffff7cace56 in svm_gc::ShenandoahMarkingContext::is_bitmap_clear(shenandoahMarkingContext.cpp:48) in libshenandoahgc-debug-ur.so (this=svm_gc::ShenandoahMarkingContext = {...})
    at share/gc/shenandoah/shenandoahMarkingContext.cpp:48
#6  0x00007ffff7c65b8a in svm_gc::ShenandoahFullGC::phase1_mark_heap(shenandoahFullGC.cpp:303) in libshenandoahgc-debug-ur.so (this=svm_gc::ShenandoahFullGC = {...})
    at share/gc/shenandoah/shenandoahFullGC.cpp:303
#7  0x00007ffff7c6577f in svm_gc::ShenandoahFullGC::do_it(shenandoahFullGC.cpp:226) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahFullGC = {...}, gc_cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahFullGC.cpp:226
#8  0x00007ffff7c651a8 in svm_gc::ShenandoahFullGC::op_full(shenandoahFullGC.cpp:113) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahFullGC = {...}, cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahFullGC.cpp:113
#9  0x00007ffff7c65124 in svm_gc::ShenandoahFullGC::entry_full(shenandoahFullGC.cpp:105) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahFullGC = {...}, cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahFullGC.cpp:105
#10 0x00007ffff7cad902 in svm_gc::VM_ShenandoahFullGC::doit(shenandoahVMOperations.cpp:100) in libshenandoahgc-debug-ur.so (this=svm_gc::VM_ShenandoahFullGC = {...})
    at share/gc/shenandoah/shenandoahVMOperations.cpp:100
#11 0x00007ffff7d49673 in svm_gc::VM_Operation::evaluate(vmOperations.cpp:71) in libshenandoahgc-debug-ur.so (this=svm_gc::VM_Operation = {...}) at svm/share/runtime/vmOperations.cpp:71
#12 0x00007ffff7d4e4d4 in svm_gc::svm_gc_execute_vm_operation_main(svmToGC.cpp:390) in libshenandoahgc-debug-ur.so (data=svm_gc::VM_OperationData = {...}) at svm/svmToGC.cpp:390
#13 0x0000555555642d75 in com.oracle.svm.core.gc.shenandoah.nativelib.ShenandoahLibrary::executeVMOperationMain ()
#14 0x0000555555642585 in com.oracle.svm.core.gc.shenandoah.ShenandoahVMOperations$ShenandoahVMOperation::operate0(ShenandoahVMOperations.java:106) in AsciiMandel.exe
    (this=com.oracle.svm.core.gc.shenandoah.ShenandoahVMOperations$ShenandoahVMOperation = {...}, data=NativeGCVMOperationData = {...})
    at com/oracle/svm/core/gc/shenandoah/ShenandoahVMOperations.java:106
#15 0x000055555563f605 in com.oracle.svm.core.gc.shared.NativeGCVMOperationSupport$NativeGCVMOperation::operate(NativeGCVMOperationSupport.java:327) in AsciiMandel.exe
    (this=com.oracle.svm.core.gc.shenandoah.ShenandoahVMOperations$ShenandoahVMOperation = {...}, data=NativeVMOperationData = {...})
    at com/oracle/svm/core/gc/shared/NativeGCVMOperationSupport.java:327
#16 0x000055555569d62a in com.oracle.svm.core.thread.VMOperation::execute(VMOperation.java:110) in AsciiMandel.exe
    (this=com.oracle.svm.core.gc.shenandoah.ShenandoahVMOperations$ShenandoahVMOperation = {...}, data=NativeVMOperationData = {...}) at com/oracle/svm/core/thread/VMOperation.java:110
#17 0x000055555569e91e in com.oracle.svm.core.thread.VMOperationControl$WorkQueues::drain(VMOperationControl.java:577) in AsciiMandel.exe
    (this=com.oracle.svm.core.thread.VMOperationControl$WorkQueues = {...}, workQueue=com.oracle.svm.core.thread.VMOperationControl$NativeVMOperationQueue = {...})
    at com/oracle/svm/core/thread/VMOperationControl.java:577
#18 0x000055555569f130 in com.oracle.svm.core.thread.VMOperationControl$WorkQueues::executeAllQueuedVMOperations(VMOperationControl.java:547) in AsciiMandel.exe
    (this=com.oracle.svm.core.thread.VMOperationControl$WorkQueues = {...}) at com/oracle/svm/core/thread/VMOperationControl.java:547
#19 0x000055555569ed37 in com.oracle.svm.core.thread.VMOperationControl$WorkQueues::enqueueAndExecute(VMOperationControl.java:484) in AsciiMandel.exe
    (this=com.oracle.svm.core.thread.VMOperationControl$WorkQueues = {...}, operation=com.oracle.svm.core.gc.shenandoah.ShenandoahVMOperations$ShenandoahVMOperation = {...}, data=NativeVMOperationData = {...}) at com/oracle/svm/core/thread/VMOperationControl.java:484
#20 0x000055555569fa4d in com.oracle.svm.core.thread.VMOperationControl::enqueue(VMOperationControl.java:265) in AsciiMandel.exe
    (this=com.oracle.svm.core.thread.VMOperationControl = {...}, operation=com.oracle.svm.core.gc.shenandoah.ShenandoahVMOperations$ShenandoahVMOperation = {...}, data=NativeVMOperationData = {...}) at com/oracle/svm/core/thread/VMOperationControl.java:265
#21 0x000055555569f945 in com.oracle.svm.core.thread.VMOperationControl::enqueue(VMOperationControl.java:246) in AsciiMandel.exe
    (this=com.oracle.svm.core.thread.VMOperationControl = {...}, data=NativeVMOperationData = {...}) at com/oracle/svm/core/thread/VMOperationControl.java:246
#22 0x000055555569781d in com.oracle.svm.core.thread.NativeVMOperation::enqueue(NativeVMOperation.java:49) in AsciiMandel.exe
    (this=com.oracle.svm.core.gc.shenandoah.ShenandoahVMOperations$ShenandoahVMOperation = {...}, data=NativeVMOperationData = {...})
    at com/oracle/svm/core/thread/NativeVMOperation.java:49
#23 0x000055555563f798 in com.oracle.svm.core.gc.shared.NativeGCVMOperationSupport$NativeGCVMOperationWrapper::operate0(NativeGCVMOperationSupport.java:252) in AsciiMandel.exe
    (wrapperData=NativeGCVMOperationWrapperData = {...}) at com/oracle/svm/core/gc/shared/NativeGCVMOperationSupport.java:252
#24 0x000055555563f666 in com.oracle.svm.core.gc.shared.NativeGCVMOperationSupport$NativeGCVMOperationWrapper::operate(NativeGCVMOperationSupport.java:230) in AsciiMandel.exe
    (this=com.oracle.svm.core.gc.shared.NativeGCVMOperationSupport$NativeGCVMOperationWrapper = {...}, data=NativeVMOperationData = {...})
    at com/oracle/svm/core/gc/shared/NativeGCVMOperationSupport.java:230
#25 0x000055555569d62a in com.oracle.svm.core.thread.VMOperation::execute(VMOperation.java:110) in AsciiMandel.exe
    (this=com.oracle.svm.core.gc.shared.NativeGCVMOperationSupport$NativeGCVMOperationWrapper = {...}, data=NativeVMOperationData = {...})
    at com/oracle/svm/core/thread/VMOperation.java:110
#26 0x000055555569e91e in com.oracle.svm.core.thread.VMOperationControl$WorkQueues::drain(VMOperationControl.java:577) in AsciiMandel.exe
    (this=com.oracle.svm.core.thread.VMOperationControl$WorkQueues = {...}, workQueue=com.oracle.svm.core.thread.VMOperationControl$NativeVMOperationQueue = {...})
    at com/oracle/svm/core/thread/VMOperationControl.java:577
#27 0x000055555569f054 in com.oracle.svm.core.thread.VMOperationControl$WorkQueues::executeAllQueuedVMOperations(VMOperationControl.java:526) in AsciiMandel.exe
    (this=com.oracle.svm.core.thread.VMOperationControl$WorkQueues = {...}) at com/oracle/svm/core/thread/VMOperationControl.java:526
#28 0x000055555569f8ad in com.oracle.svm.core.thread.VMOperationControl$WorkQueues::waitForWorkAndExecute(VMOperationControl.java:447) in AsciiMandel.exe
    (this=com.oracle.svm.core.thread.VMOperationControl$WorkQueues = {...}) at com/oracle/svm/core/thread/VMOperationControl.java:447
#29 0x000055555569e478 in com.oracle.svm.core.thread.VMOperationControl$VMOperationThread::run(VMOperationControl.java:364) in AsciiMandel.exe
    (this=com.oracle.svm.core.thread.VMOperationControl$VMOperationThread = {...}) at com/oracle/svm/core/thread/VMOperationControl.java:364
#30 0x000055555570848b in <-- java.lang.Thread::runWith(Thread.java:1487) in AsciiMandel.exe
    (this=java.lang.Thread = {...}, bindings=java.lang.Class = {...}, op=com.oracle.svm.core.thread.VMOperationControl$VMOperationThread = {...}) at java/lang/Thread.java:1487
#31 java.lang.Thread::run(Thread.java:1474) in AsciiMandel.exe (this=java.lang.Thread = {...}) at java/lang/Thread.java:1474
#32 0x000055555569aa1e in com.oracle.svm.core.thread.PlatformThreads::threadStartRoutine(PlatformThreads.java:829) in AsciiMandel.exe (threadHandle=0x1)
    at com/oracle/svm/core/thread/PlatformThreads.java:829
#33 0x000055555569a8bb in com.oracle.svm.core.thread.PlatformThreads::threadStartRoutine(PlatformThreads.java:805) in AsciiMandel.exe (data=ThreadStartData = {...})
    at com/oracle/svm/core/thread/PlatformThreads.java:805
#34 0x000055555563975b in com.oracle.svm.core.code.IsolateEnterStub::PlatformThreads_threadStartRoutine_Z5jZ9wXZGDAvr0CL8KrTOA(IsolateEnterStub.java:1) in AsciiMandel.exe
    (__long0=<optimized out>) at com/oracle/svm/core/code/IsolateEnterStub.java:1
#35 0x00007ffff7494ac3 in Unknown Frame at 0x7ffff577ce60 () at /lib/x86_64-linux-gnu/libc.so.6
#36 0x00007ffff75268c0 in Unknown Frame at 0x7ffff577cf00 () at /lib/x86_64-linux-gnu/libc.so.6

For this it calls VM_ShenandoahFullGC::doit() through svm_gc_execute_vm_operation_main() which happens at a safepoint:

// TO_NATIVE - Only called from the VM thread at a safepoint.
EXPORT_FOR_SVM void svm_gc_execute_vm_operation_main(VM_OperationData *data) {
  assert(Thread::current()->is_VM_thread(), "must be the VM thread");
  assert(SafepointSynchronize::get_safepoint_state() == SafepointSynchronize::at_safepoint, "must be at a safepoint");
  assert(IsolateThread::current()->has_status_native(), "unexpected thread state");
  data->vm_operation()->evaluate();
}

There's even an assertion in svm_gc_execute_vm_operation_main() which checks that we're at a safepoint. But when looking at thread 11 (which is one of the application threads), we can see that it is still running and allocating:

Thread 11 "pool-3-thread-4" hit Breakpoint 1, svm_gc::ShenandoahControlThread::handle_requested_gc (this=svm_gc::ShenandoahControlThread = {...}, 
    cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahControlThread.cpp:388
388	  if (should_terminate()) {
(gdb) where
#0  svm_gc::ShenandoahControlThread::handle_requested_gc(shenandoahControlThread.cpp:388) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahControlThread = {...}, cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahControlThread.cpp:388
#1  0x00007ffff7cd1376 in svm_gc::ShenandoahControlThread::request_gc(shenandoahControlThread.cpp:383) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahControlThread = {...}, cause=svm_gc::GCCause::_allocation_failure) at share/gc/shenandoah/shenandoahControlThread.cpp:383
#2  0x00007ffff7cac782 in svm_gc::ShenandoahController::handle_alloc_failure(shenandoahController.cpp:60) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahController = {...}, req=..., block=true) at share/gc/shenandoah/shenandoahController.cpp:60
#3  0x00007ffff7cb7d9f in svm_gc::ShenandoahHeap::allocate_memory(shenandoahHeap.cpp:1055) in libshenandoahgc-debug-ur.so (this=svm_gc::ShenandoahHeap = {...}, req=...)
    at share/gc/shenandoah/shenandoahHeap.cpp:1055
#4  0x00007ffff7cb823a in svm_gc::ShenandoahHeap::mem_allocate(shenandoahHeap.cpp:1165) in libshenandoahgc-debug-ur.so
    (this=svm_gc::ShenandoahHeap = {...}, size=6, gc_overhead_limit_was_exceeded=0x7fffdaffc6e8) at share/gc/shenandoah/shenandoahHeap.cpp:1165
#5  0x00007ffff7d09118 in svm_gc::MemAllocator::mem_allocate_outside_tlab(memAllocator.cpp:255) in libshenandoahgc-debug-ur.so (this=svm_gc::MemAllocator = {...}, allocation=...)
    at share/gc/shared/memAllocator.cpp:255
#6  0x00007ffff7d094a4 in svm_gc::MemAllocator::mem_allocate(memAllocator.cpp:369) in libshenandoahgc-debug-ur.so (this=svm_gc::MemAllocator = {...}, allocation=...)
    at share/gc/shared/memAllocator.cpp:369
#7  0x00007ffff7d094ea in svm_gc::MemAllocator::allocate(memAllocator.cpp:376) in libshenandoahgc-debug-ur.so (this=svm_gc::MemAllocator = {...}) at share/gc/shared/memAllocator.cpp:376
#8  0x00007ffff7d0dd3c in svm_gc::CollectedHeap::array_allocate(collectedHeap.inline.hpp:44) in libshenandoahgc-debug-ur.so
    (this=svm_gc::CollectedHeap = {...}, klass=svm_gc::Klass = {...}, size=6, length=7, do_zero=true)
    at /priv/simonisv/Git/Graal/labs-openjdk/src/hotspot/share/gc/shared/collectedHeap.inline.hpp:44
#9  0x00007ffff7d4e8c5 in svm_gc::svm_gc_allocate_array(svmToGC.cpp:425) in libshenandoahgc-debug-ur.so (k=svm_gc::ArrayKlass = {...}, length=7) at svm/svmToGC.cpp:425
#10 0x00005555556427d5 in com.oracle.svm.core.gc.shenandoah.nativelib.ShenandoahLibrary::allocateArray ()
#11 0x00005555556426a8 in com.oracle.svm.core.gc.shenandoah.graal.ShenandoahAllocationSupport::allocateArray0(ShenandoahAllocationSupport.java:62) in AsciiMandel.exe
    (this=com.oracle.svm.core.gc.shenandoah.graal.ShenandoahAllocationSupport = {...}, length=7, hub=java.lang.Class = {...})
    at com/oracle/svm/core/gc/shenandoah/graal/ShenandoahAllocationSupport.java:62
#12 0x000055555563fd0e in com.oracle.svm.core.gc.shared.graal.NativeGCAllocationSupport::slowPathNewArray(NativeGCAllocationSupport.java:144) in AsciiMandel.exe
    (objectHeader=0x244e420, length=7) at com/oracle/svm/core/gc/shared/graal/NativeGCAllocationSupport.java:144
#13 0x000055555573fabc in java.math.MutableBigInteger::divideMagnitude(MutableBigInteger.java:1599) in AsciiMandel.exe
    (this=java.math.MutableBigInteger = {...}, div=java.math.MutableBigInteger = {...}, quotient=java.math.MutableBigInteger = {...}, needRemainder=true)
    at java/math/MutableBigInteger.java:1599
#14 0x000055555573e49f in java.math.MutableBigInteger::divideKnuth(MutableBigInteger.java:1319) in AsciiMandel.exe
    (this=java.math.MutableBigInteger = {...}, b=java.math.MutableBigInteger = {...}, quotient=java.math.MutableBigInteger = {...}, needRemainder=true)
    at java/math/MutableBigInteger.java:1319
#15 0x000055555573d7a5 in java.math.MutableBigInteger::divide(MutableBigInteger.java:1245) in AsciiMandel.exe
    (this=java.math.MutableBigInteger = {...}, b=java.math.MutableBigInteger = {...}, quotient=java.math.MutableBigInteger = {...}, needRemainder=true)
    at java/math/MutableBigInteger.java:1245
#16 0x0000555555728bf1 in <-- java.math.MutableBigInteger::divide(MutableBigInteger.java:1239) in AsciiMandel.exe
    (this=java.math.MutableBigInteger = {...}, b=java.math.MutableBigInteger = {...}, quotient=java.math.MutableBigInteger = {...}) at java/math/MutableBigInteger.java:1239
#17 java.math.BigDecimal::divideAndRound(BigDecimal.java:5026) in AsciiMandel.exe (bdividend=java.math.BigInteger = {...}, bdivisor=java.math.BigInteger = {...}, roundingMode=4)
    at java/math/BigDecimal.java:5026
#18 0x000055555572a260 in java.math.BigDecimal::divideAndRoundByTenPow(BigDecimal.java:4823) in AsciiMandel.exe (intVal=java.math.BigInteger = {...}, tenPow=49, roundingMode=4)
    at java/math/BigDecimal.java:4823
#19 0x000055555572b1f3 in java.math.BigDecimal::doRound(BigDecimal.java:4792) in AsciiMandel.exe (intVal=java.math.BigInteger = {...}, scale=49, mc=java.math.MathContext = {...})
    at java/math/BigDecimal.java:4792
#20 0x000055555572dda9 in java.math.BigDecimal::multiplyAndRound(BigDecimal.java:5866) in AsciiMandel.exe
    (x=java.math.BigInteger = {...}, y=java.math.BigInteger = {...}, scale=98, mc=java.math.MathContext = {...}) at java/math/BigDecimal.java:5866
#21 0x000055555572d960 in java.math.BigDecimal::multiply(BigDecimal.java:1641) in AsciiMandel.exe
    (this=java.math.BigDecimal = {...}, multiplicand=java.math.BigDecimal = {...}, mc=java.math.MathContext = {...}) at java/math/BigDecimal.java:1641
#22 0x00005555555f2953 in Complex::square(AsciiMandel.java:46) in AsciiMandel.exe (this=<optimized out>) at AsciiMandel.java:46
#23 0x00005555555f0672 in AsciiMandel::calculatePoint(AsciiMandel.java:80) in AsciiMandel.exe (__Object0=<optimized out>) at AsciiMandel.java:80
#24 0x00005555555f0ef6 in AsciiMandel::lambda$main$0(AsciiMandel.java:203) in AsciiMandel.exe
    (__int0=<optimized out>, __int1=<optimized out>, __int2=<optimized out>, __Object3=<optimized out>, __Object4=<optimized out>, __Object5=<optimized out>, __Object6=<optimized out>, __Object7=<optimized out>, __int8=<optimized out>, __Object9=<optimized out>, __Object10=<optimized out>) at AsciiMandel.java:203
#25 0x00005555555f037a in AsciiMandel$$Lambda/0x5bc022db535e68b7e72b4036c1d0f6ea0::run ()
#26 0x0000555555886dd7 in java.util.concurrent.ThreadPoolExecutor::runWorker(ThreadPoolExecutor.java:1090) in AsciiMandel.exe
    (this=java.util.concurrent.ThreadPoolExecutor = {...}, w=java.util.concurrent.ThreadPoolExecutor$Worker = {...}) at java/util/concurrent/ThreadPoolExecutor.java:1090
#27 0x00005555558848b4 in java.util.concurrent.ThreadPoolExecutor$Worker::run(ThreadPoolExecutor.java:614) in AsciiMandel.exe (this=java.util.concurrent.ThreadPoolExecutor$Worker = {...})
    at java/util/concurrent/ThreadPoolExecutor.java:614
#28 0x000055555570848b in <-- java.lang.Thread::runWith(Thread.java:1487) in AsciiMandel.exe
    (this=java.lang.Thread = {...}, bindings=java.lang.Class = {...}, op=java.util.concurrent.ThreadPoolExecutor$Worker = {...}) at java/lang/Thread.java:1487
#29 java.lang.Thread::run(Thread.java:1474) in AsciiMandel.exe (this=java.lang.Thread = {...}) at java/lang/Thread.java:1474
#30 0x000055555569aa1e in com.oracle.svm.core.thread.PlatformThreads::threadStartRoutine(PlatformThreads.java:829) in AsciiMandel.exe (threadHandle=0x7)
    at com/oracle/svm/core/thread/PlatformThreads.java:829
#31 0x000055555569a8bb in com.oracle.svm.core.thread.PlatformThreads::threadStartRoutine(PlatformThreads.java:805) in AsciiMandel.exe (data=ThreadStartData = {...})
    at com/oracle/svm/core/thread/PlatformThreads.java:805
#32 0x000055555563975b in com.oracle.svm.core.code.IsolateEnterStub::PlatformThreads_threadStartRoutine_Z5jZ9wXZGDAvr0CL8KrTOA(IsolateEnterStub.java:1) in AsciiMandel.exe
    (__long0=<optimized out>) at com/oracle/svm/core/code/IsolateEnterStub.java:1
#33 0x00007ffff7494ac3 in Unknown Frame at 0x7fffdaffce60 () at /lib/x86_64-linux-gnu/libc.so.6
#34 0x00007ffff75268c0 in Unknown Frame at 0x7fffdaffcf00 () at /lib/x86_64-linux-gnu/libc.so.6

I think this is because the application thread has called into the Shenandoah native code through svm_gc_allocate_array(), which is implemented as follows:

// TO_VM - May be called by any Java thread. Uses oops. May block. May cause a safepoint.
EXPORT_FOR_SVM oop svm_gc_allocate_array(ArrayKlass *k, int length) {
  IsolateThread* thread = IsolateThread::current();
  assert(thread->has_status_vm(), "unexpected thread state");
  assert(k->is_array_klass(), "must be");
  assert(length >= 0, "must be");

  oop result = nullptr;
  if (length >= 0 && length <= k->max_length()) {
    int size = k->object_size(length);
    SVMGlobalData::_transition_vm_to_native(thread);
    result = Universe::heap()->array_allocate(k, size, length, true);
    SVMGlobalData::_slow_transition_native_to_vm(thread);
    assert(thread->has_status_vm(), "must be");
    if (result != nullptr) {
      BarrierSet::barrier_set()->on_slowpath_allocation_exit(JavaThread::current(), result);
    }
  }
  return result;
}

svm_gc_allocate_array() transitions from VM to native (with SVMGlobalData::_transition_vm_to_native()) before calling into the native Shenandoah code. I suppose that because the application thread is "in native", it is considered "at a safepoint", so it continues running while the VM thread is executing the STW GC at a safepoint. But I think that leads to the problems I can observe.

How is this supposed to work? Don't you have similar problems with G1-GC and how have you solved them there?

@christianhaeubl
Member

Once a thread does a transition to STATUS_IN_NATIVE, it can continue running even though the VM is at a safepoint (and therefore in the middle of executing a VM operation). So, yes, this is definitely a problem.

For G1, we use the following:

  • When svm_gc_allocate_array(...) is called, the Native Image Java code does a thread status transition from STATUS_IN_JAVA to STATUS_IN_VM. As far as I can see, the same happens for Shenandoah as well.
  • In svm_gc_allocate_array(...), the thread stays in STATUS_IN_VM as long as possible, i.e., we only do a transition to STATUS_IN_NATIVE if the thread needs to block while it is in C++ code. This can for example happen if it needs to acquire the Heap_lock and there is contention. After blocking, we do a transition back to STATUS_IN_VM to ensure that no VM operations can execute while the thread is in C++ code.

If the blocking happens on a Mutex, the mutex infrastructure already takes care of the thread status transitions, see Mutex::lock_contended(...) in mutex.cpp.
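
The rule described above (stay in STATUS_IN_VM, switch to STATUS_IN_NATIVE only around a blocking acquire) can be sketched roughly as follows. This is a toy model, not the code in mutex.cpp: ToyThread, ThreadStatus and lock_with_transition are hypothetical names, and the real transition back to STATUS_IN_VM would additionally have to wait for a pending safepoint.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the Native Image thread statuses named above.
enum class ThreadStatus { InJava, InVM, InNative };

// Toy per-thread state; the real IsolateThread is far richer.
struct ToyThread {
  std::atomic<ThreadStatus> status{ThreadStatus::InVM};
  int native_transitions = 0;  // how often this thread left InVM in order to block
};

// Acquires `lock` on behalf of thread `t`.
// Fast path: the lock is free, so the thread never leaves InVM.
// Slow path: the lock is contended, so the thread switches to InNative while
// it blocks (a safepoint may proceed meanwhile) and switches back afterwards.
template <typename Lockable>
void lock_with_transition(ToyThread& t, Lockable& lock) {
  assert(t.status.load() == ThreadStatus::InVM);
  if (lock.try_lock()) {
    return;  // uncontended: no status change at all
  }
  t.status.store(ThreadStatus::InNative);
  ++t.native_transitions;
  lock.lock();  // may block for an arbitrary time
  // Assumption: the real code performs a slow transition here that also
  // honors a safepoint that started while the thread was blocked.
  t.status.store(ThreadStatus::InVM);
}
```

With a std::mutex that nobody else holds, the call returns without touching the status, which is the property that keeps VM operations from running while the thread is in C++ code.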

@simonis
Contributor

simonis commented Jan 22, 2026

Thanks for your comments. If I remove the immediate transition to STATUS_IN_NATIVE in svm_gc_allocate_array(...) I quickly run into this assertion in Monitor::wait(...):

bool Monitor::wait(uint64_t timeout) {
#ifdef SVM
  // NOTE (chaeubl): the current implementation only supports the case that the thread is already in native state. This simplifies the implementation so that it is very similar to wait_without_safepoint_check.
  assert(IsolateThread::current()->has_status_native_or_safepoint(), "otherwise, the logic would have to be more complex");

According to your comment in that code, it doesn't support being called with STATUS_IN_VM because that would require more complex logic. What would that logic look like?

@christianhaeubl
Member

Seems that Shenandoah uses a few of the more complex pieces of the mutex infrastructure that G1 does not need. You would need to have a closer look at the OpenJDK implementation of Monitor::wait(...) and figure out how to map that logic to Native Image thread status transitions. In the worst case, more infrastructure might be necessary on the Native Image side.

I had a brief look at the code in OpenJDK and Monitor::wait(...) looks fairly similar to Mutex::lock_contended(...). So, I would assume that a lot of the SVM-specific logic of Mutex::lock_contended(...) can be reused for Monitor::wait(...), i.e.:

  • if the thread status is STATUS_IN_VM, do a transition to STATUS_IN_NATIVE
  • wait
  • after waiting, do a transition back to STATUS_IN_VM
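
A minimal sketch of those three steps, assuming a toy monitor built on std::condition_variable. ToyMonitor, ToyThread and ThreadStatus are hypothetical names and not HotSpot's Monitor; unlike Monitor::wait(...), the toy takes its own internal lock instead of expecting the caller to hold it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

enum class ThreadStatus { InVM, InNative };

struct ToyThread {
  ThreadStatus status = ThreadStatus::InVM;
  int native_transitions = 0;  // how often wait() moved this thread out of InVM
};

// Toy monitor with a single "signalled" flag.
class ToyMonitor {
 public:
  // Blocks until notify() has been called. A caller in InVM is moved to
  // InNative for the duration of the wait and back afterwards; a caller
  // that is already InNative is left unchanged.
  void wait(ToyThread& self) {
    std::unique_lock<std::mutex> guard(mutex_);
    const bool was_in_vm = (self.status == ThreadStatus::InVM);
    if (was_in_vm) {
      self.status = ThreadStatus::InNative;
      ++self.native_transitions;
    }
    cv_.wait(guard, [this] { return signalled_; });
    signalled_ = false;
    if (was_in_vm) {
      // Assumption: the real code needs a slow transition that blocks here
      // while a safepoint is in progress.
      self.status = ThreadStatus::InVM;
    }
  }

  void notify() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      signalled_ = true;
    }
    cv_.notify_one();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signalled_ = false;
};
```

Remembering the status on entry keeps the existing "already in native" callers working unchanged while adding the STATUS_IN_VM case.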

@simonis
Contributor

simonis commented Jan 22, 2026

Thanks a lot, that seems to help. After implementing Monitor::wait(...) in the way you suggested I could remove the assertion and the immediate transition to native in svm_gc_allocate_array(..)/svm_gc_allocate_instance(..). Afterwards, I quickly ran into deadlocks, which I could then easily fix by transitioning to native in ShenandoahLock::contended_lock_internal(..).

With these changes, I haven't observed any problems so far when stressing the GC.

@simonis
Contributor

simonis commented Jan 30, 2026

Hello Christian,

I've started to work on the concurrent Shenandoah mode for Native Image and I think I arrived at a point where we need additional support in SubstrateVM and/or the GC interface.

Shenandoah is using concurrent thread processing since JDK 17 (as introduced by ZGC in JDK 16 with JEP 376: ZGC: Concurrent Thread-Stack Processing), a feature not supported by G1 GC. This requires stack watermark barriers (as described in JEP 376) which themselves require runtime support in the VM and the compiler. From my understanding, SubstrateVM currently doesn't support stack watermark barriers and this feature would have to be implemented first and then exported through the GC interface.

Also, in order to process the thread stacks concurrently, Shenandoah uses the ThreadsListHandle class which itself is based on the Thread Safe Memory Reclamation (i.e. Thread SMR) support which was added by 8167108: inconsistent handling of SR_lock can lead to crashes. While I don't think that we necessarily need Thread SMR support in SubstrateVM, I think we need a way to concurrently process Java threads. From my understanding, the current way of iterating over Java threads with JavaThreadIteratorWithHandle (https://github.com/simonis/labs-openjdk/blob/2bbcd333dc5b68e98c8edbf3bcc5c15f07136fc0/src/hotspot/svm/share/runtime/threadSMR.hpp#L51) only works at a safepoint.

For the compiler support, I'm pretty sure that the Graal JIT compiler supports stack watermark barriers because it added support for ZGC in GraalVM 23.0 with [GR-27475] Add ZGC support. I'm however unsure if this also applies to the Graal compiler in AOT mode as used for creating native images? This also reveals another question regarding the compiler support for Shenandoah. As you might recall, we added support for Shenandoah to the Graal JIT with Shenandoah support but again I'm not sure if this also works in AOT mode or if this requires additional changes? For the Shenandoah compiler support I think the question is mostly about emitting the correct barriers when compiling a native image for Shenandoah (notice that passive mode doesn't require barriers, that's probably why it currently works in passive mode independently of the compiler support).

Kindly awaiting your thoughts and recommendations,
Volker

@christianhaeubl
Member

stack watermark barriers

At the moment, this is something that is not supported in Native Image. Is it correct that OpenJDK uses the safepoint mechanism for that? I think that won't be possible for Native Image because this also needs to work for code without safepoints (i.e., uninterruptible code doesn't contain any safepoints but may have references to objects in the collected Java heap).

Besides that, I guess that all the stack walking logic (including the exception handling) needs to take that into account as well? Let's assume that we are in a VM operation and walking the stack of another thread. Which code would need to be executed so that the stack walking can continue past the watermark?

Maybe you can point me to some of the OpenJDK code. Then, I can try to figure out what is needed in Native Image.

Thread SMR

SMR is also not supported at the moment. Outside of a safepoint, it is currently only possible to iterate over all threads if you acquire VMThreads#THREAD_MUTEX (see JavaDoc of VMThreads#THREAD_MUTEX for a bit more information). This mutex is not yet exposed to C++ code and it can be tricky to use as it is also used by low-level code such as the VM operation infrastructure. While holding the mutex, a thread can't do any status transitions to STATUS_IN_JAVA or STATUS_IN_VM.

So, depending on what you need to do exactly, larger refactorings may be needed.
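
To illustrate the constraint only: the sketch below models a thread list that may be iterated outside a safepoint solely while a single mutex is held. ToyThreadRegistry and its members are hypothetical; VMThreads#THREAD_MUTEX itself is Java-side and, as noted above, not exposed to C++.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

struct ToyThread {
  int id;
};

// Toy registry: attach, detach and iteration all serialize on one mutex,
// so the list cannot change while a visitor is running.
class ToyThreadRegistry {
 public:
  void attach(ToyThread* t) {
    assert(t != nullptr);
    std::lock_guard<std::mutex> guard(thread_mutex_);
    threads_.push_back(t);
  }

  void detach(ToyThread* t) {
    std::lock_guard<std::mutex> guard(thread_mutex_);
    threads_.erase(std::remove(threads_.begin(), threads_.end(), t), threads_.end());
  }

  // Calls `visit` for every attached thread while holding the mutex.
  // The visitor must not call attach() or detach(): the mutex is not
  // recursive, so that would self-deadlock. This loosely mirrors the
  // restriction that a thread holding the real mutex cannot perform
  // certain status transitions.
  template <typename Visitor>
  void for_each(Visitor visit) {
    std::lock_guard<std::mutex> guard(thread_mutex_);
    for (ToyThread* t : threads_) {
      visit(*t);
    }
  }

 private:
  std::mutex thread_mutex_;
  std::vector<ToyThread*> threads_;
};
```

A concurrent collector that needs to walk threads for a long time would hold this lock for just as long, which hints at why larger refactorings may be needed.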

emitting the correct barriers

This will require some work but it should be fairly straightforward because you can reuse a lot of logic from the JIT barriers:

  • It is usually some effort to abstract the barrier code so that all the VM-specific details are in VM-specific classes (for example, all calls to HotSpotReplacementsUtil would need to be in HotSpot-specific classes).
  • After that, you will encounter some code patterns or special cases that you didn't see on HotSpot (e.g., Word types). Differences like that typically manifest in build-time errors that can be debugged nicely.

@christianhaeubl
Member

I also briefly tried your current version locally and ran a few of our internal tests. A heavily multi-threaded application that additionally does a lot of object pinning finishes execution successfully, which is great.

For various other workloads, though, I am seeing the following assertion failure at run-time with a fastdebug build. In case you haven't seen that failure on your workloads yet, please let me know if I should do a run that prints additional diagnostics.
assert(l2esz <= LogBytesPerLong) failed: sanity. l2esz: 0x40 for lh: 0xdb89b040

I also had a brief look at the commits and the necessary SVM-specific changes in the C++ code:

  • So far, the changes are similar to what we needed for G1 as well.
  • Just a minor recommendation: when restoring the unmodified version of a deleted file, I would suggest using a separate commit. At least in my experience, this can make life easier if you need to update the code base to a new HotSpot version later on.

@simonis
Contributor

simonis commented Mar 5, 2026

Hi @christianhaeubl,

Thanks a lot for providing reproducers for the problems you've mentioned above and on Slack. I'll attach your tests and test instructions here for better visibility.

Here is a list of the problems that I saw while executing a subset of the tests locally with Shenandoah:

  • Various crashes when arrays are copied from the image heap to the collected Java heap:
    segfault
    assert(l2esz <= LogBytesPerLong) failed
    assert(lh < Klass::_lh_neutral_value) failed

  • A deadlock when a GC is triggered in the VM operation thread.

  • The reference handler thread doesn't seem to get notified. This test assumes that the reference handling should be triggered after a full GC, not sure if that is true for Shenandoah.

I created reproducers for all of them, the diff is in the snippet below. You can apply it to the graal repository and then execute:

cd graal/substratevm
mx build

export NATIVE_IMAGE_OPTIONS="-H:-UsePerfData --initialize-at-build-time=com.oracle.svm.test.reproducer.ArrayCopySegfault --initialize-at-build-time=com.oracle.svm.test.reproducer.SmallObject -H:+UseShenandoahGC --native-compiler-options=-Wl,--unresolved-symbols=ignore-all -R:ShenandoahGCMode=passive -H:ShenandoahDebugLevel=debug"

mx native-unittest -p ArrayCopySegfault
mx native-unittest -p Deadlock
mx native-unittest -p ReferenceHandling

reproducers.patch

@simonis
Contributor

simonis commented Mar 5, 2026

I fixed the first problem (i.e. ArrayCopySegfault) which was caused by not handling humongous regions in the open image heap correctly. I've pushed the fix to my labs-jdk Shenandoah branch.

@simonis
Contributor

simonis commented Mar 9, 2026

Hi @christianhaeubl,

The problem with the deadlock in your Deadlock test seems a little bit more complicated. I assume this test works with G1 GC and I assume that's because calling System.gc() with G1 will immediately invoke a VM Operation which does a full, STW collection. Does your Deadlock test actually work with G1 and -XX:+ExplicitGCInvokesConcurrent? I wanted to try it out myself, but I couldn't figure out how to run these tests with an existing GraalVM without invoking a full build first.

For Shenandoah, the following happens:

 Application thread:
-------------------

TriggerGCOperation::enqueue()
  ->JavaVMOperation::enqueue()
    -> VMOperationControl::enqueue()
      -> VMOperationControl.WorkQueues::enqueueAndWait()

 - locks VMOperationThread::mutex

VM OperationThread:
-------------------

VMOperationThread::run()
  -> WorkQueues::waitForWorkAndExecute()
    -> WorkQueues::executeAllQueuedVMOperations()
      -> WorkQueues::drain()
        -> VMOperation::execute()
          -> JavaVMOperation::operate()
            -> Deadlock$TriggerGCOperation::operate()
              -> Runtime::gc() -> ShenandoahLibrary::collect()
                -> svm_gc::svm_gc_collect()
                  -> ShenandoahHeap::collect()
                    -> ShenandoahControlThread::request_gc()
                      -> ShenandoahControlThread::handle_requested_gc()
                        -> MonitorLocker::wait()
                          -> Monitor::wait()

 - waits on ShenandoahController::_gc_waiters_lock (with Mutex::_safepoint_check_flag)

Shenandoah ControlThread:
-------------------------

ShenandoahControlThread::run_service()
  -> ShenandoahControlThread::service_stw_full_cycle()
    -> ShenandoahFullGC::collect()
      -> ShenandoahFullGC::vmop_entry_full()
        -> VMThread::execute()
          -> ShenandoahVMOperations::collectFull()
            -> NativeGCVMOperationSupport::enqueue()
              -> NativeVMOperation::enqueueFromNonJavaThread()
                -> VMOperationControl::enqueueFromNonJavaThread()
                  -> WorkQueues::enqueueUninterruptibly()

- also tries to lock VMOperationThread::mutex (lockNoTransitionUnspecifiedOwner())

So the application thread creates a VMOperation, enqueues it and blocks on the VMOperationThread::mutex. The SubstrateVM Operation Thread picks up the VMOperation and executes it by finally calling ShenandoahControlThread::handle_requested_gc() in the Shenandoah shared library. handle_requested_gc() sets the _gc_requested flag for the Shenandoah Control Thread and waits on the ShenandoahController::_gc_waiters_lock until the GC completes.

The Shenandoah Control Thread gets notified by the change of the _gc_requested flag and triggers a full GC by creating a corresponding VMOperation and enqueuing it for the SubstrateVM Operation Thread. But at this point we deadlock, because enqueuing the new VMOperation requires locking of the VMOperationThread::mutex which is already locked by the Application thread who enqueued the initial VMOperation.

I now have the following questions:

  1. Does this test reflect an actual, required use case, or is it just something that happened to work with G1 GC but isn't strictly required for Shenandoah?
  2. In the case this functionality is really required, do you have any idea how it could be fixed?
  3. How can I run the unittests with an existing Graal JDK (i.e. Oracle GraalVM)? OR asking the other way round, how can I easily compile these tests without mx. I've tried to pass the -v flag to mx to get the native-image command line, but that is several screen pages long and in addition contains several, dynamically generated, argument files.

@christianhaeubl
Member

christianhaeubl commented Mar 10, 2026

Hi @simonis, I did a brief test with G1 and -XX:+ExplicitGCInvokesConcurrent, and it seems to work. It is possible that the thread interactions are a bit simpler with G1.

Some VM operations in Native Image allocate Java heap memory. So, all GCs must at least support implicitly triggered collections during VM operations.
Regarding System.gc(): besides tests, I don't think that we have any VM operations that explicitly call System.gc() at the moment.

For Shenandoah, something like the following might work:

Before the Shenandoah control thread enqueues a VM operation, it needs to check if the VM operation thread is blocked in ShenandoahControlThread::handle_requested_gc():

  • The control thread sees that the VM operation thread is blocked in ShenandoahControlThread::handle_requested_gc().
  • The control thread directly notifies the VM operation thread that there is a VM operation to execute, wakes up the VM operation thread, and blocks until the VM operation completes.
  • The VM operation thread wakes up in ShenandoahControlThread::handle_requested_gc(), detects the pending VM operation, and executes it directly (the infrastructure already supports nested VM operations, so that should not be a problem).
  • Once done, the VM operation thread returns to ShenandoahControlThread::handle_requested_gc() and blocks until the control thread is done with its work.
  • The control thread notices that the VM operation finished. So, it can continue executing whatever code it needs to execute. Once done, it needs to wait until the VM operation thread is blocked again in ShenandoahControlThread::handle_requested_gc() before it may notify the VM operation thread.

I think this can be implemented on the C++ side, without needing any Java changes. It might be possible to simplify that approach a bit, depending on the exact code that the Shenandoah control thread needs to execute.

But at this point we deadlock, because enqueuing the new VMOperation requires locking of the VMOperationThread::mutex which is already locked by the Application thread who enqueued the initial VMOperation.

I think that part isn't correct because the application thread releases the mutex before it blocks (see operationFinished.block()). Instead, the VM operation thread holds the mutex at the time of the deadlock.

How can I run the unittests with an existing Graal JDK (i.e. Oracle GraalVM)? OR asking the other way round, how can I easily compile these tests without mx. I've tried to pass the -v flag to mx to get the native-image command line, but that is several screen pages long and in addition contains several, dynamically generated, argument files.

I don't think that we support that at the moment. You can probably compile the tests by adding the SVM-internal code (e.g., the VM operation infrastructure) to the classpath, but that is also nothing that I have ever tried.

@simonis
Contributor

simonis commented Mar 10, 2026

So here's how you can build and run these tests "standalone" (mainly for my personal use):

  1. Convert from JUnit to a plain Java class with a main method, e.g.:
import com.oracle.svm.core.heap.VMOperationInfos;
import com.oracle.svm.core.thread.JavaVMOperation;

public class GraalShenandoahDeadlock {

    public static void main(String[] args) {
        TriggerGCOperation triggerGcOp = new TriggerGCOperation();
        triggerGcOp.enqueue();
    }

    private static class TriggerGCOperation extends JavaVMOperation {
        protected TriggerGCOperation() {
            /* Please also test with SystemEffect.SAFEPOINT. */
            super(VMOperationInfos.get(TriggerGCOperation.class, "Trigger GC outside safepoint", SystemEffect.NONE));
        }

        @Override
        protected void operate() {
            System.gc();
        }
    }
}
  2. Compile to class-file with svm.jar on the classpath:
$ javac -cp svm.jar GraalShenandoahDeadlock.java
  3. Compile to a native image (exporting org.graalvm.nativeimage.builder/com.oracle.svm.core.thread and org.graalvm.nativeimage.builder/com.oracle.svm.core.heap):
$ native-image -o GraalShenandoahDeadlock.exe \
     --add-exports=org.graalvm.nativeimage.builder/com.oracle.svm.core.thread=ALL-UNNAMED \
     --add-exports=org.graalvm.nativeimage.builder/com.oracle.svm.core.heap=ALL-UNNAMED \
     GraalShenandoahDeadlock

It is crucial to export org.graalvm.nativeimage.builder/com.oracle.svm.core.thread and org.graalvm.nativeimage.builder/com.oracle.svm.core.heap, otherwise the build will succeed, but you will get a strange run-time error:

Exception in thread "main" java.lang.NoClassDefFoundError: GraalShenandoahDeadlock$TriggerGCOperation
	at GraalShenandoahDeadlock.main(GraalShenandoahDeadlock.java:10)
	at java.base@25-internal/java.lang.invoke.LambdaForm$DMH/sa346b79c.invokeStaticInit(LambdaForm$DMH)

This can't be fixed even by forcing GraalShenandoahDeadlock$TriggerGCOperation to be build-time initialized (i.e. the build still succeeds but the resulting native image will still fail with the same NoClassDefFoundError). I think this is more or less an error of native image, which should already report this error at build time rather than at run time.

@olpaw
Member

olpaw commented Mar 11, 2026

It is crucial to export org.graalvm.nativeimage.builder/com.oracle.svm.core.thread and org.graalvm.nativeimage.builder/com.oracle.svm.core.heap, otherwise the build will succeed, but you will get a strange run-time error:

@simonis I would expect you to get a build-time error message if you build with the --link-at-build-time option. This way you should see immediately when you are missing the right --add-exports.

@simonis
Contributor

simonis commented Mar 11, 2026

@simonis I would expect you to get a build-time error message if you build with the --link-at-build-time option. This way you should see immediately when you are missing the right --add-exports.

Thanks @olpaw , you are right, when using --link-at-build-time, I indeed get a build time error. I was just a little confused, because the original documentation of --link-at-build-time reads "by default, a build no longer fails if a class cannot be found on the classpath or module path. In some cases, this is desirable because an application may define different behavior if certain classes are not available", but in my case this was a visibility issue rather than com.oracle.svm.core.thread classes not being on the classpath.

Also, just out of interest, why doesn't --initialize-at-build-time='com.oracle.svm.test.reproducer.GraalShenandoahDeadlock$TriggerGCOperation' help here? I would have expected that class initialization at build time is a stronger requirement than class linking? But the error during class initialization at build time seems to be silently ignored (although I think that class initialization mandates prior linking).

@olpaw
Member

olpaw commented Mar 12, 2026

Also, just out of interest, why doesn't --initialize-at-build-time='com.oracle.svm.test.reproducer.GraalShenandoahDeadlock$TriggerGCOperation' help here? I would have expected that class initialization at build time is a stronger requirement than class linking? But the error during class initialization at build time seems to be silently ignored (although I think that class initialization mandates prior linking).

It sure does! But without --link-at-build-time you only get to see the result of the failing build-time initialization when you first access GraalShenandoahDeadlock$TriggerGCOperation in your code at image run-time.

@simonis
Contributor

simonis commented Mar 13, 2026

@olpaw, I don't want to be picky, and maybe I'm not fully understanding this, but what you say is that with --initialize-at-build-time='com.oracle.svm.test.reproducer.GraalShenandoahDeadlock$TriggerGCOperation' build time initialization of GraalShenandoahDeadlock$TriggerGCOperation is tried and fails but I only see the results of this failure at run time.

But that's not what I'm observing. If what you say would be the case, then I'd expect to see the actual root cause of the failure (i.e. java.lang.IllegalAccessError: superclass access check failed: class GraalShenandoahDeadlock$TriggerGCOperation (in unnamed module @0x4ce2ba51) cannot access class com.oracle.svm.core.thread.JavaVMOperation (in module org.graalvm.nativeimage.builder)) at runtime. But that's not the case, because GraalShenandoahDeadlock$TriggerGCOperation isn't in the native image, so I get a NoClassDefFoundError at runtime.

Once again, I think that --initialize-at-build-time='com.oracle.svm.test.reproducer.GraalShenandoahDeadlock$TriggerGCOperation' should necessarily imply and trigger class linking of GraalShenandoahDeadlock$TriggerGCOperation (because according to the JVMS §5.5 a class needs to be linked before it can be initialized) and that should fail at build time. Instead, --initialize-at-build-time seems to be just a hint to native image which can be (and is) happily ignored. I can actually pass whatever non-existing class as an argument to --initialize-at-build-time without seeing any error.

@simonis
Contributor

simonis commented Mar 13, 2026

@christianhaeubl, I have now fixed the reference handling issue as well (see this commit in my labs-jdk Shenandoah branch).

I'm still struggling with the deadlock due to nested VM operations. Will try again now.

@olpaw
Member

olpaw commented Mar 17, 2026

Once again, I think that --initialize-at-build-time='com.oracle.svm.test.reproducer.GraalShenandoahDeadlock$TriggerGCOperation' should necessarily imply and trigger class linking of GraalShenandoahDeadlock$TriggerGCOperation (because according to the JVMS §5.5 a class needs to be linked before it can be initialized) and that should fail at build time. Instead, --initialize-at-build-time seems to be just a hint to native image which can be (and is) happily ignored. I can actually pass whatever non-existing class as an argument to --initialize-at-build-time without seeing any error.

If a class is seen as reachable by the static analysis, com.oracle.svm.hosted.classinitialization.ClassInitializationSupport#ensureClassInitialized is called. Here, if class-init fails, InitKind.RUN_TIME is returned, directing NI to treat this class as initialized at runtime, thus failing at runtime on first access (but only if LinkAtBuildTimeSupport.singleton().linkAtBuildTime(clazz) is false).

For that to happen the image builder calls com.oracle.svm.hosted.analysis.DynamicHubInitializer#buildRuntimeInitializationInfo at a later stage. There the actual synthesizing of the error happens so that it can be replayed at runtime (see catch (VerifyError e) { ...}).

If this does not work as described above feel free to create a ticket. cc @cstancu @vjovanov

@simonis
Contributor

simonis commented Mar 24, 2026

Hi @christianhaeubl,

I agonized over this problem, but I couldn't find a solution. Especially, I can't understand what you mean by:

  • The control thread directly notifies the VM operation thread that there is a VM operation to execute, wakes up the VM operation thread, and blocks until the VM operation completes.
  • The VM operation thread wakes up in ShenandoahControlThread::handle_requested_gc(), detects the pending VM operation, and executes it directly (the infrastructure already supports nested VM operations, so that should not be a problem).
  • Once done, the VM operation thread returns to ShenandoahControlThread::handle_requested_gc() and blocks until the control thread is done with its work.

How can the Shenandoah ControlThread notify the VM thread which is blocked in Shenandoah code? The only way to do that is to notify the _gc_waiters_lock on which the VM thread is blocked (in handle_requested_gc()). But if I do that, how can the VM Operation thread, which wakes up in handle_requested_gc(), detect that there's a pending VM operation? It is in Shenandoah code and the Shenandoah code will just realize that no GC has happened since it blocked and will just block on the _gc_waiters_lock again:

  MonitorLocker ml(&_gc_waiters_lock);
  size_t current_gc_id = get_gc_id();
  size_t required_gc_id = current_gc_id + 1;
  while (current_gc_id < required_gc_id && !should_terminate()) {
    _requested_gc_cause = cause;
    _gc_requested.set();
    ml.wait();
    current_gc_id = get_gc_id();
  }

I.e. how can the VM operation thread detect and execute a new, pending VM operation while it is already blocked in a VM operation in native Shenandoah code (before finishing it)?

Moreover, as far as I can see, native VM operations from Shenandoah are enqueued from VMOperationControl::enqueueFromNonJavaThread() which uses the mainQueues:

    public void enqueueFromNonJavaThread(NativeVMOperation operation, NativeVMOperationData data) {
        mainQueues.enqueueUninterruptibly(operation, data);
    }

whereas support for recursive VM operations is only implemented in VMOperationControl::enqueue():

    private void enqueue(VMOperation operation, NativeVMOperationData data) {
        StackOverflowCheck.singleton().makeYellowZoneAvailable();
        try {
            if (mayExecuteVmOperations()) {
                // a recursive VM operation (either triggered implicitly or explicitly) -> execute
                // it right away
                immediateQueues.enqueueAndExecute(operation, data);
            } else if (useDedicatedVMOperationThread()) {
...

I'm completely lost...

@christianhaeubl
Member

I hope I am not missing anything, but my main point is that you need to directly communicate some information between the VM operation thread and the Shenandoah control thread. Below are more details on what I suggested above, this time a bit closer to an actual implementation. Most of what I am describing is new logic that you will need to add (C++ code only).

  • Right before the VM operation thread blocks in ShenandoahControlThread::handle_requested_gc(), it needs to store information about that in a new field (e.g., _vm_thread_blocked). Then, it blocks on _gc_waiters_lock as usual.
  • The Shenandoah control thread checks _vm_thread_blocked. Based on the information there, it can tell that the VM operation thread is currently blocked inside ShenandoahControlThread::handle_requested_gc(). Therefore, it must execute the following logic:
    • Instead of enqueuing the VM operation, it only stores information about the VM operation in a new field (e.g., _recursiveVmOp).
    • Then, it notifies the VM operation thread.
  • The VM operation thread wakes up in ShenandoahControlThread::handle_requested_gc(), checks _recursiveVmOp, and realizes that a recursive VM operation is pending. So, it needs to:
    • Execute the VM operation directly via VMThread::execute(...). This method already takes into account that it might be called by the VM operation thread (i.e., it already supports recursive VM operations).
    • After executing the VM operation, the thread is back in ShenandoahControlThread::handle_requested_gc() and blocks until the control thread finishes its work.
  • The control thread notices that the VM operation finished. So, it can continue executing whatever code it needs to execute. Once done, it needs to wait until the VM operation thread is blocked again in ShenandoahControlThread::handle_requested_gc() before it may notify the VM operation thread.
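
The handoff described in these bullets can be sketched with standard C++ primitives. This is a toy model under several assumptions: ToyHandoff and all of its members are hypothetical, a single mutex and condition variable stand in for _gc_waiters_lock, calling the stored function stands in for VMThread::execute(...), and the non-blocked case (where the control thread enqueues the VM operation normally) is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class ToyHandoff {
 public:
  // Runs on the toy VM operation thread; models the wait in handle_requested_gc().
  void request_gc_and_wait() {
    std::unique_lock<std::mutex> guard(lock_);
    gc_requested_ = true;
    vm_thread_blocked_ = true;  // published under the lock, right before blocking
    cv_.notify_all();
    while (!gc_done_) {
      cv_.wait(guard, [this] { return gc_done_ || recursive_vm_op_ != nullptr; });
      if (recursive_vm_op_ != nullptr) {
        std::function<void()> op = std::move(recursive_vm_op_);
        recursive_vm_op_ = nullptr;
        vm_thread_blocked_ = false;
        guard.unlock();
        op();  // the recursive VM operation runs on this thread
        guard.lock();
        op_done_ = true;
        vm_thread_blocked_ = true;  // about to block again in the loop
        cv_.notify_all();
      }
    }
    vm_thread_blocked_ = false;
    gc_done_ = false;
  }

  // Runs on the toy control thread. `vm_op` must not be empty.
  void run_gc_cycle(std::function<void()> vm_op) {
    assert(vm_op != nullptr);
    std::unique_lock<std::mutex> guard(lock_);
    cv_.wait(guard, [this] { return gc_requested_ && vm_thread_blocked_; });
    gc_requested_ = false;
    op_done_ = false;
    recursive_vm_op_ = std::move(vm_op);  // store instead of enqueuing
    cv_.notify_all();
    cv_.wait(guard, [this] { return op_done_; });
    // ... the remaining control-thread work of the cycle would run here ...
    cv_.wait(guard, [this] { return vm_thread_blocked_; });
    gc_done_ = true;  // final notification that releases the VM operation thread
    cv_.notify_all();
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool gc_requested_ = false;
  bool vm_thread_blocked_ = false;
  bool op_done_ = false;
  bool gc_done_ = false;
  std::function<void()> recursive_vm_op_;
};

// Runs one cycle on two threads and reports whether the operation was executed
// by the thread that was blocked in request_gc_and_wait().
bool op_runs_on_vm_thread() {
  ToyHandoff handoff;
  std::thread::id executed_on;
  std::thread vm_thread([&] { handoff.request_gc_and_wait(); });
  const std::thread::id vm_id = vm_thread.get_id();
  std::thread control_thread([&] {
    handoff.run_gc_cycle([&] { executed_on = std::this_thread::get_id(); });
  });
  vm_thread.join();
  control_thread.join();
  return executed_on == vm_id;
}
```

Because every flag is read and written under the same lock, the control thread can only observe vm_thread_blocked_ as true once the VM operation thread has released the lock inside its wait, which is what makes the final notification safe in this model.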
