|
@simonis : this PR provides the C++ infrastructure that simplifies the integration of HotSpot garbage collectors (like Shenandoah) into Native Image. Note that the first commit ("Remove unnecessary files") can't be opened on GitHub (the diff is probably too large). While I am working on the Java glue code, the interface between the C++ code and Native Image may continue to change a bit. So, I will probably force-push a few times to this branch. |
1418f0a to 2bbcd33
|
Hi @christianhaeubl , I'm starting to look into this more closely and have some questions:
Is this just a pointless artifact that we can ignore? More to follow :) |
|
Regarding 1: Regarding 2: Regarding 3:
Regarding 4: Regarding 5: |
  "Print extension of code cache") \
  \
- product(bool, ClassUnloading, true, \
+ product(bool, ClassUnloading, trueInSvm, \
@christianhaeubl, why have you set ClassUnloading to trueInSvm and ClassUnloadingWithConcurrentMark below to falseInSvm? I thought there is currently no runtime class loading/unloading in SubstrateVM, so shouldn't ClassUnloading be falseInSvm as well?
I initially enabled that option because HotSpot’s code unloading logic is more aligned with what Native Image needs when class unloading is active. However, this doesn't seem to be relevant anymore (the code unloading logic is now a custom SVM-specific implementation), so we can and probably should set both ClassUnloading and ClassUnloadingWithConcurrentMark to falseInSvm.
|
Hi @christianhaeubl, A quick update from my side. I can now run simple Java programs up to the point where they trigger a GC. You can find my version of the Shenandoah port at https://github.com/simonis/labs-openjdk/tree/simonis/GR-70066 (with an updated README.md file). It also requires an updated version of SubstrateVM from https://github.com/simonis/graal/tree/simonis/GR-70306. I'll take a few days off now but I'll continue to work on the real GC part next week, when I return. Best regards, |
|
Hi @christianhaeubl. I've made some progress in my branch and I can now run pretty far in I think the errors are caused by concurrent allocations while a full GC is in progress. I more or less verified this in the debugger where I see states like the following:
Thread 3 is the Shenandoah Control Thread which has triggered a Stop-The-World full GC cycle and waits for its completion:
Thread 5 is Substrate's VM thread which executes the STW-GC as a VM operation:
For this it calls
// TO_NATIVE - Only called from the VM thread at a safepoint.
EXPORT_FOR_SVM void svm_gc_execute_vm_operation_main(VM_OperationData *data) {
assert(Thread::current()->is_VM_thread(), "must be the VM thread");
assert(SafepointSynchronize::get_safepoint_state() == SafepointSynchronize::at_safepoint, "must be at a safepoint");
assert(IsolateThread::current()->has_status_native(), "unexpected thread state");
data->vm_operation()->evaluate();
}
There's even an assertion in I think this is because the application thread has called into the Shenandoah native code through
// TO_VM - May be called by any Java thread. Uses oops. May block. May cause a safepoint.
EXPORT_FOR_SVM oop svm_gc_allocate_array(ArrayKlass *k, int length) {
IsolateThread* thread = IsolateThread::current();
assert(thread->has_status_vm(), "unexpected thread state");
assert(k->is_array_klass(), "must be");
assert(length >= 0, "must be");
oop result = nullptr;
if (length >= 0 && length <= k->max_length()) {
int size = k->object_size(length);
SVMGlobalData::_transition_vm_to_native(thread);
result = Universe::heap()->array_allocate(k, size, length, true);
SVMGlobalData::_slow_transition_native_to_vm(thread);
assert(thread->has_status_vm(), "must be");
if (result != nullptr) {
BarrierSet::barrier_set()->on_slowpath_allocation_exit(JavaThread::current(), result);
}
}
return result;
}
How is this supposed to work? Don't you have similar problems with G1-GC and how have you solved them there? |
|
Once a thread does a transition to For G1, we use the following:
If the blocking happens on a |
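The state protocol under discussion can be illustrated with a minimal, self-contained sketch. It is not the SVM implementation: `SafepointModel`, the two-value `ThreadStatus` and all method names are hypothetical, and only one mutator thread is modeled. It shows the property the thread relies on: a thread in native state does not hold up a safepoint, so it must not touch the Java heap until its native-to-VM transition has completed, and that transition blocks while a safepoint is active.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical two-state model; the real set of thread states is richer.
enum class ThreadStatus { InVM, InNative };

// Models one mutator thread and one VM thread that runs stop-the-world operations.
class SafepointModel {
 public:
  // VM thread: request a safepoint and wait until the mutator is in native state.
  void begin_safepoint() {
    std::unique_lock<std::mutex> lock(mutex_);
    safepoint_active_ = true;
    cv_.wait(lock, [this] { return status_ == ThreadStatus::InNative; });
  }

  // VM thread: end the safepoint and wake a mutator blocked in transition_native_to_vm().
  void end_safepoint() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      safepoint_active_ = false;
    }
    cv_.notify_all();
  }

  // Mutator: after this call it must no longer access the Java heap.
  void transition_vm_to_native() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      status_ = ThreadStatus::InNative;
    }
    cv_.notify_all();
  }

  // Mutator: blocks for as long as a safepoint is active.
  void transition_native_to_vm() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !safepoint_active_; });
    status_ = ThreadStatus::InVM;
  }

  ThreadStatus status() {
    std::lock_guard<std::mutex> lock(mutex_);
    return status_;
  }

  bool safepoint_active() {
    std::lock_guard<std::mutex> lock(mutex_);
    return safepoint_active_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  ThreadStatus status_ = ThreadStatus::InVM;
  bool safepoint_active_ = false;
};
```

In this model, an allocation placed between `transition_vm_to_native()` and `transition_native_to_vm()` would run unprotected during a safepoint, which matches the symptom described for `svm_gc_allocate_array` above.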
|
Thanks for your comments. If I remove the immediate transition to
bool Monitor::wait(uint64_t timeout) {
#ifdef SVM
// NOTE (chaeubl): the current implementation only supports the case that the thread is already in native state. This simplifies the implementation so that it is very similar to wait_without_safepoint_check.
assert(IsolateThread::current()->has_status_native_or_safepoint(), "otherwise, the logic would have to be more complex");
According to your comment in that code, it doesn't support being called with |
|
Seems that Shenandoah uses a few of the more complex pieces of the mutex infrastructure that G1 does not need. You would need to have a closer look at the OpenJDK implementation of I had a brief look at the code in OpenJDK and
|
|
Thanks a lot, that seems to help. After implementing With these changes I couldn't observe any problems until now when stressing the GC. |
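One common way to let a blocking wait be entered from VM state is a scoped transition around the blocking call, similar in spirit to HotSpot's `ThreadBlockInVM`. The sketch below is only an illustration of that pattern under simplified assumptions; `ModelThread`, `BlockInNativeScope` and `wait_allowing_vm_state` are hypothetical names, and the thread does not state that this is what was implemented.

```cpp
#include <cassert>

// Hypothetical two-state model; the real set of thread states is richer.
enum class ThreadStatus { InVM, InNative };

// Hypothetical stand-in for the per-thread data structure.
struct ModelThread {
  ThreadStatus status = ThreadStatus::InVM;
};

// RAII guard: the thread is in native state for the lifetime of the guard and
// returns to VM state afterwards, but only if it was in VM state on entry.
class BlockInNativeScope {
 public:
  explicit BlockInNativeScope(ModelThread& thread)
      : thread_(thread), was_in_vm_(thread.status == ThreadStatus::InVM) {
    thread_.status = ThreadStatus::InNative;
  }

  ~BlockInNativeScope() {
    if (was_in_vm_) {
      // A real implementation would use a safepoint-aware slow transition here.
      thread_.status = ThreadStatus::InVM;
    }
  }

  BlockInNativeScope(const BlockInNativeScope&) = delete;
  BlockInNativeScope& operator=(const BlockInNativeScope&) = delete;

 private:
  ModelThread& thread_;
  bool was_in_vm_;
};

// Hypothetical wait wrapper: blocking_wait stands in for the actual monitor wait.
template <typename BlockingWait>
void wait_allowing_vm_state(ModelThread& thread, BlockingWait blocking_wait) {
  BlockInNativeScope scope(thread);
  blocking_wait();
}
```

A caller that is already in native state passes through unchanged, so the existing native-only callers keep their behavior.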
|
Hello Christian,
I've started to work on the concurrent Shenandoah mode for Native Image and I think I arrived at a point where we need additional support in SubstrateVM and/or the GC interface. Shenandoah has used concurrent thread processing since JDK 17 (as introduced by ZGC in JDK 16 with JEP 376: ZGC: Concurrent Thread-Stack Processing), a feature not supported by G1 GC. This requires stack watermark barriers (as described in JEP 376) which themselves require runtime support in the VM and the compiler. From my understanding, SubstrateVM currently doesn't support stack watermark barriers and this feature would have to be implemented first and then exported through the GC interface. Also, in order to process the thread stacks concurrently, Shenandoah uses the
For the compiler support, I'm pretty sure that the Graal JIT compiler supports stack watermark barriers because it added support for ZGC in GraalVM 23.0 with [GR-27475] Add ZGC support. However, I'm unsure whether this also applies to the Graal compiler in AOT mode as used for creating native images.
This also raises another question regarding the compiler support for Shenandoah. As you might recall, we added support for Shenandoah to the Graal JIT with Shenandoah support, but again I'm not sure whether this also works in AOT mode or requires additional changes. For the Shenandoah compiler support I think the question is mostly about emitting the correct barriers when compiling a native image for Shenandoah (notice that passive mode doesn't require barriers, which is probably why it currently works in passive mode independently of the compiler support).
Kindly awaiting your thoughts and recommendations, |
At the moment, this is something that is not supported in Native Image. Is it correct that OpenJDK uses the safepoint mechanism for that? I think that won't be possible for Native Image because this also needs to work for code without safepoints (i.e., uninterruptible code doesn't contain any safepoints but may have references to objects in the collected Java heap). Besides that, I guess that all the stack walking logic (including the exception handling) needs to take that into account as well? Let's assume that we are in a VM operation and walking the stack of another thread. Which code would need to be executed so that the stack walking can continue past the watermark? Maybe you can point me to some of the OpenJDK code. Then, I can try to figure out what is needed in Native Image.
SMR is also not supported at the moment. Outside of a safepoint, it is currently only possible to iterate over all threads if you acquire So, depending on what you need to do exactly, larger refactorings may be needed.
This will require some work but it should be fairly straightforward because you can reuse a lot of logic from the JIT barriers:
|
|
I also briefly tried your current version locally and ran a few of our internal tests. A heavily multi-threaded application that additionally does a lot of object pinning finishes execution successfully, which is great. For various other workloads, I am seeing the following assertion failure at run-time though with a I also had a brief look at the commits and the necessary SVM-specific changes in the C++ code:
|
|
Hi @christianhaeubl, Thanks a lot for providing reproducers for the problems you've mentioned above and on Slack. I'll attach your tests and test instructions here for better visibility.
|
|
I fixed the first problem (i.e. |
|
Hi @christianhaeubl, The problem with the deadlock in your For Shenandoah, the following happens: So the application thread creates a The Shenandoah Control Thread gets notified by the change of the I now have the following questions:
|
|
Hi @simonis, I did a brief test with G1 and Some VM operations in Native Image allocate Java heap memory. So, all GCs must at least support implicitly triggered collections during VM operations. For Shenandoah, something like the following might work: Before the Shenandoah control thread enqueues a VM operation, it needs to check if the VM operation thread is blocked in
I think this can be implemented on the C++ side, without needing any Java changes. It might be possible to simplify that approach a bit, depending on the exact code that the Shenandoah control thread needs to execute.
I think that part isn't correct because the application thread releases the mutex before it blocks (see
I don't think that we support that at the moment. You can probably compile the tests by adding the SVM-internal code (e.g., the VM operation infrastructure) to the classpath, but that is not something I have ever tried. |
|
So here's how you can build and run these tests "standalone" (mainly for my personal use):
import com.oracle.svm.core.heap.VMOperationInfos;
import com.oracle.svm.core.thread.JavaVMOperation;
public class GraalShenandoahDeadlock {
public static void main(String[] args) {
TriggerGCOperation triggerGcOp = new TriggerGCOperation();
triggerGcOp.enqueue();
}
private static class TriggerGCOperation extends JavaVMOperation {
protected TriggerGCOperation() {
/* Please also test with SystemEffect.SAFEPOINT. */
super(VMOperationInfos.get(TriggerGCOperation.class, "Trigger GC outside safepoint", SystemEffect.NONE));
}
@Override
protected void operate() {
System.gc();
}
}
}
$ javac -cp svm.jar GraalShenandoahDeadlock.java
$ native-image -o GraalShenandoahDeadlock.exe \
--add-exports=org.graalvm.nativeimage.builder/com.oracle.svm.core.thread=ALL-UNNAMED \
--add-exports=org.graalvm.nativeimage.builder/com.oracle.svm.core.heap=ALL-UNNAMED \
GraalShenandoahDeadlock
It is crucial to export This can't be fixed even by forcing |
@simonis I would expect you to get a build-time error message if you build with the |
Thanks @olpaw , you are right, when using Also, just out of interest, why doesn't |
It sure does! But without |
|
@olpaw, I don't want to be picky, and maybe I'm not fully understanding this, but what you say is that with But that's not what I'm observing. If what you say would be the case, then I'd expect to see the actual root cause of the failure (i.e. Once again, I think that |
|
@christianhaeubl, I have now fixed the reference handling issue as well (see this commit in my labs-jdk Shenandoah branch). I'm still struggling with the deadlock due to nested VM operations. Will try again now. |
If a class is seen as reachable by the static analysis For that to happen the image builder calls If this does not work as described above feel free to create a ticket. cc @cstancu @vjovanov |
|
Hi @christianhaeubl, I agonized over this problem, but I couldn't find a solution. Especially, I can't understand what you mean by:
How can the Shenandoah ControlThread notify the VM thread which is blocked in Shenandoah code? The only way to do that is to notify the
MonitorLocker ml(&_gc_waiters_lock);
size_t current_gc_id = get_gc_id();
size_t required_gc_id = current_gc_id + 1;
while (current_gc_id < required_gc_id && !should_terminate()) {
_requested_gc_cause = cause;
_gc_requested.set();
ml.wait();
current_gc_id = get_gc_id();
}
I.e. how can the VM operations thread detect and execute a new, pending VM operation while already blocked in a VM operation in native Shenandoah code (before finishing it)? Moreover, as far as I can see, native VM operations from Shenandoah are enqueued from
public void enqueueFromNonJavaThread(NativeVMOperation operation, NativeVMOperationData data) {
mainQueues.enqueueUninterruptibly(operation, data);
}
whereas support for recursive VM operations is only implemented in
private void enqueue(VMOperation operation, NativeVMOperationData data) {
StackOverflowCheck.singleton().makeYellowZoneAvailable();
try {
if (mayExecuteVmOperations()) {
// a recursive VM operation (either triggered implicitly or explicitly) -> execute
// it right away
immediateQueues.enqueueAndExecute(operation, data);
} else if (useDedicatedVMOperationThread()) {
...
I'm completely lost... |
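The difference between the two enqueue paths quoted above can be modeled in a few lines. This is a single-threaded sketch with hypothetical names (`VMOperationExecutor`, `run_pending`), not the SubstrateVM code: the `executing_` flag plays the role of the `mayExecuteVmOperations()` check, so an operation enqueued from inside a running operation executes right away, while any other request only lands in the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded model of a VM operation queue.
class VMOperationExecutor {
 public:
  using Operation = std::function<void()>;

  // Recursive requests (made while an operation is running) execute immediately;
  // all other requests are queued until run_pending() is called.
  void enqueue(Operation op) {
    if (executing_) {
      op();
      return;
    }
    pending_.push_back(std::move(op));
  }

  // Plays the role of the VM operation thread draining its queue.
  void run_pending() {
    while (!pending_.empty()) {
      Operation op = std::move(pending_.front());
      pending_.pop_front();
      executing_ = true;
      op();
      executing_ = false;
    }
  }

  std::size_t pending_count() const { return pending_.size(); }

 private:
  std::deque<Operation> pending_;
  bool executing_ = false;
};
```

A request that arrives from a different thread, as in `enqueueFromNonJavaThread`, always takes the queueing path in this model, so it is not picked up while the executing thread is blocked inside the current operation. That is the situation described in the question above.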
|
I hope I am not missing anything, but my main point is that you need to directly communicate some information between the VM operation thread and the Shenandoah control thread. Below are more details on what I suggested above, this time a bit closer to an actual implementation. Most of what I am describing is new logic that you will need to add (C++ code only).
|
Provides infrastructure that simplifies the integration of HotSpot garbage collectors (like Shenandoah) into Native Image. Each commit in this PR can be reviewed on its own.