Skip to content

Comments

Stabilize TestSharedCloseJvmti by avoiding latch park#22

Draft
babsingh wants to merge 1 commit intoibmruntimes:openj9from
babsingh:openj9
Draft

Stabilize TestSharedCloseJvmti by avoiding latch park#22
babsingh wants to merge 1 commit intoibmruntimes:openj9from
babsingh:openj9

Conversation

@babsingh
Copy link
Member

Update TestSharedCloseJvmti to remove CountDownLatch.await() from the
JVMTI re-entrant callback path and replace it with a spin wait on a
volatile flag.

The previous test parked the Trigger thread while executing re-entrantly
from a JVMTI MethodExit callback during a scoped access. On OpenJ9,
shared scope close relies on async polling (javaCheckAsyncMessages) for
close-scope acknowledgement. Parking the thread in this path can prevent
polling and cause intermittent hangs.

The test intent is to exercise additional stack frames during scoped
access, not thread parking inside JVMTI callbacks. Using a spin loop
keeps the thread runnable and preserves the stack-shape stress scenario
while avoiding the hang.

Scoped close/exception handling differs across JVMs. The RI uses a
per-thread handshake and deopt-based approach, while OpenJ9 uses a
flag-and-polling model. This change keeps the test portable across both.

Remove the unused TARGET_LATCH and add a volatile CLOSED flag for
cross-thread visibility.

Related: eclipse-openj9/openj9#22934

@keithc-ca
Copy link
Member

I don't see how this helps. It is the main thread that calls TARGET_LATCH.countDown(); nothing should prevent that from occurring. If a thread blocked on TARGET_LATCH.await() is not allowed to proceed, something else is broken.

@babsingh
Copy link
Member Author

babsingh commented Feb 13, 2026

There is implicit code not visible in the testcase:

try (Arena arena = Arena.ofShared()) {
    ...
}

// Notify trigger thread that arena was closed
TARGET_LATCH.countDown();

This expands to:

Arena arena = Arena.ofShared();
try {
    ...
} finally {
    // arena.close() runs here when the try-with-resources block exits.
    //
    // On OpenJ9, shared scope close sets a close-scope flag and relies on
    // the target thread to process it via the async exception/check path.
    // The closing thread waits until the target thread acknowledges the
    // close by processing the async exception state.
    //
    // In this test, the Trigger thread is blocked in TARGET_LATCH.await()
    // while executing re-entrantly from a JVMTI callback. Because it is
    // parked there, it does not reach the async check path that delivers
    // the scoped async exception and records the acknowledgement.
    //
    // As a result, arena.close() can block here, and the next line of code
    // is never reached.
    arena.close();
}

// Notify trigger thread that arena was closed
TARGET_LATCH.countDown();

The main thread is blocked inside arena.close() before it can reach TARGET_LATCH.countDown(). By replacing TARGET_LATCH.await() with a spin loop, the Trigger thread remains runnable and is able to reach the async check path, process the scoped async exception, and acknowledge the close request. That allows arena.close() to proceed and avoids the hang.

@babsingh babsingh marked this pull request as ready for review February 13, 2026 22:16
babsingh added a commit to babsingh/aqa-tests that referenced this pull request Feb 13, 2026
It is fixed by the below PRs.

Extension Repo PRs (Test fix):
- JDK-next: ibmruntimes/openj9-openjdk-jdk#1174
- JDK26: ibmruntimes/openj9-openjdk-jdk26#22

OpenJ9 PR (VM/JIT fix): eclipse-openj9/openj9#23367

Related: eclipse-openj9/openj9#22934

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
babsingh added a commit to babsingh/aqa-tests that referenced this pull request Feb 13, 2026
TestSharedCloseJvmti.java is fixed by the below PRs.

Extension Repo PRs (Test fix):
- JDK-next: ibmruntimes/openj9-openjdk-jdk#1174
- JDK26: ibmruntimes/openj9-openjdk-jdk26#22

OpenJ9 PR (VM/JIT fix): eclipse-openj9/openj9#23367

Related: eclipse-openj9/openj9#22934

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
@keithc-ca
Copy link
Member

// As a result, arena.close() can block here, and the next line of code
// is never reached.
arena.close();

areana.close() should be analogous to free() in that it may involve critical sections, but it should never block. If it blocks, it makes no difference whether TARGET_LATCH.countDown(); or CLOSED = true; is not executed.

// On OpenJ9, shared scope close sets a close-scope flag and relies on
// the target thread to process it via the async exception/check path.
// The closing thread waits until the target thread acknowledges the
// close by processing the async exception state.

This is the issue that must be fixed. The closing threads must always make progress.

@keithc-ca keithc-ca marked this pull request as draft February 17, 2026 11:37
@babsingh
Copy link
Member Author

babsingh commented Feb 17, 2026

The test has been updated while preserving its original intent.

arena.close() should be analogous to free() ... it should never block.

This is not a requirement for the API. close is a coordination operation, not a raw deallocation. It must preserve the safety invariant that a shared session cannot be closed while another thread may still be performing an access that depends on it. Cross-thread coordination can therefore block.

The closing threads must always make progress.

Also not an API requirement. It would be a robustness improvement, but would effectively require a different close protocol (handshake-style) rather than the current flag + polling model, and may still have gaps under certain JVMTI behavior. @gacholio @tajila, thoughts on value+feasibility of this in the long run? This is infeasible for JDK26.

@gacholio
Copy link
Member

Not entirely sure what you're asking - you say close must block if other threads are in the scope. Changing the handshake method will not change this.

@gacholio
Copy link
Member

Is this close a public API or only used internally by the implementation?

@keithc-ca
Copy link
Member

It is public API; see java.lang.foreign.Arena.

Update TestSharedCloseJvmti to remove CountDownLatch.await() from the
JVMTI re-entrant callback path and replace it with a spin wait on a
volatile flag.

The previous test parked the Trigger thread while executing re-entrantly
from a JVMTI MethodExit callback during a scoped access. On OpenJ9,
shared scope close relies on async polling (javaCheckAsyncMessages) for
close-scope acknowledgement. Parking the thread in this path can prevent
polling and cause intermittent hangs.

The test intent is to exercise additional stack frames during scoped
access, not thread parking inside JVMTI callbacks. Using a spin loop
keeps the thread runnable and preserves the stack-shape stress scenario
while avoiding the hang.

Scoped close/exception handling differs across JVMs. The RI uses a
per-thread handshake and deopt-based approach, while OpenJ9 uses a
flag-and-polling model. This change keeps the test portable across both.

Remove the unused TARGET_LATCH and add a volatile CLOSED flag for
cross-thread visibility.

Related: eclipse-openj9/openj9#22934

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
@babsingh
Copy link
Member Author

babsingh commented Feb 20, 2026

I have addressed @gacholio's feedback on the comment.

At this point, I do not see a clear path to enforcing a strict never block or always make progress requirement without weakening the safety guarantees such as preventing use-after-close. If there is specific guidance on how to
achieve that while preserving correctness, I would be happy to explore it.

Given the JDK26 release timeline in the first half of Mar 26, I would appreciate any direction on how best to move this PR forward.

@keithc-ca
Copy link
Member

Weakening the test is not the solution. Other than the use of JVMTI to forcibly arrange that one thread is actively using an Arena while another wants to close it, I see nothing special here. User applications might find themselves in similar situations, but perhaps only rarely.

Perhaps the "close coordination" in the VM is deficient. If threads affected by the closure of an arena are live, they should respond in a timely manner to the asynchronous event. If they are blocked/parked (e.g. within CountDownLatch.await()), then they should be interrupted/unparked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants