Skip to content

Conversation

@vogella
Copy link
Contributor

@vogella vogella commented Oct 22, 2025

The flaky test was caused by a race condition in
RCPTestWorkbenchAdvisor.java. The test spawns 4 background threads during preStartup() that schedule async/sync runnables on the display.

However, there was no guarantee these runnables would execute before postStartup() marked the workbench as "started" (by setting started = true).

This meant asyncWithDisplayAccess could still be null when the test assertions ran, causing intermittent failures.

This change adds a CountDownLatch to ensure all 4 async/sync operations complete before marking the workbench as started:

Key changes to
/tests/org.eclipse.ui.tests.harness/src/org/eclipse/ui/tests/harness/util/RCPTestWorkbenchAdvisor.java:

  1. Added latch field (line 48):
    - private static java.util.concurrent.CountDownLatch asyncLatch = null;
  2. Initialize latch in preStartup() (line 130):
    - asyncLatch = new java.util.concurrent.CountDownLatch(4); — tracks 4 operations
  3. Count down after each operation completes:
    - In setupSyncDisplayThread(): Added finally block (lines 179-183) that calls asyncLatch.countDown()
    - In setupAsyncDisplayThread(): Added finally block (lines 205-209) that calls asyncLatch.countDown()
  4. Wait for operations in postStartup() (lines 222-235):
    - Waits up to 5 seconds for all operations to complete
    - Logs warnings if timeout occurs or thread is interrupted
    - Only marks started = true after all operations complete

Fixes #1517

@eclipse-platform-bot
Copy link
Contributor

eclipse-platform-bot commented Oct 22, 2025

This pull request changes some projects for the first time in this development cycle.
Therefore the following files need a version increment:

tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF

An additional commit containing all the necessary changes was pushed to the top of this PR's branch. To obtain these changes (for example if you want to push more changes) either fetch from your fork or apply the git patch.

Git patch
From 8ddb987c144a6a8d8bbc8ced0600c5384df37c3c Mon Sep 17 00:00:00 2001
From: Eclipse Platform Bot <[email protected]>
Date: Tue, 4 Nov 2025 21:42:08 +0000
Subject: [PATCH] Version bump(s) for 4.38 stream


diff --git a/tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF b/tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF
index 259fee3e99..8cdb2f2c5f 100644
--- a/tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF
+++ b/tests/org.eclipse.ui.tests.harness/META-INF/MANIFEST.MF
@@ -2,7 +2,7 @@ Manifest-Version: 1.0
 Bundle-ManifestVersion: 2
 Bundle-Name: Harness Plug-in
 Bundle-SymbolicName: org.eclipse.ui.tests.harness;singleton:=true
-Bundle-Version: 1.10.600.qualifier
+Bundle-Version: 1.10.700.qualifier
 Eclipse-BundleShape: dir
 Require-Bundle: org.eclipse.ui,
  org.eclipse.core.runtime,
-- 
2.51.2

Further information are available in Common Build Issues - Missing version increments.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 22, 2025

Test Results

 3 018 files  ±0   3 018 suites  ±0   2h 51m 6s ⏱️ + 33m 0s
 8 234 tests ±0   7 984 ✅  - 1  249 💤 ±0  1 ❌ +1 
23 622 runs  ±0  22 825 ✅  - 3  794 💤 ±0  3 ❌ +3 

For more details on these failures, see this check.

Results for commit 7ae06e6. ± Comparison against base commit 9c171a0.

♻️ This comment has been updated with latest results.

Comment on lines 228 to 230
if (!completed) {
System.err.println("Warning: Not all async/sync operations completed within timeout");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would let it fail. I'd rather have a test that fails consistently than one that might fail even if completed == false.

Suggested change
if (!completed) {
System.err.println("Warning: Not all async/sync operations completed within timeout");
}
assertTrue(completed, "Not all async/sync operations completed within timeout");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it seems I missed an important part: when this fails, the workbench is already created and therefore subsequent attempts to create it will fail too.

After the latest changes, the logs show that once this happens, a bunch of tests fail with the error Workbench already exists and cannot be created again

Here are the first 3 failures.

Stack traces

org.eclipse.ui.tests.rcp.PlatformUITest.testCreateAndRunWorkbench -- Time elapsed: 7.070 s <<< FAILURE!
java.lang.AssertionError: Not all async/sync operations completed within timeout
	at org.eclipse.ui.tests.harness.util.RCPTestWorkbenchAdvisor.postStartup(RCPTestWorkbenchAdvisor.java:221)
	at org.eclipse.ui.tests.rcp.util.WorkbenchAdvisorObserver.postStartup(WorkbenchAdvisorObserver.java:140)
	at org.eclipse.ui.internal.Workbench.lambda$18(Workbench.java:2901)
	at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine$5.run(PartRenderingEngine.java:1104)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:339)
	at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(PartRenderingEngine.java:1038)
	at org.eclipse.e4.ui.internal.workbench.E4Workbench.createAndRunUI(E4Workbench.java:153)
	at org.eclipse.ui.internal.Workbench.lambda$3(Workbench.java:677)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:339)
	at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:583)
	at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:173)
	at org.eclipse.ui.tests.rcp.PlatformUITest.testCreateAndRunWorkbench(PlatformUITest.java:57)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

org.eclipse.ui.tests.rcp.PlatformUITest.testCreateAndRunWorkbenchWithExceptionOnStartup skipped
Running RcpTestSuite WorkbenchAdvisorTest
Tests run: 9, Failures: 0, Errors: 9, Skipped: 0, Time elapsed: 0.428 s <<< FAILURE! -- in RcpTestSuite WorkbenchAdvisorTest
org.eclipse.ui.tests.rcp.WorkbenchAdvisorTest.testEmptyProgressRegion -- Time elapsed: 0.191 s <<< ERROR!
java.lang.IllegalStateException: Workbench already exists and cannot be created again.
	at org.eclipse.ui.internal.Workbench.<init>(Workbench.java:481)
	at org.eclipse.ui.internal.Workbench.lambda$3(Workbench.java:609)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:339)
	at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:583)
	at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:173)
	at org.eclipse.ui.tests.rcp.WorkbenchAdvisorTest.testEmptyProgressRegion(WorkbenchAdvisorTest.java:293)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

org.eclipse.ui.tests.rcp.WorkbenchAdvisorTest.testFillAllActionBar -- Time elapsed: 0.031 s <<< ERROR!
java.lang.IllegalStateException: Workbench already exists and cannot be created again.
	at org.eclipse.ui.internal.Workbench.<init>(Workbench.java:481)
	at org.eclipse.ui.internal.Workbench.lambda$3(Workbench.java:609)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:339)
	at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:583)
	at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:173)
	at org.eclipse.ui.tests.rcp.WorkbenchAdvisorTest.testFillAllActionBar(WorkbenchAdvisorTest.java:279)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

The 1st failure is due to the timeout (Not all async/sync operations completed within timeout) and the workbench is not cleaned-up/shutdown. The following failing tests also fail with Workbench already exists and cannot be created again..

The way I see it, we need 2 things:

  1. See why the timeout occurs (is it "just" a matter of waiting a bit longer since the hardware is old and slow or is it something else?)
  2. Clean-up if the timeout occurs so that the following tests may start the workbench again.

I don't have any novel ideas at the moment :-\ and you @vogella ?

@vogella vogella force-pushed the flacky-test-RCP-test-advisor branch 3 times, most recently from bebe15f to cd9113d Compare October 23, 2025 14:38
Copy link
Contributor

@HeikoKlare HeikoKlare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks sound. Thank you for taking care of that long-standing failing test. I just have minor comments regarding a potential improvement/simplification.

Comment on lines 230 to 245
Display display = Display.getCurrent();
if (display != null) {
// Process pending async events multiple times to ensure they execute
for (int i = 0; i < 10; i++) {
while (display.readAndDispatch()) {
// Keep processing events
}
// Small yield to allow any final async operations to complete
try {
Thread.sleep(10);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
break;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this quite complex event processing really necessary? All relevant runnables are guaranteed to be scheduled here, so display.readAndDispatch() should not yield false until they all have been processed. You just need to make sure that those runnables that are not supposed to be executed during startup had the chance to accidentally be executed (to detect regression). So wouldn't be something like this be sufficient?

Suggested change
Display display = Display.getCurrent();
if (display != null) {
// Process pending async events multiple times to ensure they execute
for (int i = 0; i < 10; i++) {
while (display.readAndDispatch()) {
// Keep processing events
}
// Small yield to allow any final async operations to complete
try {
Thread.sleep(10);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
break;
}
}
}
UITestUtil.processEventsUntil(() -> syncWithDisplayAccess && asyncWithDisplayAccess, 5000);
UITestUtil.processEvents();

private static boolean started = false;

// Use a CountDownLatch to ensure async operations complete before postStartup
private static final CountDownLatch asyncLatch = new CountDownLatch(4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wouldn't it be cleaner to use something like a Phaser and make every thread register itself and then deregister/arrive when being executed (like now done with asyncLatch.countDown()), so that the latch's count does not need to magically match the number of thread created?

@vogella vogella force-pushed the flacky-test-RCP-test-advisor branch from cd9113d to 13a3215 Compare November 3, 2025 15:12
vogella added a commit to vogella/eclipse.platform.ui that referenced this pull request Nov 4, 2025
After merging the Phaser-based synchronization from PR eclipse-platform#3429, apply the
critical fix: the processEventsUntil condition was checking if
syncWithDisplayAccess and asyncWithDisplayAccess were not null, but the
test expects these values to be TRUE.

If the async operations executed after started=true is set, they would be
set to FALSE (because !isSTARTED() would be false), causing the test to
fail intermittently.

Changed the condition to wait until both values are TRUE using
Boolean.TRUE.equals() instead of just checking for null.

This ensures the workbench only marks itself as started after the async
operations have completed AND set the expected TRUE values, preventing
the flaky test failures reported in eclipse-platform#1517.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Introduce a Phaser to ensure async/sync operations are scheduled
before postStartup marks the workbench as started. This prevents
timing issues where test variables (asyncWithDisplayAccess,
syncWithDisplayAccess, etc.) might not be set correctly due to
the workbench being marked as started before async runnables execute.

Changes:
- Add Phaser to coordinate async/sync thread operations
- Register/deregister threads when scheduling display operations
- Wait for all operations to be scheduled in postStartup (with 5s timeout)
- Process event loop to ensure async runnables execute before marking started
- Simplify Boolean assignments (remove unnecessary ternary operators)

This ensures deterministic behavior in test harness initialization
and prevents flaky test failures due to race conditions.
@vogella vogella force-pushed the flacky-test-RCP-test-advisor branch from 537a6d4 to 2a31ddf Compare November 4, 2025 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Random failing PlatformUITest#testCreateAndRunWorkbench()

4 participants