Commit 68ea296

andrross authored and shayush622 committed
Fix flaky ResourceAwareTasksTests (opensearch-project#20863)
There is a race condition between request completion and task resource tracking cleanup. The sequence of events:

1. Task is cancelled via `CancelTasksRequest`
2. The node operation throws `TaskCancelledException`
3. The response is sent back to the caller, which counts down `requestCompleteLatch`
4. The test's main thread wakes up from `requestCompleteLatch.await()` and asserts `resourceTasks.size() == 0`
5. Meanwhile, `TaskResourceTrackingService.stopTracking()` (which calls `resourceAwareTasks.remove()`) is invoked asynchronously via a `resourceTrackingCompletionListener` registered in `TaskManager.register()`

Steps 4 and 5 race. I was able to reproduce the failure locally using `stress-ng` and verify this fix.

Signed-off-by: Andrew Ross <andrross@amazon.com>
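
The ordering in steps 3 to 5 can be illustrated with a small standalone sketch. This is not OpenSearch code: `CleanupRaceSketch`, `tracked`, and `cleanupGate` are hypothetical names, and the extra gate latch exists only to force the unlucky interleaving deterministically, so that the check after `requestCompleteLatch.await()` always runs before the cleanup.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class CleanupRaceSketch {
    // Hypothetical stand-in for the map of tracked resource-aware tasks.
    static final Set<Long> tracked = ConcurrentHashMap.newKeySet();

    /** Returns {size seen right after the latch, size seen after cleanup ran}. */
    static int[] run() throws InterruptedException {
        tracked.clear();
        tracked.add(1L); // one task is being tracked

        CountDownLatch requestCompleteLatch = new CountDownLatch(1);
        // Assumption for the demo only: holds the cleanup back so the race
        // window is always hit instead of only occasionally.
        CountDownLatch cleanupGate = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            requestCompleteLatch.countDown(); // step 3: response reaches the caller
            try {
                cleanupGate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            tracked.remove(1L); // step 5: asynchronous stop-tracking cleanup
        });
        worker.start();

        requestCompleteLatch.await();        // step 4: test thread wakes up
        int sizeAfterLatch = tracked.size(); // cleanup has not run yet

        cleanupGate.countDown();
        worker.join();
        int sizeAfterCleanup = tracked.size();
        return new int[] { sizeAfterLatch, sizeAfterCleanup };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] sizes = run();
        System.out.println("after latch: " + sizes[0] + ", after cleanup: " + sizes[1]);
    }
}
```

In the real test nothing holds the cleanup back, so the immediate check usually passes and only fails when the scheduler happens to produce this interleaving, which is consistent with the failure being flaky.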
1 parent d10224b commit 68ea296

1 file changed: 3 additions, 1 deletion

server/src/test/java/org/opensearch/action/admin/cluster/node/tasks/ResourceAwareTasksTests.java

@@ -410,7 +410,9 @@ public void onFailure(Exception e) {
         // Waiting for whole request to complete and return successfully till client
         taskTestContext.requestCompleteLatch.await();
 
-        assertEquals(0, resourceTasks.size());
+        // The task may not be removed from resourceAwareTasks immediately after the request completes
+        // because stopTracking is called asynchronously via the task's resource tracking completion listener.
+        assertBusy(() -> assertEquals(0, resourceTasks.size()));
         assertNull(throwableReference.get());
         assertNotNull(responseReference.get());
         assertEquals(1, responseReference.get().failureCount());
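
For readers unfamiliar with `assertBusy`, the following is a minimal sketch of the general retry-until-timeout idea behind such a helper. It is an illustration only, not the OpenSearch test framework's implementation: `BusyAssertSketch`, `assertEventually`, the backoff values, and the 50 ms cleanup delay are all assumptions made for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BusyAssertSketch {
    /**
     * Hypothetical helper: re-runs the assertion until it passes or the
     * timeout elapses, rethrowing the last AssertionError on timeout.
     */
    static void assertEventually(Runnable assertion, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long sleepMillis = 1;
        while (true) {
            try {
                assertion.run();
                return; // assertion held, stop polling
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e; // give up and surface the last failure
                }
            }
            Thread.sleep(sleepMillis);
            sleepMillis = Math.min(sleepMillis * 2, 100); // capped exponential backoff
        }
    }

    /** Simulates a delayed cleanup and returns the tracked count once the wait succeeds. */
    static int demo() throws InterruptedException {
        Set<Long> tracked = ConcurrentHashMap.newKeySet();
        tracked.add(1L);

        // Stand-in for the asynchronous completion listener that removes the task.
        Thread cleanup = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            tracked.remove(1L);
        });
        cleanup.start();

        assertEventually(() -> {
            if (!tracked.isEmpty()) {
                throw new AssertionError("still tracking " + tracked.size() + " task(s)");
            }
        }, 5_000);
        cleanup.join();
        return tracked.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("tracked after wait: " + demo());
    }
}
```

Polling costs a little latency in the rare case where cleanup lags, but it keeps the assertion strict: the test still fails if the task is never removed within the timeout.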
