Don't strategize executors in bad state (#3994)

benclifford · web-flow · commit 068576547f2a · 2025-10-17T12:15:42.000Z
This is consistent with JobStatusPoller.close, which does not scale in executors in bad state. See issue #3992 for more context.

This fixes a race condition that I can create by adding fuzz delays (as described in #3992) and that appeared (without adding fuzz timing) in PR #3991 probably due to reduced time taken per strategy iteration in that PR.

The new use of `bad_state_is_set` is prone to race conditions - that's part of the model of how `bad_state_is_set` works everywhere in the codebase, and is briefly discussed in PR #3995.

# Changed Behaviour

Scaling will not happen for bad-state executors. I think this is always the right thing to do.

## Type of change

- Bug fix
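
As a rough illustration of the changed behaviour described above, here is a minimal, self-contained sketch of the skip pattern. It is not Parsl's actual strategy code: `FakeExecutor` and `strategize_all` are hypothetical names invented for this example, and only the `label` and `bad_state_is_set` attributes are taken from the diff below.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class FakeExecutor:
    """Hypothetical stand-in for an executor; not a Parsl class."""
    label: str
    bad_state_is_set: bool = False


def strategize_all(executors):
    """Return the labels of executors that would be considered for scaling.

    Executors whose bad_state_is_set flag is true are skipped. The flag is
    read once per executor and may change afterwards, so this check is
    best-effort rather than race-free.
    """
    strategized = []
    for executor in executors:
        label = executor.label

        if executor.bad_state_is_set:
            logger.info(f"Not strategizing for executor {label} because bad state is set")
            continue

        logger.debug(f"Strategizing for executor {label}")
        strategized.append(label)  # placeholder for the real scaling logic
    return strategized
```

For example, calling `strategize_all([FakeExecutor("good"), FakeExecutor("broken", bad_state_is_set=True)])` would consider only the executor labelled `good`.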
diff --git a/parsl/jobs/strategy.py b/parsl/jobs/strategy.py
@@ -185,6 +185,11 @@ def _general_strategy(self, executors: List[BlockProviderExecutor], *, strategy_
 
         for executor in executors:
             label = executor.label
+
+            if executor.bad_state_is_set:
+                logger.info(f"Not strategizing for executor {label} because bad state is set")
+                continue
+
             logger.debug(f"Strategizing for executor {label}")
 
             if self.executors[label]['first']:
diff --git a/parsl/tests/test_scaling/test_regression_3696_oscillation.py b/parsl/tests/test_scaling/test_regression_3696_oscillation.py
@@ -51,6 +51,7 @@ def test_htex_strategy_does_not_oscillate(ns):
     executor.outstanding = lambda: n_tasks
     executor.status_facade = statuses
     executor.workers_per_node = n_workers
+    executor.bad_state_is_set = False
 
     provider.parallelism = 1
     provider.init_blocks = 0