move executor.status() API boundary to return the cached/mutated data, not fresh provider+simulated data

benclifford · benclifford · commit d23ecc3c834c · 2024-04-09T20:07:34.000Z
potentially refresh the cache on every call to status(), instead of the refresh being driven by the poller. this might change the refresh cadence.

code that assumes that repeated calls to PollItem.status return the same result within a single poll loop iteration might break -- does anything make that assumption? for example: handle_errors and strategize may now run with different status info (which should eventually converge)

the monitoring code in poll can no longer see the update happening, so it can't get a before/after state inside the loop, because the refresh is now decoupled from the poller.
probably needs some more persistent (across loop iterations) storage of the previous state...
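
a minimal sketch of what that persistent storage could look like, assuming hypothetical stand-in classes (FakeExecutor and DeltaTrackingPoller are not parsl classes, and statuses are plain strings rather than JobStatus). the poller keeps its own copy of the last status it saw, so it can compute a delta even though the refresh now happens inside status():

```python
from typing import Dict


class FakeExecutor:
    """Hypothetical stand-in for an executor whose status() returns a cached dict."""

    def __init__(self) -> None:
        self._poller_mutable_status: Dict[str, str] = {}

    def status(self) -> Dict[str, str]:
        # Returns the live cached dict, which other code may mutate in place.
        return self._poller_mutable_status


class DeltaTrackingPoller:
    """Hypothetical poller that remembers the previous status across poll() calls."""

    def __init__(self, executor: FakeExecutor) -> None:
        self._executor = executor
        self._previous_status: Dict[str, str] = {}

    def poll(self) -> Dict[str, str]:
        current = self._executor.status()
        # Blocks whose status is new or has changed since the last poll() call.
        delta = {
            block_id: status
            for block_id, status in current.items()
            if self._previous_status.get(block_id) != status
        }
        # Store a copy: the executor's dict can be mutated in place later,
        # and an aliased snapshot would silently hide those changes.
        self._previous_status = dict(current)
        return delta
```

the copy in poll() matters in this sketch because the cached dict is shared and mutable; whether a shallow copy is enough for real JobStatus values is an assumption.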

things to test:

* what test makes sure the provider isn't polled too often?
if not here, write one.

* what tests that we don't scale out too much?
(e.g. if we ignore the PENDING status added by scale_out
to poller_mutable_status, and let the strategy keep
running, then we should see excessive blocks being
launched)

* test monitoring delta recording works
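
a sketch of the first test in the list above, under stated assumptions: CountingProvider and CachedStatusExecutor are hypothetical stand-ins that copy only the caching logic of the new status() method, and the clock is injected (the real code calls time.time() directly) so the test doesn't need to sleep:

```python
from typing import Callable, Dict


class CountingProvider:
    """Hypothetical provider that counts how often it is asked for status."""

    def __init__(self) -> None:
        self.calls = 0

    def status(self) -> Dict[str, str]:
        self.calls += 1
        return {"0": "RUNNING"}


class CachedStatusExecutor:
    """Hypothetical executor mirroring the caching logic of the new status()."""

    def __init__(self, provider: CountingProvider, interval: float,
                 clock: Callable[[], float]) -> None:
        self._provider = provider
        self.status_polling_interval = interval
        self._clock = clock  # assumption: injectable clock, for testing only
        self._last_poll_time = 0.0
        self._poller_mutable_status: Dict[str, str] = {}

    def _should_poll(self, now: float) -> bool:
        return now >= self._last_poll_time + self.status_polling_interval

    def status(self) -> Dict[str, str]:
        now = self._clock()
        if self._should_poll(now):
            self._poller_mutable_status = self._provider.status()
            self._last_poll_time = now
        return self._poller_mutable_status


def test_provider_not_polled_too_often() -> None:
    fake_now = [100.0]
    provider = CountingProvider()
    executor = CachedStatusExecutor(provider, interval=5.0,
                                    clock=lambda: fake_now[0])

    # Many calls within one polling interval should hit the provider once.
    for _ in range(10):
        executor.status()
    assert provider.calls == 1

    # Once the interval has elapsed, the next call refreshes the cache.
    fake_now[0] += 5.0
    executor.status()
    assert provider.calls == 2
```

a real version would need to drive a BlockProviderExecutor with a mock provider instead of these stand-ins.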


this breaks:
pytest parsl/tests/test_scaling/test_scale_down_htex_unregistered.py --config local

I think it's because the status update hasn't happened yet at the point that it's being asserted, because .status() is now cached...
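
if that diagnosis is right, one possible fix on the test side is to retry the assertion until the cached status catches up. a minimal sketch, assuming a hypothetical wait_until helper (not an existing parsl test utility):

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float = 5.0,
               interval_s: float = 0.01) -> bool:
    """Call predicate repeatedly until it returns True or the timeout expires.

    Returns True if the predicate succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

a test would then assert wait_until(...) on the expected block status, with a timeout longer than status_polling_interval, instead of asserting on a single .status() call.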
diff --git a/parsl/executors/status_handling.py b/parsl/executors/status_handling.py
@@ -114,7 +114,7 @@ def outstanding(self) -> int:
         raise NotImplementedError("Classes inheriting from BlockProviderExecutor must implement "
                                   "outstanding()")
 
-    def status(self) -> Dict[str, JobStatus]:
+    def _old_status_impl(self) -> Dict[str, JobStatus]:
         """Return the status of all jobs/blocks currently known to this executor.
 
         :return: a dictionary mapping block ids (in string) to job status
@@ -128,6 +128,13 @@ def status(self) -> Dict[str, JobStatus]:
 
         return status
 
+    def status(self) -> Dict[str, JobStatus]:
+        now = time.time()
+        if self._should_poll(now):
+            self._poller_mutable_status = self._old_status_impl()
+            self._last_poll_time = now
+        return self._poller_mutable_status
+
     def set_bad_state_and_fail_all(self, exception: Exception):
         """Allows external error handlers to mark this executor as irrecoverably bad and cause
         all tasks submitted to it now and in the future to fail. The executor is responsible
@@ -242,9 +249,3 @@ def workers_per_node(self) -> Union[int, float]:
 
     def _should_poll(self, now: float) -> bool:
         return now >= self._last_poll_time + self.status_polling_interval
-
-    def _refresh_poll_mutable_status_if_time(self):
-        now = time.time()
-        if self._should_poll(now):
-            self._poller_mutable_status = self.status()
-            self._last_poll_time = now
diff --git a/parsl/jobs/job_status_poller.py b/parsl/jobs/job_status_poller.py
@@ -20,9 +20,7 @@ def __init__(self, executor: BlockProviderExecutor, monitoring: Optional["parsl.
         self._monitoring = monitoring
 
     def poll(self) -> None:
-        previous_status = self.executor._poller_mutable_status
-
-        self._executor._refresh_poll_mutable_status_if_time()
+        previous_status = self._executor.status()
 
         if previous_status != self.executor._poller_mutable_status:
             # short circuit the case where the two objects are identical so
@@ -50,7 +48,7 @@ def status(self) -> Dict[str, JobStatus]:
 
         :return: a dictionary mapping block ids (in string) to job status
         """
-        return self._executor._poller_mutable_status
+        return self._executor.status()
 
     @property
     def executor(self) -> BlockProviderExecutor: