Skip to content

Conversation

@agoscinski
Copy link
Contributor

@agoscinski agoscinski commented Jun 2, 2025

The original design of wait and timeout was to distinguish between actions that immediately return and actions that scheduled. This mechanism was however never used and resulted in an misinterpretation in the force-kill PR #6793 introducing a bug fixed in PR #6870. The mechanism of wait and timeout was also never correctly implemented. In this PR we rectify the logic and simplify it by handling immediate and scheduled actions the same way.

My interpretation is that the original logic (seen in commit 8388018) is that it unwraps the future once, if it is a another future then it is a scheduled action otherwise an immediate one. Then 1b6ecb8 in plumpy introduced a double wrapping for play, pause and kill of the return value which I think is correct as these are scheduled actions. The problem appears in cd0d15c where on the aiida-core side in the first round unwrapping is always done till the actual (nonfuture) result by using unwrap_kiwi_future. This makes the usage of wait completely obsolete as we always get the final nonfuture result in the first step. Also the timeout was not passed to the result which made it a blocking action for scheduled tasked (the timeout is only applied on the first layer of future in futures.as_completed which is meaningless since the first layer is always a future and returns immediately).

This is fixed by simplifying the two step procedure, unwrap once and if future unwrap again, to one step, unwrap until nonfuture result gained with timeout. One can specify a timeout <= 0 to express that the action should not wait for a response while one can specify timeout==float('inf') (default value) to wait until a response has been received without a timeout.

@codecov
Copy link

codecov bot commented Jun 4, 2025

Codecov Report

❌ Patch coverage is 83.78378% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.63%. Comparing base (5167e2c) to head (c4f67f1).
⚠️ Report is 43 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/engine/processes/control.py 79.32% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6902      +/-   ##
==========================================
+ Coverage   78.59%   78.63%   +0.04%     
==========================================
  Files         564      564              
  Lines       43128    43116      -12     
==========================================
+ Hits        33891    33898       +7     
+ Misses       9237     9218      -19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@agoscinski agoscinski force-pushed the fix/wait-timeout branch 2 times, most recently from bcb85d7 to cfbc5bd Compare June 4, 2025 20:04
@agoscinski agoscinski requested a review from unkcpz June 5, 2025 05:13
@agoscinski agoscinski marked this pull request as ready for review June 5, 2025 05:13
@agoscinski agoscinski changed the title Merge wait and timeout to one CLI option for verdi process {kill|play|pause} Merge wait and timeout to one CLI option for verdi process {kill|play|pause} Jun 5, 2025
This function will echo the correct information strings based on the outcomes of the futures and the given verb
conjugations. You can optionally wait for any pending actions to be completed before the functions returns and use a
timeout to put a maximum wait time on the actions.
timeout to put a maximum wait time on the actions. TODO fix docstring
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't forget it

Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think after staring on this part 1 hour, I understand a bit more clear. The code you removed is a recursion call that more or less did the same thing as calling unwrap_kiwi_future and then call result(). I think you did it right, when I add the unwrap_kiwi_future to this function, it was just a workaround to get the wrapped future resolved because of LoopCommunicator put another layer of future to transfer the result over the threads. I admit I didn't pay attention to the after part which originally designed to get the future resolved.

EDIT: I still not sure it is correct. I overlooked the indentation on the code you removed. I think the behavior of calling unwrap.result() in every loop of process iter is different from using as_complete for all futures. What makes me confused was result = future.result() was there in from beginning, it should doing the same thing as using as_completed to wait futures as a bundle.

@agoscinski let's have a chat next week, I need fresh my head a bit, it hurts.

result = unwrapped.result(timeout=None if timeout == float('inf') else timeout)
except communications.TimeoutError:
cancelled = unwrapped.cancel()
cancelled = future.cancel()
Copy link
Member

@unkcpz unkcpz Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct here, the future is different from unwrapped.
Let's image the results are warped in 3 layers of future object, to get the result of final result from future requires future.result().result().result() but after unwrap to a flat future which is unwrapped, it requires only unwrapped.result(). It is the same for the cancel().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I think I am wrong here, the cancellation is taken care inside unwrap_kiwi_future where doing the cancellation when the inner most future is claim to be cancelled.

The original design of `wait` and `timeout` was to distinguish between
actions that immediately return and actions that scheduled. This
mechanism was however never used and resulted in an misinterpretation
in the force-kill PR aiidateam#6793 introducing a bug fixed in PR aiidateam#6870.
The mechanism of `wait` and `timeout` was also never correctly implemented.
In this PR we rectify the logic and simplify it by handling immediate
and scheduled actions the same way.

Related commits in aiida-core 8388018, cd0d15c and plumpy 1b6ecb8

One can specifiy a `timeout <= 0` to express that the action should not wait for
a response while one can specify `timeout==float('inf')` (default value) to wait
until a response has been received without a timeout.
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Jun 11, 2025
The original design of `wait` and `timeout` was to distinguish between
actions that immediately return and actions that scheduled. This
mechanism was however never used and resulted in an misinterpretation
in the force-kill PR aiidateam#6793 introducing a bug fixed in PR aiidateam#6870.
The mechanism of `wait` and `timeout` was also never correctly implemented.
In this PR we rectify the logic and simplify it by handling immediate
and scheduled actions the same way.

Related commits in aiida-core 8388018, cd0d15c and plumpy 1b6ecb8

One can specifiy a `timeout <= 0` to express that the action should not wait for
a response while one can specify `timeout==float('inf')` (default value) to wait
until a response has been received without a timeout.
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Jun 11, 2025
Copy link
Member

@unkcpz unkcpz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as_completed is what I thought should not removed, although it might not correctly reached in the original implementation. Thanks for adding back.
It is now looks all good to me.

This function will echo the correct information strings based on the outcomes of the futures and the given verb
conjugations. You can optionally wait for any pending actions to be completed before the functions returns and use a
timeout to put a maximum wait time on the actions.
timeout to put a maximum wait time on the actions. TODO fix docstring
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't forget this.

@agoscinski agoscinski merged commit ec9d53e into aiidateam:main Jun 11, 2025
11 checks passed
@agoscinski agoscinski deleted the fix/wait-timeout branch June 11, 2025 12:21
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Jun 17, 2025
…|play|pause}` (aiidateam#6902)

The original design of `wait` and `timeout` was to distinguish between actions
that immediately return and actions that scheduled. This mechanism was however
never used and resulted in an misinterpretation in the force-kill PR aiidateam#6793
introducing a bug fixed in PR $6870. The mechanism of `wait` and `timeout` was
also never correctly implemented. In this PR we rectify the logic and simplify
it by handling immediate and scheduled actions the same way.

My interpretation is that the original logic (seen in commit 8388018) is that it
unwraps the future once, if it is a another future then it is a scheduled action
otherwise an immediate one. Then 1b6ecb8 in plumpy introduced a double wrapping
for `play`, `pause` and `kill` of the return value which I think is correct as
these are scheduled actions. The problem appears in cd0d15c where on the
aiida-core side in the first round unwrapping is always done till the actual
(nonfuture) result by using `unwrap_kiwi_future`. This makes the usage of
`wait` completely obsolete as we always get the final nonfuture result in the
first step. Also the `timeout` was not passed to the `result` which made it a
blocking action for scheduled tasked (the `timeout` is only applied on the
first layer of future in `futures.as_completed` which is meaningless since the
first layer is always a future and returns immediately).

This is fixed by simplifying the two step procedure, unwrap once and if future
unwrap again, to one step: Unwrap until nonfuture result gained with timeout.
One can specify a `timeout == 0` to express that the action should not wait for
a response while one can specify `timeout==float('inf')` (default value) to wait
until a response has been received without a timeout.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Clarify wait and timeout in CLI play, pause, kill commands

2 participants