@agoscinski
Owner

No description provided.

@agoscinski force-pushed the killing-time branch 2 times, most recently from f76c985 to 2e78d73 (May 7, 2025 21:39)
When `uv` creates a virtual environment, it does not automatically activate it.
However, this is not a problem, since the subsequent commands in `.readthedocs.yml`, namely `uv sync` and `uv run`, both pick up the created environment automatically.

This commit clarifies the behavior for developers.

---------

Co-authored-by: Daniel Hollas <[email protected]>
The killing process is very convoluted because it is performed partly
in `tasks.py:Waiting` and partly in `process.py:Process`. The
architecture tried to split the killing process into two parts: one
responsible for cancelling the job in the scheduler (in
`tasks.py:Waiting`), and one responsible for killing the process and
transitioning it to the KILLED state. Here is a summary of these two steps:

Killing the plumpy
calcjob/process:Process
Event: KillMessage (through RabbitMQ by verdi)
kill -> self.runner.controller.kill_process # (sending message to kill)

Killing the scheduler job
calcjob/tasks:Waiting (The task running the actual CalcJob)
Event: CalcJobMonitorAction.KILL (through monitoring), KillInterruption (through verdi)
execute -> _kill_job -> task_kill_job -> do_kill -> execmanager.kill_calculation
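
To make the split summarized above easier to follow, here is a minimal toy sketch of it. It does not use the real aiida-core or plumpy classes: `ToyScheduler`, `ToyWaiting`, `ToyProcess` and the string states are hypothetical stand-ins, and the local `KillInterruption` only mimics the role of the real exception. The point is just that the scheduler-job cancellation lives in the waiting task, while the transition to KILLED lives in the process.

```python
# Toy model only: none of these classes are the real aiida-core/plumpy ones.


class KillInterruption(Exception):
    """Stand-in for the interruption delivered to the waiting task."""


class ToyScheduler:
    """Hypothetical scheduler that just records which jobs were cancelled."""

    def __init__(self):
        self.cancelled_jobs = []

    def cancel(self, job_id):
        self.cancelled_jobs.append(job_id)


class ToyWaiting:
    """Stand-in for `tasks.py:Waiting`: owns the scheduler-job cancellation."""

    def __init__(self, scheduler, job_id):
        self.scheduler = scheduler
        self.job_id = job_id

    def execute(self, interruption=None):
        try:
            if interruption is not None:
                raise interruption
            return "RUNNING"
        except KillInterruption:
            # Part 1: cancel the job in the scheduler.
            self.scheduler.cancel(self.job_id)
            return "JOB_CANCELLED"


class ToyProcess:
    """Stand-in for `process.py:Process`: owns the state transition."""

    def __init__(self, waiting):
        self.waiting = waiting
        self.state = "WAITING"

    def kill(self):
        # Interrupt the waiting task so that it cancels the scheduler job.
        result = self.waiting.execute(KillInterruption())
        # Part 2: transition the process itself to the KILLED state.
        if result == "JOB_CANCELLED":
            self.state = "KILLED"
        return self.state == "KILLED"
```

In this toy version a single `kill()` call touches two objects with separate responsibilities, which is the coupling the description above calls convoluted.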

In this PR I am moving most of the killing logic to the process to
simplify the design. This is required to fix a bug that appears when
two kill commands are sent. The first kill command sends the
`KillInterruption` (within `process.py:Process`, part of the logic in
the parent class) to `tasks.py:Waiting`, which receives it and starts
cancelling the scheduler job. Since the cancellation is only triggered
inside the try-except block that catches the `KillInterruption`, it
cannot be repeated when the user invokes a second kill command. This
bug was introduced by PR TODO (the one that introduced force kill),
because it also started to fix the timeout issue (verdi process kill
ignoring the timeout). Moving all killing logic to the process, as done
in this PR, solves the problem, because the cancellation of the job is
now reinvoked in the process class, in the function that is invoked
when a worker receives a kill message through RMQ.
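
The difference between the two designs can be illustrated with a small toy sketch. Everything in it is hypothetical and not taken from aiida-core: `FlakyScheduler`, `OldDesign` and `NewDesign` are made-up names, and the assumption that the first cancellation attempt fails (for example because of a timeout) is mine, chosen only to show why a second kill command matters.

```python
# Toy model only; hypothetical names, not the aiida-core implementation.


class KillInterruption(Exception):
    """Stand-in for the interruption sent to the waiting task."""


class FlakyScheduler:
    """Assumed behaviour: the first cancel attempt fails, later ones succeed."""

    def __init__(self):
        self.attempts = 0

    def cancel(self, job_id):
        self.attempts += 1
        return self.attempts > 1


class OldDesign:
    """Cancellation runs only in the handler of a one-off interruption."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.interrupted = False
        self.job_cancelled = False

    def kill(self):
        if self.interrupted:
            # The interruption was already consumed, so the handler
            # (and with it the cancellation) cannot be triggered again.
            return self.job_cancelled
        self.interrupted = True
        try:
            raise KillInterruption()
        except KillInterruption:
            self.job_cancelled = self.scheduler.cancel(job_id=1)
        return self.job_cancelled


class NewDesign:
    """Cancellation is re-invoked directly on every kill call."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.job_cancelled = False

    def kill(self):
        if not self.job_cancelled:
            self.job_cancelled = self.scheduler.cancel(job_id=1)
        return self.job_cancelled
```

With `OldDesign`, a second `kill()` never reaches the scheduler again; with `NewDesign`, each call retries the cancellation until it succeeds.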

I added very verbose comments for the review, which I will remove
later. I must say the kill process does not seem well tested, as I did
not have to adapt much in the tests. The tests in `test_work_chain.py`
need some adaptation to also be able to kill a scheduler job in a dummy manner.

3 participants