Commit d7fbf9b

fix(ci): epoll() on pidfd to wait for Firecracker exit
Currently, we use psutil.pid_exists in a loop with a timeout of 10 seconds. This is racy, and indeed we sometimes hit the race in our CI.

Substitute this mechanism with calling epoll() on the pidfd of the process instead. This should deterministically block until the process exits. If something else goes wrong, we will hit the pytest timeout.

Signed-off-by: Babis Chalios <[email protected]>
1 parent c00d5ed commit d7fbf9b

1 file changed, +28 -4 lines changed

tests/framework/utils.py

Lines changed: 28 additions & 4 deletions
@@ -7,6 +7,7 @@
 import os
 import platform
 import re
+import select
 import signal
 import stat
 import subprocess
@@ -450,16 +451,39 @@ def run_guest_cmd(ssh_connection, cmd, expected, use_json=False):
     assert stdout == expected
 
 
-@retry(wait=wait_fixed(1), stop=stop_after_attempt(10), reraise=True)
+def get_process_pidfd(pid):
+    """Get a pidfd file descriptor for the process with PID `pid`
+
+    Will return a pid file descriptor for the process with PID `pid` if it is
+    still alive. If the process has already exited it will return `None`.
+
+    Any other error while calling the system call, will raise an OSError
+    exception.
+    """
+    try:
+        pidfd = os.pidfd_open(pid)
+    except ProcessLookupError:
+        return None
+
+    return pidfd
+
+
 def wait_process_termination(p_pid):
     """Wait for a process to terminate.
 
-    Will return sucessfully if the process
+    Will return successfully if the process
     got indeed killed or raises an exception if the process
     is still alive after retrying several times.
     """
-    if psutil.pid_exists(p_pid):
-        raise Exception(f"[{p_pid}] process is still alive")
+    pidfd = get_process_pidfd(p_pid)
+
+    # If pidfd is None the process has already terminated
+    if pidfd is not None:
+        epoll = select.epoll()
+        epoll.register(pidfd, select.EPOLLIN)
+        # This will return once the process exits
+        epoll.poll()
+        os.close(pidfd)
 
 
 def get_firecracker_version_from_toml():
