SIGUSR1/SIGUSR2 handler clobbering breaking JVM processes #161
Open
Description
Problem
libvgpu.so registers signal handlers for SIGUSR1 and SIGUSR2 using signal(), which overwrites any previously installed handlers without saving them. This causes JVM processes to crash with SIGSEGV in Monitor::wait() because the JVM uses SIGUSR1/SIGUSR2 internally for GC safepoints and thread management.
Observed on HAMi volcano-vgpu nodes (hami-core mode) when running PyTorch jobs with a JVM component. The crash occurs at startup, before CUDA initializes.
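One possible direction, as a minimal sketch rather than the actual libvgpu.so code: install the handler with sigaction() instead of signal(), keep the previous disposition, and forward the signal to it. All names here (vgpu_usr1_handler, vgpu_install_usr1_handler, prev_usr1, demo_previous_handler) are hypothetical, and only SIGUSR1 is shown; SIGUSR2 would follow the same pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; the real libvgpu.so handlers may differ. */
static struct sigaction prev_usr1;            /* disposition found at install time */
static volatile sig_atomic_t vgpu_usr1_seen;  /* stands in for the library's own work */
static volatile sig_atomic_t demo_previous_seen;

/* Stand-in for a handler that another component installed earlier. */
static void demo_previous_handler(int signo)
{
    (void)signo;
    demo_previous_seen = 1;
}

/* Forward the signal to the saved handler, if it is a real function. */
static void chain_to_previous(const struct sigaction *prev, int signo,
                              siginfo_t *info, void *ucontext)
{
    if (prev->sa_flags & SA_SIGINFO) {
        if (prev->sa_sigaction != NULL)
            prev->sa_sigaction(signo, info, ucontext);
    } else if (prev->sa_handler != SIG_DFL && prev->sa_handler != SIG_IGN) {
        prev->sa_handler(signo);
    }
}

static void vgpu_usr1_handler(int signo, siginfo_t *info, void *ucontext)
{
    vgpu_usr1_seen = 1;
    chain_to_previous(&prev_usr1, signo, info, ucontext);
}

/* Returns 0 on success, -1 on failure (same convention as sigaction). */
int vgpu_install_usr1_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = vgpu_usr1_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);

    /* The third argument receives the old disposition instead of discarding it. */
    return sigaction(SIGUSR1, &sa, &prev_usr1);
}
```

With demo_previous_handler registered first, calling vgpu_install_usr1_handler() and then raise(SIGUSR1) runs both handlers. This sketch does not reproduce the previous handler's sa_mask or other flags, and it only helps when the other handler is installed before this one; a handler installed afterwards by the JVM would still replace it.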
Additional risks from the current implementation:
- libvgpu.so intercepts dlsym(), which the JVM also uses for native library loading
- The ENSURE_RUNNING() spin loop can cause a deadlock if a Java thread holding a JVM monitor gets suspended