You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A poorly behaving BPF scheduler can live-lock the system by e.g. incessantly
banging on the same DSQ on a large NUMA system to the point where switching
to the bypass mode can take a long time. Turning on the bypass mode requires
dequeueing and re-enqueueing currently runnable tasks, if the DSQs that they
are on are live-locked, this can take tens of seconds cascading into other
failures. This was observed on 2 x Intel Sapphire Rapids machines with 224
logical CPUs.
Inject artifical delays while the bypass mode is switching to guarantee
timely completion.
While at it, move __scx_ops_bypass_lock into scx_ops_bypass() and rename it
to bypass_lock.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Valentin Andrei <[email protected]>
Reported-by: Patrick Lu <[email protected]>
0 commit comments