Commit 52d6b92

ashok-raj authored and KAGA-KOKO committed
x86/hotplug: Silence APIC only after all interrupts are migrated
There is a race when taking a CPU offline. Current code looks like this:

    native_cpu_disable()
    {
            ...
            apic_soft_disable();
            /*
             * Any existing set bits for pending interrupt to
             * this CPU are preserved and will be sent via IPI
             * to another CPU by fixup_irqs().
             */
            cpu_disable_common();
            {
                    ....
                    /*
                     * Race window happens here. Once local APIC has been
                     * disabled any new interrupts from the device to
                     * the old CPU are lost
                     */
                    fixup_irqs(); // Too late to capture anything in IRR.
                    ...
            }
    }

The fix is to disable the APIC *after* cpu_disable_common().

Testing was done with a USB NIC that provided a source of frequent
interrupts. A script migrated interrupts to a specific CPU and
then took that CPU offline.

Fixes: 60dcaad ("x86/hotplug: Silence APIC and NMI when CPU is dead")
Reported-by: Evan Green <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Mathias Nyman <[email protected]>
Tested-by: Evan Green <[email protected]>
Reviewed-by: Evan Green <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/r/[email protected]
1 parent d4f0726 commit 52d6b92

1 file changed: +20 -6 lines changed
arch/x86/kernel/smpboot.c

Lines changed: 20 additions & 6 deletions
@@ -1594,14 +1594,28 @@ int native_cpu_disable(void)
 	if (ret)
 		return ret;
 
-	/*
-	 * Disable the local APIC. Otherwise IPI broadcasts will reach
-	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
-	 * messages.
-	 */
-	apic_soft_disable();
 	cpu_disable_common();
 
+	/*
+	 * Disable the local APIC. Otherwise IPI broadcasts will reach
+	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
+	 * messages.
+	 *
+	 * Disabling the APIC must happen after cpu_disable_common()
+	 * which invokes fixup_irqs().
+	 *
+	 * Disabling the APIC preserves already set bits in IRR, but
+	 * an interrupt arriving after disabling the local APIC does not
+	 * set the corresponding IRR bit.
+	 *
+	 * fixup_irqs() scans IRR for set bits so it can raise a not
+	 * yet handled interrupt on the new destination CPU via an IPI
+	 * but obviously it can't do so for IRR bits which are not set.
+	 * IOW, interrupts arriving after disabling the local APIC will
+	 * be lost.
+	 */
+	apic_soft_disable();
+
 	return 0;
 }
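
The ordering problem described in the new comment can be illustrated with a small user-space toy model. This is only a sketch and not kernel code: `struct toy_apic` and every `toy_*` function are hypothetical names invented for illustration. It assumes a 32-bit IRR and a single device interrupt arriving at the worst possible moment. It models only the two properties the comment states: a soft disable preserves IRR bits that are already set, and an interrupt arriving afterwards sets no bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one local APIC (hypothetical, not a kernel structure). */
struct toy_apic {
	bool enabled;	/* false once the APIC has been soft-disabled */
	uint32_t irr;	/* pending-interrupt bits, one per toy vector */
};

/* A device raises `vector` at this CPU; a disabled APIC records nothing. */
static void toy_device_irq(struct toy_apic *apic, unsigned int vector)
{
	assert(vector < 32);	/* the toy IRR only has 32 bits */
	if (apic->enabled)
		apic->irr |= UINT32_C(1) << vector;
}

/* Soft disable: bits already set in IRR survive, new ones are blocked. */
static void toy_soft_disable(struct toy_apic *apic)
{
	apic->enabled = false;
}

/*
 * Stand-in for the IRR scan in fixup_irqs(): move every pending bit to
 * the new CPU and return how many interrupts were forwarded.
 */
static int toy_fixup_irqs(struct toy_apic *old_cpu, struct toy_apic *new_cpu)
{
	int moved = 0;

	for (unsigned int v = 0; v < 32; v++) {
		uint32_t bit = UINT32_C(1) << v;

		if (old_cpu->irr & bit) {
			old_cpu->irr &= ~bit;
			new_cpu->irr |= bit;
			moved++;
		}
	}
	return moved;
}

/* Old ordering: disable first, so an interrupt in the window is dropped. */
static int toy_offline_disable_first(unsigned int vector)
{
	struct toy_apic old_cpu = { .enabled = true, .irr = 0 };
	struct toy_apic new_cpu = { .enabled = true, .irr = 0 };

	toy_soft_disable(&old_cpu);
	toy_device_irq(&old_cpu, vector);	/* arrives in the race window */
	return toy_fixup_irqs(&old_cpu, &new_cpu);
}

/* Fixed ordering: scan IRR while the APIC still accepts interrupts. */
static int toy_offline_fixup_first(unsigned int vector)
{
	struct toy_apic old_cpu = { .enabled = true, .irr = 0 };
	struct toy_apic new_cpu = { .enabled = true, .irr = 0 };
	int moved;

	toy_device_irq(&old_cpu, vector);	/* same interrupt, APIC still on */
	moved = toy_fixup_irqs(&old_cpu, &new_cpu);
	toy_soft_disable(&old_cpu);
	return moved;
}
```

In this model, `toy_offline_disable_first(5)` returns 0 because the interrupt never reaches IRR, while `toy_offline_fixup_first(5)` returns 1 because the scan finds the bit and forwards it. The model leaves out timing, vector allocation and the IPI mechanism.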
