Skip to content

Commit 25a068b

Browse files
hansendcsuryasaimadhu
authored andcommitted
x86/apic: Add extra serialization for non-serializing MSRs
Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for MFENCE; LFENCE. Short summary: we have special MSRs that have weaker ordering than all the rest. Add fencing consistent with current SDM recommendations. This is not known to cause any issues in practice, only in theory. Longer story below: The reason the kernel uses a different semantic is that the SDM changed (roughly in late 2017). The SDM changed because folks at Intel were auditing all of the recommended fences in the SDM and realized that the x2apic fences were insufficient. Why was the pain MFENCE judged insufficient? WRMSR itself is normally a serializing instruction. No fences are needed because the instruction itself serializes everything. But, there are explicit exceptions for this serializing behavior written into the WRMSR instruction documentation for two classes of MSRs: IA32_TSC_DEADLINE and the X2APIC MSRs. Back to x2apic: WRMSR is *not* serializing in this specific case. But why is MFENCE insufficient? MFENCE makes writes visible, but only affects load/store instructions. WRMSR is unfortunately not a load/store instruction and is unaffected by MFENCE. This means that a non-serializing WRMSR could be reordered by the CPU to execute before the writes made visible by the MFENCE have even occurred in the first place. This means that an x2apic IPI could theoretically be triggered before there is any (visible) data to process. Does this affect anything in practice? I honestly don't know. It seems quite possible that by the time an interrupt gets to consume the (not yet) MFENCE'd data, it has become visible, mostly by accident. To be safe, add the SDM-recommended fences for all x2apic WRMSRs. This also leaves open the question of the _other_ weakly-ordered WRMSR: MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as the x2APIC MSRs, it seems substantially less likely to be a problem in practice. While writes to the in-memory Local Vector Table (LVT) might theoretically be reordered with respect to a weakly-ordered WRMSR like TSC_DEADLINE, the SDM has this to say: In x2APIC mode, the WRMSR instruction is used to write to the LVT entry. The processor ensures the ordering of this write and any subsequent WRMSR to the deadline; no fencing is required. But, that might still leave xAPIC exposed. The safest thing to do for now is to add the extra, recommended LFENCE. [ bp: Massage commit message, fix typos, drop accidentally added newline to tools/arch/x86/include/asm/barrier.h. ] Reported-by: Jan Kiszka <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
1 parent 7f1b11b commit 25a068b

File tree

5 files changed

+32
-15
lines changed

5 files changed

+32
-15
lines changed

arch/x86/include/asm/apic.h

Lines changed: 0 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -197,16 +197,6 @@ static inline bool apic_needs_pit(void) { return true; }
197197
#endif /* !CONFIG_X86_LOCAL_APIC */
198198

199199
#ifdef CONFIG_X86_X2APIC
200-
/*
201-
* Make previous memory operations globally visible before
202-
* sending the IPI through x2apic wrmsr. We need a serializing instruction or
203-
* mfence for this.
204-
*/
205-
static inline void x2apic_wrmsr_fence(void)
206-
{
207-
asm volatile("mfence" : : : "memory");
208-
}
209-
210200
static inline void native_apic_msr_write(u32 reg, u32 v)
211201
{
212202
if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR ||

arch/x86/include/asm/barrier.h

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,4 +84,22 @@ do { \
8484

8585
#include <asm-generic/barrier.h>
8686

87+
/*
88+
* Make previous memory operations globally visible before
89+
* a WRMSR.
90+
*
91+
* MFENCE makes writes visible, but only affects load/store
92+
* instructions. WRMSR is unfortunately not a load/store
93+
* instruction and is unaffected by MFENCE. The LFENCE ensures
94+
* that the WRMSR is not reordered.
95+
*
96+
* Most WRMSRs are full serializing instructions themselves and
97+
* do not require this barrier. This is only required for the
98+
* IA32_TSC_DEADLINE and X2APIC MSRs.
99+
*/
100+
static inline void weak_wrmsr_fence(void)
101+
{
102+
asm volatile("mfence; lfence" : : : "memory");
103+
}
104+
87105
#endif /* _ASM_X86_BARRIER_H */

arch/x86/kernel/apic/apic.c

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,7 @@
4141
#include <asm/perf_event.h>
4242
#include <asm/x86_init.h>
4343
#include <linux/atomic.h>
44+
#include <asm/barrier.h>
4445
#include <asm/mpspec.h>
4546
#include <asm/i8259.h>
4647
#include <asm/proto.h>
@@ -477,6 +478,9 @@ static int lapic_next_deadline(unsigned long delta,
477478
{
478479
u64 tsc;
479480

481+
/* This MSR is special and need a special fence: */
482+
weak_wrmsr_fence();
483+
480484
tsc = rdtsc();
481485
wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR));
482486
return 0;

arch/x86/kernel/apic/x2apic_cluster.c

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,8 @@ static void x2apic_send_IPI(int cpu, int vector)
2929
{
3030
u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu);
3131

32-
x2apic_wrmsr_fence();
32+
/* x2apic MSRs are special and need a special fence: */
33+
weak_wrmsr_fence();
3334
__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
3435
}
3536

@@ -41,7 +42,8 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
4142
unsigned long flags;
4243
u32 dest;
4344

44-
x2apic_wrmsr_fence();
45+
/* x2apic MSRs are special and need a special fence: */
46+
weak_wrmsr_fence();
4547
local_irq_save(flags);
4648

4749
tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask);

arch/x86/kernel/apic/x2apic_phys.c

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,8 @@ static void x2apic_send_IPI(int cpu, int vector)
4343
{
4444
u32 dest = per_cpu(x86_cpu_to_apicid, cpu);
4545

46-
x2apic_wrmsr_fence();
46+
/* x2apic MSRs are special and need a special fence: */
47+
weak_wrmsr_fence();
4748
__x2apic_send_IPI_dest(dest, vector, APIC_DEST_PHYSICAL);
4849
}
4950

@@ -54,7 +55,8 @@ __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
5455
unsigned long this_cpu;
5556
unsigned long flags;
5657

57-
x2apic_wrmsr_fence();
58+
/* x2apic MSRs are special and need a special fence: */
59+
weak_wrmsr_fence();
5860

5961
local_irq_save(flags);
6062

@@ -125,7 +127,8 @@ void __x2apic_send_IPI_shorthand(int vector, u32 which)
125127
{
126128
unsigned long cfg = __prepare_ICR(which, vector, 0);
127129

128-
x2apic_wrmsr_fence();
130+
/* x2apic MSRs are special and need a special fence: */
131+
weak_wrmsr_fence();
129132
native_x2apic_icr_write(cfg, 0);
130133
}
131134

0 commit comments

Comments
 (0)