Commit 4368c4b

Merge branch 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull pti updates from Thomas Gleixner:

"The performance deterioration department is not proud at all to present yet another set of speculation fences to mitigate the next chapter in the 'what could possibly go wrong' story.

The new vulnerability belongs to the Spectre class and affects GS based data accesses and has therefore been dubbed 'Grand Schemozzle' for secret communication purposes. It's officially listed as CVE-2019-1125.

Conditional branches in the entry paths which contain a SWAPGS instruction (interrupts and exceptions) can be mis-speculated which results in speculative accesses with a wrong GS base.

This can happen on entry from user mode through a mis-speculated branch which takes the entry from kernel mode path and therefore does not execute the SWAPGS instruction. The following speculative accesses are done with user GS base.

On entry from kernel mode the mis-speculated branch executes the SWAPGS instruction in the entry from user mode path which has the same effect that the following GS based accesses are done with user GS base.

If there is a disclosure gadget available in these code paths the mis-speculated data access can be leaked through the usual side channels.

The entry from user mode issue affects all CPUs which have speculative execution. The entry from kernel mode issue affects only Intel CPUs which can speculate through SWAPGS. On CPUs from other vendors SWAPGS has semantics which prevent that.

SMAP mitigates both problems but only when the CPU is not affected by the Meltdown vulnerability.

The mitigation is to issue LFENCE instructions in the entry from kernel mode path for all affected CPUs and on the affected Intel CPUs also in the entry from user mode path unless PTI is enabled because the CR3 write is serializing.

The fences are as usual enabled conditionally and can be completely disabled on the kernel command line. The Spectre V1 documentation is updated accordingly.

A big "Thank You!" goes to Josh for doing the heavy lifting for this round of hardware misfeature 'repair'. Of course also "Thank You!" to everybody else who contributed in one way or the other"

* 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: Add swapgs description to the Spectre v1 documentation
  x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
  x86/entry/64: Use JMP instead of JMPQ
  x86/speculation: Enable Spectre v1 swapgs mitigations
  x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
2 parents 0eb0ce0 + 4c92057 commit 4368c4b
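
The fence selection rules stated in the commit message can be sketched as a small decision function. This is a minimal model written only from the prose above, not the kernel's selection code (which is not shown on this page). The struct, its field names, and `select_swapgs_fences()` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical boot-time state; field names are illustrative only. */
struct cpu_state {
    bool mitigations_off;           /* fences disabled on the command line */
    bool smap_enabled;              /* SMAP is active */
    bool meltdown_affected;         /* CPU is affected by Meltdown */
    bool speculates_through_swapgs; /* CPU can speculate through SWAPGS */
    bool pti_enabled;               /* PTI active, so entry writes CR3 */
};

struct fence_config {
    bool fence_user_entry;   /* LFENCE after SWAPGS on the user entry path */
    bool fence_kernel_entry; /* LFENCE on the kernel entry (no SWAPGS) path */
};

static struct fence_config select_swapgs_fences(const struct cpu_state *cpu)
{
    struct fence_config cfg = { false, false };

    if (cpu->mitigations_off)
        return cfg;

    /* SMAP covers both cases, but only if the CPU is not Meltdown-affected. */
    if (cpu->smap_enabled && !cpu->meltdown_affected)
        return cfg;

    /* A user entry can mis-speculate into the kernel entry path on any CPU. */
    cfg.fence_kernel_entry = true;

    /*
     * A kernel entry can speculatively run SWAPGS only on CPUs that
     * speculate through it; with PTI the serializing CR3 write already
     * stops that speculation.
     */
    if (cpu->speculates_through_swapgs && !cpu->pti_enabled)
        cfg.fence_user_entry = true;

    return cfg;
}
```

Under this model, a CPU that speculates through SWAPGS and runs with PTI enabled gets only the kernel entry fence, which matches the "unless PTI is enabled" clause in the message.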

7 files changed, 246 additions and 40 deletions

Documentation/admin-guide/hw-vuln/spectre.rst

Lines changed: 80 additions & 8 deletions
@@ -41,10 +41,11 @@ Related CVEs
 
 The following CVE entries describe Spectre variants:
 
-   =============   =======================   =================
+   =============   =======================   ==========================
    CVE-2017-5753   Bounds check bypass       Spectre variant 1
    CVE-2017-5715   Branch target injection   Spectre variant 2
-   =============   =======================   =================
+   CVE-2019-1125   Spectre v1 swapgs         Spectre variant 1 (swapgs)
+   =============   =======================   ==========================
 
 Problem
 -------
@@ -78,6 +79,13 @@ There are some extensions of Spectre variant 1 attacks for reading data
 over the network, see :ref:`[12] <spec_ref12>`. However such attacks
 are difficult, low bandwidth, fragile, and are considered low risk.
 
+Note that, despite "Bounds Check Bypass" name, Spectre variant 1 is not
+only about user-controlled array bounds checks. It can affect any
+conditional checks. The kernel entry code interrupt, exception, and NMI
+handlers all have conditional swapgs checks. Those may be problematic
+in the context of Spectre v1, as kernel code can speculatively run with
+a user GS.
+
 Spectre variant 2 (Branch Target Injection)
 -------------------------------------------
 
@@ -132,6 +140,9 @@ not cover all possible attack vectors.
 1. A user process attacking the kernel
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+Spectre variant 1
+~~~~~~~~~~~~~~~~~
+
    The attacker passes a parameter to the kernel via a register or
    via a known address in memory during a syscall. Such parameter may
    be used later by the kernel as an index to an array or to derive
@@ -144,7 +155,40 @@ not cover all possible attack vectors.
    potentially be influenced for Spectre attacks, new "nospec" accessor
    macros are used to prevent speculative loading of data.
 
-   Spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
+Spectre variant 1 (swapgs)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+   An attacker can train the branch predictor to speculatively skip the
+   swapgs path for an interrupt or exception. If they initialize
+   the GS register to a user-space value, if the swapgs is speculatively
+   skipped, subsequent GS-related percpu accesses in the speculation
+   window will be done with the attacker-controlled GS value. This
+   could cause privileged memory to be accessed and leaked.
+
+   For example:
+
+   ::
+
+     if (coming from user space)
+         swapgs
+     mov %gs:<percpu_offset>, %reg
+     mov (%reg), %reg1
+
+   When coming from user space, the CPU can speculatively skip the
+   swapgs, and then do a speculative percpu load using the user GS
+   value. So the user can speculatively force a read of any kernel
+   value. If a gadget exists which uses the percpu value as an address
+   in another load/store, then the contents of the kernel value may
+   become visible via an L1 side channel attack.
+
+   A similar attack exists when coming from kernel space. The CPU can
+   speculatively do the swapgs, causing the user GS to get used for the
+   rest of the speculative window.
+
+Spectre variant 2
+~~~~~~~~~~~~~~~~~
+
+   A spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
    target buffer (BTB) before issuing syscall to launch an attack.
    After entering the kernel, the kernel could use the poisoned branch
    target buffer on indirect jump and jump to gadget code in speculative
@@ -280,11 +324,18 @@ The sysfs file showing Spectre variant 1 mitigation status is:
 
 The possible values in this file are:
 
-  =======================================  =================================
-  'Mitigation: __user pointer sanitation'  Protection in kernel on a case by
-                                           case base with explicit pointer
-                                           sanitation.
-  =======================================  =================================
+  .. list-table::
+
+     * - 'Not affected'
+       - The processor is not vulnerable.
+     * - 'Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers'
+       - The swapgs protections are disabled; otherwise it has
+         protection in the kernel on a case by case base with explicit
+         pointer sanitation and usercopy LFENCE barriers.
+     * - 'Mitigation: usercopy/swapgs barriers and __user pointer sanitization'
+       - Protection in the kernel on a case by case base with explicit
+         pointer sanitation, usercopy LFENCE barriers, and swapgs LFENCE
+         barriers.
 
 However, the protections are put in place on a case by case basis,
 and there is no guarantee that all possible attack vectors for Spectre
@@ -366,12 +417,27 @@ Turning on mitigation for Spectre variant 1 and Spectre variant 2
 1. Kernel mitigation
 ^^^^^^^^^^^^^^^^^^^^
 
+Spectre variant 1
+~~~~~~~~~~~~~~~~~
+
    For the Spectre variant 1, vulnerable kernel code (as determined
    by code audit or scanning tools) is annotated on a case by case
    basis to use nospec accessor macros for bounds clipping :ref:`[2]
    <spec_ref2>` to avoid any usable disclosure gadgets. However, it may
    not cover all attack vectors for Spectre variant 1.
 
+   Copy-from-user code has an LFENCE barrier to prevent the access_ok()
+   check from being mis-speculated. The barrier is done by the
+   barrier_nospec() macro.
+
+   For the swapgs variant of Spectre variant 1, LFENCE barriers are
+   added to interrupt, exception and NMI entry where needed. These
+   barriers are done by the FENCE_SWAPGS_KERNEL_ENTRY and
+   FENCE_SWAPGS_USER_ENTRY macros.
+
+Spectre variant 2
+~~~~~~~~~~~~~~~~~
+
    For Spectre variant 2 mitigation, the compiler turns indirect calls or
    jumps in the kernel into equivalent return trampolines (retpolines)
    :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
@@ -473,6 +539,12 @@ Mitigation control on the kernel command line
 Spectre variant 2 mitigation can be disabled or force enabled at the
 kernel command line.
 
+	nospectre_v1
+
+		[X86,PPC] Disable mitigations for Spectre Variant 1
+		(bounds check bypass). With this option data leaks are
+		possible in the system.
+
 	nospectre_v2
 
 		[X86] Disable all mitigations for the Spectre variant 2
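
The three status strings that this documentation change lists for the Spectre variant 1 sysfs file can be told apart by their prefix. The following is a rough sketch of a userspace check, assuming the file lives at the usual `/sys/devices/system/cpu/vulnerabilities/spectre_v1` location (the path is not shown in the hunks above). The enum and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

enum spectre_v1_status {
    SPECTRE_V1_NOT_AFFECTED,
    SPECTRE_V1_VULNERABLE,
    SPECTRE_V1_MITIGATED,
    SPECTRE_V1_UNKNOWN
};

/* Classify one status line by the prefixes listed in the documentation. */
static enum spectre_v1_status classify_spectre_v1(const char *line)
{
    if (strncmp(line, "Not affected", strlen("Not affected")) == 0)
        return SPECTRE_V1_NOT_AFFECTED;
    if (strncmp(line, "Vulnerable", strlen("Vulnerable")) == 0)
        return SPECTRE_V1_VULNERABLE;
    if (strncmp(line, "Mitigation", strlen("Mitigation")) == 0)
        return SPECTRE_V1_MITIGATED;
    return SPECTRE_V1_UNKNOWN;
}

/*
 * The "Vulnerable" string also contains the word "swapgs" ("no swapgs
 * barriers"), so the prefix must be checked before the substring.
 */
static int has_swapgs_barriers(const char *line)
{
    return classify_spectre_v1(line) == SPECTRE_V1_MITIGATED &&
           strstr(line, "swapgs") != NULL;
}

/* Read the status line into buf; returns 0 on success, -1 on failure. */
static int read_spectre_v1_status(char *buf, size_t len)
{
    /* Assumed path; adjust if the sysfs layout differs on the target. */
    FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/spectre_v1", "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would pass the buffer filled by `read_spectre_v1_status()` to `classify_spectre_v1()`. Kernels that predate this change report the older single 'Mitigation: __user pointer sanitation' string, which this sketch classifies as mitigated but without swapgs barriers.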

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 4 additions & 4 deletions
@@ -2604,7 +2604,7 @@
 					expose users to several CPU vulnerabilities.
 					Equivalent to: nopti [X86,PPC]
 					       kpti=0 [ARM64]
-					       nospectre_v1 [PPC]
+					       nospectre_v1 [X86,PPC]
 					       nobp=0 [S390]
 					       nospectre_v2 [X86,PPC,S390,ARM64]
 					       spectre_v2_user=off [X86]
@@ -2965,9 +2965,9 @@
 			nosmt=force: Force disable SMT, cannot be undone
 				     via the sysfs control file.
 
-	nospectre_v1	[PPC] Disable mitigations for Spectre Variant 1 (bounds
-			check bypass). With this option data leaks are possible
-			in the system.
+	nospectre_v1	[X86,PPC] Disable mitigations for Spectre Variant 1
+			(bounds check bypass). With this option data leaks are
+			possible in the system.
 
 	nospectre_v2	[X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
 			the Spectre variant 2 (indirect branch prediction)

arch/x86/entry/calling.h

Lines changed: 17 additions & 0 deletions
@@ -314,6 +314,23 @@ For 32-bit we have the following conventions - kernel is built with
 
 #endif
 
+/*
+ * Mitigate Spectre v1 for conditional swapgs code paths.
+ *
+ * FENCE_SWAPGS_USER_ENTRY is used in the user entry swapgs code path, to
+ * prevent a speculative swapgs when coming from kernel space.
+ *
+ * FENCE_SWAPGS_KERNEL_ENTRY is used in the kernel entry non-swapgs code path,
+ * to prevent the swapgs from getting speculatively skipped when coming from
+ * user space.
+ */
+.macro FENCE_SWAPGS_USER_ENTRY
+	ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_USER
+.endm
+.macro FENCE_SWAPGS_KERNEL_ENTRY
+	ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL
+.endm
+
 .macro STACKLEAK_ERASE_NOCLOBBER
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	PUSH_AND_CLEAR_REGS

arch/x86/entry/entry_64.S

Lines changed: 18 additions & 3 deletions
@@ -519,7 +519,7 @@ ENTRY(interrupt_entry)
 	testb	$3, CS-ORIG_RAX+8(%rsp)
 	jz	1f
 	SWAPGS
-
+	FENCE_SWAPGS_USER_ENTRY
 	/*
 	 * Switch to the thread stack. The IRET frame and orig_ax are
 	 * on the stack, as well as the return address. RDI..R12 are
@@ -549,8 +549,10 @@ ENTRY(interrupt_entry)
 	UNWIND_HINT_FUNC
 
 	movq	(%rdi), %rdi
+	jmp	2f
 1:
-
+	FENCE_SWAPGS_KERNEL_ENTRY
+2:
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8
 
@@ -1238,6 +1240,13 @@ ENTRY(paranoid_entry)
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
 
+	/*
+	 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
+	 * unconditional CR3 write, even in the PTI case. So do an lfence
+	 * to prevent GS speculation, regardless of whether PTI is enabled.
+	 */
+	FENCE_SWAPGS_KERNEL_ENTRY
+
 	ret
 END(paranoid_entry)
 
@@ -1288,6 +1297,7 @@ ENTRY(error_entry)
 	 * from user mode due to an IRET fault.
 	 */
 	SWAPGS
+	FENCE_SWAPGS_USER_ENTRY
 	/* We have user CR3. Change to kernel CR3. */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 
@@ -1301,6 +1311,8 @@ ENTRY(error_entry)
 	pushq	%r12
 	ret
 
+.Lerror_entry_done_lfence:
+	FENCE_SWAPGS_KERNEL_ENTRY
 .Lerror_entry_done:
 	ret
 
@@ -1318,14 +1330,15 @@ ENTRY(error_entry)
 	cmpq	%rax, RIP+8(%rsp)
 	je	.Lbstep_iret
 	cmpq	$.Lgs_change, RIP+8(%rsp)
-	jne	.Lerror_entry_done
+	jne	.Lerror_entry_done_lfence
 
 	/*
 	 * hack: .Lgs_change can fail with user gsbase. If this happens, fix up
 	 * gsbase and proceed. We'll fix up the exception and land in
 	 * .Lgs_change's error handler with kernel gsbase.
 	 */
 	SWAPGS
+	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 	jmp .Lerror_entry_done
 
@@ -1340,6 +1353,7 @@ ENTRY(error_entry)
 	 * gsbase and CR3. Switch to kernel gsbase and CR3:
 	 */
 	SWAPGS
+	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 
 	/*
@@ -1431,6 +1445,7 @@ ENTRY(nmi)
 
 	swapgs
 	cld
+	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
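
As a reading aid for the `interrupt_entry` hunks in this file, the architectural (non-speculative) control flow after the patch can be modelled in a few lines of C. This is only an illustrative sketch: the function name and the string trace are hypothetical, and the two boolean parameters stand in for whether the `ALTERNATIVE` in each fence macro has been patched to `lfence`.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Model of the patched branch in interrupt_entry:
 *
 *     testb $3, CS ; jz 1f
 *     SWAPGS ; FENCE_SWAPGS_USER_ENTRY ; ... ; jmp 2f
 * 1:  FENCE_SWAPGS_KERNEL_ENTRY
 * 2:  ...
 *
 * Writes the GS-related instructions executed on the taken path to buf.
 */
static void interrupt_entry_trace(unsigned cs, bool fence_user,
                                  bool fence_kernel, char *buf, size_t len)
{
    if ((cs & 3) != 0) {
        /* Entry from user mode: swap GS, then optionally fence. */
        snprintf(buf, len, "swapgs%s", fence_user ? " lfence" : "");
    } else {
        /* Entry from kernel mode: no swapgs, optionally fence at label 1. */
        snprintf(buf, len, "%s", fence_kernel ? "lfence" : "");
    }
}
```

The added `jmp 2f` is what keeps the user entry path from falling through into the kernel entry fence at label `1:`, so each path executes at most one `lfence`.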

arch/x86/include/asm/cpufeatures.h

Lines changed: 3 additions & 0 deletions
@@ -281,6 +281,8 @@
 #define X86_FEATURE_CQM_OCCUP_LLC	(11*32+ 1) /* LLC occupancy monitoring */
 #define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
+#define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
+#define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -394,5 +396,6 @@
 #define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
 #define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
 #define X86_BUG_MSBDS_ONLY		X86_BUG(20) /* CPU is only affected by the MSDBS variant of BUG_MDS */
+#define X86_BUG_SWAPGS			X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
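
The new feature flags use the `word*32 + bit` encoding visible in the surrounding definitions. A small sketch of how such a value splits back into its word and bit index follows. The helper names and the `DEMO_` constants are hypothetical stand-ins that copy the numeric values from the hunk above.

```c
/* Values copied from the diff; DEMO_ names are illustrative only. */
#define DEMO_FENCE_SWAPGS_USER   (11 * 32 + 4)
#define DEMO_FENCE_SWAPGS_KERNEL (11 * 32 + 5)

/* Index of the 32-bit capability word that holds the flag. */
static unsigned feature_word(unsigned feature)
{
    return feature / 32;
}

/* Bit position of the flag inside its capability word. */
static unsigned feature_bit(unsigned feature)
{
    return feature % 32;
}
```

Both new flags therefore land in word 11, at bits 4 and 5, directly after the three `X86_FEATURE_CQM_*` entries shown as context.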
