Commit 1eb2b78

rpedgeco authored and hansendc committed
Documentation/x86: Add CET shadow stack description
Introduce a new document on Control-flow Enforcement Technology (CET).

Co-developed-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-24-rick.p.edgecombe%40intel.com
1 parent 6beb995 commit 1eb2b78

2 files changed

+170
-0
lines changed


Documentation/arch/x86/index.rst

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ x86-specific Documentation
22 22
mtrr
23 23
pat
24 24
intel-hfi
25+
shstk
25 26
iommu
26 27
intel_txt
27 28
amd-memory-encryption

Documentation/arch/x86/shstk.rst

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
======================================================
4+
Control-flow Enforcement Technology (CET) Shadow Stack
5+
======================================================
6+
7+
CET Background
8+
==============
9+
10+
Control-flow Enforcement Technology (CET) covers several related x86 processor
11+
features that provide protection against control flow hijacking attacks. CET
12+
can protect both applications and the kernel.
13+
14+
CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack
15+
is a secondary stack allocated from memory which cannot be directly modified by
16+
applications. When executing a CALL instruction, the processor pushes the
17+
return address to both the normal stack and the shadow stack. Upon
18+
function return, the processor pops the shadow stack copy and compares it
19+
to the normal stack copy. If the two differ, the processor raises a
20+
control-protection fault. IBT verifies indirect CALL/JMP targets are intended
21+
as marked by the compiler with 'ENDBR' opcodes. Not all CPUs have both Shadow
22+
Stack and Indirect Branch Tracking. Today in the 64-bit kernel, only userspace
23+
shadow stack and kernel IBT are supported.
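
The CALL/RET behaviour described above can be illustrated with a small software model. This is only a sketch of the comparison step, not of the hardware: the class and exception names are made up for the example, and the real shadow stack lives in protected memory rather than in a Python list.

```python
class ControlProtectionFault(Exception):
    """Raised by the model when the two return-address copies differ."""


class ShadowStackModel:
    """Toy model: two stacks that CALL pushes to and RET compares."""

    def __init__(self):
        self.stack = []         # normal stack: writable by the application
        self.shadow_stack = []  # shadow stack: only call()/ret() touch it

    def call(self, return_address):
        # CALL pushes the return address to both stacks.
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # RET pops both copies and compares them.
        normal = self.stack.pop()
        shadow = self.shadow_stack.pop()
        if normal != shadow:
            raise ControlProtectionFault(
                f"return to {normal:#x}, shadow stack holds {shadow:#x}")
        return normal
```

Overwriting the top of `stack` after a `call()` (as a stack-smashing attack would) makes the next `ret()` raise, while an untouched return address passes through unchanged.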
24+
25+
Requirements to use Shadow Stack
26+
================================
27+
28+
To use userspace shadow stack you need HW that supports it, a kernel
29+
configured with it and userspace libraries compiled with it.
30+
31+
The kernel Kconfig option is X86_USER_SHADOW_STACK. When compiled in, shadow
32+
stacks can be disabled at runtime with the kernel parameter: nousershstk.
33+
34+
To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later
35+
are required.
36+
37+
At run time, /proc/cpuinfo shows CET features if the processor supports
38+
CET. "user_shstk" means that userspace shadow stack is supported on the current
39+
kernel and HW.
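
As a rough illustration, the check for "user_shstk" can be scripted. The helper below is hypothetical and assumes the usual /proc/cpuinfo layout, where the feature names appear space-separated on lines whose key is "flags".

```python
def cpuinfo_has_flag(cpuinfo_text, flag="user_shstk"):
    """Return True if any 'flags' line in cpuinfo_text lists the flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match whole words only, so "user_shstk" does not match a longer name.
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False
```

On a running system the input would come from reading /proc/cpuinfo, for example `cpuinfo_has_flag(open("/proc/cpuinfo").read())`.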
40+
41+
Application Enabling
42+
====================
43+
44+
An application's CET capability is marked in its ELF note and can be verified
45+
from readelf/llvm-readelf output::
46+
47+
readelf -n <application> | grep -a SHSTK
48+
properties: x86 feature: SHSTK
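
The same check can be done programmatically by parsing the readelf output. The function below is a sketch with a made-up name; it assumes the feature list follows the text "x86 feature:" and is comma-separated, which is how multiple features are typically printed.

```python
def x86_features_from_notes(readelf_output):
    """Collect feature names from 'x86 feature:' property lines."""
    features = set()
    for line in readelf_output.splitlines():
        _, sep, rest = line.partition("x86 feature:")
        if sep:
            # Split a list such as "IBT, SHSTK" into individual names.
            features.update(
                name.strip() for name in rest.split(",") if name.strip())
    return features
```

The input text could be captured with `subprocess.run(["readelf", "-n", path], capture_output=True, text=True).stdout`, after which `"SHSTK" in x86_features_from_notes(text)` answers the question.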
49+
50+
The kernel does not process these application markers directly. Applications
51+
or loaders must enable CET features using the arch_prctl() interface described below.
52+
Typically this would be done in the dynamic loader or static runtime objects, as is
53+
the case in GLIBC.
54+
55+
Enabling arch_prctl()'s
56+
=======================
57+
58+
ELF features should be enabled by the loader using the arch_prctl() calls below. They
59+
are only supported in 64-bit user applications. These operate on the features
60+
on a per-thread basis. The enablement status is inherited on clone, so if the
61+
feature is enabled on the first thread, it will propagate to all the threads
62+
in an app.
63+
64+
arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature)
65+
Enable a single feature specified in 'feature'. Can only operate on
66+
one feature at a time.
67+
68+
arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature)
69+
Disable a single feature specified in 'feature'. Can only operate on
70+
one feature at a time.
71+
72+
arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
73+
Lock in features at their current enabled or disabled status. 'features'
74+
is a mask of all features to lock. All bits set are processed, unset bits
75+
are ignored. The mask is ORed with the existing value. So any feature bits
76+
set here cannot be enabled or disabled afterwards.
77+
78+
The return values are as follows. On success, return 0. On error, errno can
79+
be::
80+
81+
-EPERM if any of the passed features are locked.
82+
-ENOTSUPP if the feature is not supported by the hardware or
83+
kernel.
84+
-EINVAL if the arguments are invalid (non-existent feature, etc.)
85+
86+
The supported feature bits are::
87+
88+
ARCH_SHSTK_SHSTK - Shadow stack
89+
ARCH_SHSTK_WRSS - WRSS
90+
91+
Currently shadow stack and WRSS are supported via this interface. WRSS
92+
can only be enabled with shadow stack, and is automatically disabled
93+
if shadow stack is disabled.
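
The enable, disable and lock rules above can be summarised as a small state machine. The following is a toy model for reasoning about the semantics, not the kernel implementation: the class name is invented, the bit values are assumed to match the kernel's uapi header, the error returned when WRSS is requested without shadow stack is an assumption, and the "not supported" error is not modelled at all.

```python
import errno

# Feature bits. The values are assumed to match the kernel's uapi header
# (arch/x86/include/uapi/asm/prctl.h); check them there before relying on them.
ARCH_SHSTK_SHSTK = 1 << 0
ARCH_SHSTK_WRSS = 1 << 1


class ThreadFeaturesModel:
    """Toy model of one thread's shadow stack feature state."""

    def __init__(self):
        self.enabled = 0
        self.locked = 0

    def _check(self, feature):
        # Enable/disable accept exactly one known feature bit.
        if feature not in (ARCH_SHSTK_SHSTK, ARCH_SHSTK_WRSS):
            return -errno.EINVAL
        if feature & self.locked:
            return -errno.EPERM
        return 0

    def enable(self, feature):
        err = self._check(feature)
        if err:
            return err
        if feature == ARCH_SHSTK_WRSS and not (self.enabled & ARCH_SHSTK_SHSTK):
            # WRSS needs shadow stack; the error code chosen is an assumption.
            return -errno.EPERM
        self.enabled |= feature
        return 0

    def disable(self, feature):
        err = self._check(feature)
        if err:
            return err
        self.enabled &= ~feature
        if feature == ARCH_SHSTK_SHSTK:
            # Disabling shadow stack also disables WRSS.
            self.enabled &= ~ARCH_SHSTK_WRSS
        return 0

    def lock(self, features):
        # The mask is ORed with the existing value; unset bits are ignored.
        self.locked |= features
        return 0
```

A typical loader-style sequence in this model is `enable(ARCH_SHSTK_SHSTK)` followed by `lock(ARCH_SHSTK_SHSTK)`, after which any attempt to disable shadow stack returns `-EPERM`.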
94+
95+
Proc Status
96+
===========
97+
To check if an application is actually running with shadow stack, the
98+
user can read /proc/$PID/status. It will report "wrss" or "shstk"
99+
depending on what is enabled. The lines look like this::
100+
101+
x86_Thread_features: shstk wrss
102+
x86_Thread_features_locked: shstk wrss
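
For scripted checks, the two lines can be picked out of the status text. The helper name below is hypothetical; the sketch assumes only the line format shown above, with feature names separated by whitespace.

```python
def thread_features(status_text):
    """Map each x86_Thread_features* key to the set of names it lists."""
    result = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("x86_Thread_features"):
            # An empty value (nothing enabled) yields an empty set.
            result[key] = set(value.split())
    return result
```

Given the contents of /proc/$PID/status, `"shstk" in thread_features(text).get("x86_Thread_features", set())` tells whether shadow stack is enabled for that task.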
103+
104+
Implementation of the Shadow Stack
105+
==================================
106+
107+
Shadow Stack Size
108+
-----------------
109+
110+
A task's shadow stack is allocated from memory to a fixed size of
111+
MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
112+
the maximum size of the normal stack, but capped to 4 GB. In the case
113+
of the clone3 syscall, there is a stack size passed in and shadow stack
114+
uses this instead of the rlimit.
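
The sizing rule is simple enough to state as code. This is a sketch of the MIN(RLIMIT_STACK, 4 GB) formula only; the function name is made up, an unlimited rlimit is assumed to fall back to the cap, and any rounding the kernel applies is not modelled.

```python
import resource

SZ_4G = 4 << 30  # the 4 GB cap from the text


def shadow_stack_size(stack_limit, infinity=resource.RLIM_INFINITY):
    """Return MIN(stack_limit, 4 GB); an unlimited rlimit maps to the cap."""
    if stack_limit == infinity or stack_limit < 0:
        return SZ_4G
    return min(stack_limit, SZ_4G)
```

For the current process the input would be the soft limit from `resource.getrlimit(resource.RLIMIT_STACK)`. With the common 8 MiB default the shadow stack is therefore 8 MiB, and the 4 GB cap only matters for very large or unlimited stack limits.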
115+
116+
Signal
117+
------
118+
119+
The main program and its signal handlers use the same shadow stack. Because
120+
the shadow stack stores only return addresses, a large shadow stack covers
121+
the condition that both the program stack and the signal alternate stack run
122+
out.
123+
124+
When a signal happens, the old pre-signal state is pushed on the stack. When
125+
shadow stack is enabled, the shadow stack specific state is pushed onto the
126+
shadow stack. Today this is only the old SSP (shadow stack pointer), pushed
127+
in a special format with bit 63 set. On sigreturn this old SSP token is
128+
verified and restored by the kernel. The kernel will also push the normal
129+
restorer address to the shadow stack to help userspace avoid a shadow stack
130+
violation on the sigreturn path that goes through the restorer.
131+
132+
So the shadow stack signal frame format is as follows::
133+
134+
|1...old SSP| - Pointer to old pre-signal ssp in sigframe token format
135+
(bit 63 set to 1)
136+
| ...| - Other state may be added in the future
137+
138+
139+
32-bit ABI signals are not supported in shadow stack processes. Linux prevents
140+
32-bit execution while shadow stack is enabled by allocating shadow stacks
141+
outside of the 32-bit address space. When execution enters 32-bit mode, either
142+
via far call or returning to userspace, a #GP is generated by the hardware
143+
which will be delivered to the process as a segfault. When transitioning to
144+
userspace the register state will be as if the userspace ip being returned to
145+
caused the segfault.
146+
147+
Fork
148+
----
149+
150+
The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
151+
to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
152+
shadow stack access triggers a page fault with the shadow stack access bit set
153+
in the page fault error code.
154+
155+
When a task forks a child, its shadow stack PTEs are copied and both the
156+
parent's and the child's shadow stack PTEs are cleared of the dirty bit.
157+
Upon the next shadow stack access, the resulting shadow stack page fault
158+
is handled by page copy/re-use.
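
The read-only plus dirty encoding and its interaction with fork can be illustrated with plain bit operations. This is a simplified model with hypothetical helper names: it looks only at the architectural RW (bit 1) and Dirty (bit 6) bits of an x86 page table entry and ignores everything else the kernel tracks.

```python
# x86 page table entry bits used by this model.
PAGE_RW = 1 << 1
PAGE_DIRTY = 1 << 6


def is_shadow_stack_pte(pte):
    """Read-only (RW clear) and dirty: the shadow stack encoding."""
    return not (pte & PAGE_RW) and bool(pte & PAGE_DIRTY)


def fork_copy(parent_pte):
    """Return (parent, child) PTEs after fork: both lose the dirty bit."""
    cleared = parent_pte & ~PAGE_DIRTY
    return cleared, cleared
```

After `fork_copy()` neither entry satisfies `is_shadow_stack_pte()`, which is why the next shadow stack access in either task faults and gets resolved by copying or re-using the page.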
159+
160+
When a pthread child is created, the kernel allocates a new shadow stack
161+
for the new thread. New shadow stack creation behaves like mmap() with respect
162+
to ASLR behavior. Similarly, on thread exit the thread's shadow stack is
163+
disabled.
164+
165+
Exec
166+
----
167+
168+
On exec, shadow stack features are disabled by the kernel. At that point,
169+
userspace can choose to re-enable or lock them.
