Commit 37d1a04

Rebase locking/kcsan to locking/urgent
Merge the state of the locking kcsan branch before the read/write_once() and the atomics modifications got merged.

Squash the fallout of the rebase on top of the read/write once and atomic fallback work into the merge. The history of the original branch is preserved in tag locking-kcsan-2020-06-02.

Signed-off-by: Thomas Gleixner <[email protected]>
2 parents 37f8173 + 97a9474 commit 37d1a04

59 files changed, with 4164 additions and 582 deletions.

Documentation/dev-tools/index.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ whole; patches welcome!
    kasan
    ubsan
    kmemleak
+   kcsan
    gdb-kernel-debugging
    kgdb
    kselftest

Documentation/dev-tools/kcsan.rst

Lines changed: 328 additions & 0 deletions
@@ -0,0 +1,328 @@
The Kernel Concurrency Sanitizer (KCSAN)
========================================

The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, which
relies on compile-time instrumentation, and uses a watchpoint-based sampling
approach to detect races. KCSAN's primary purpose is to detect `data races`_.

Usage
-----

KCSAN is supported in both GCC and Clang. With GCC it requires version 7.3.0 or
later. With Clang it requires version 7.0.0 or later.

To enable KCSAN configure the kernel with::

    CONFIG_KCSAN = y

KCSAN provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kcsan`` for more info).

Error reports
~~~~~~~~~~~~~

A typical data race report looks like this::

    ==================================================================
    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode

    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
     kernfs_refresh_inode+0x70/0x170
     kernfs_iop_permission+0x4f/0x90
     inode_permission+0x190/0x200
     link_path_walk.part.0+0x503/0x8e0
     path_lookupat.isra.0+0x69/0x4d0
     filename_lookup+0x136/0x280
     user_path_at_empty+0x47/0x60
     vfs_statx+0x9b/0x130
     __do_sys_newlstat+0x50/0xb0
     __x64_sys_newlstat+0x37/0x50
     do_syscall_64+0x85/0x260
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
     generic_permission+0x5b/0x2a0
     kernfs_iop_permission+0x66/0x90
     inode_permission+0x190/0x200
     link_path_walk.part.0+0x503/0x8e0
     path_lookupat.isra.0+0x69/0x4d0
     filename_lookup+0x136/0x280
     user_path_at_empty+0x47/0x60
     do_faccessat+0x11a/0x390
     __x64_sys_access+0x3c/0x50
     do_syscall_64+0x85/0x260
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    ==================================================================

The header of the report provides a short summary of the functions involved in
the race. It is followed by the access types and stack traces of the 2 threads
involved in the data race.

The other less common type of data race report looks like this::

    ==================================================================
    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10

    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
     e1000_clean_rx_irq+0x551/0xb10
     e1000_clean+0x533/0xda0
     net_rx_action+0x329/0x900
     __do_softirq+0xdb/0x2db
     irq_exit+0x9b/0xa0
     do_IRQ+0x9c/0xf0
     ret_from_intr+0x0/0x18
     default_idle+0x3f/0x220
     arch_cpu_idle+0x21/0x30
     do_idle+0x1df/0x230
     cpu_startup_entry+0x14/0x20
     rest_init+0xc5/0xcb
     arch_call_rest_init+0x13/0x2b
     start_kernel+0x6db/0x700

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    ==================================================================

This report is generated when it was not possible to determine the other
racing thread, but a race was inferred because the data value of the watched
memory location changed. Such races can occur either due to missing
instrumentation or e.g. DMA accesses. These reports will only be generated if
``CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=y`` (selected by default).

Selective analysis
~~~~~~~~~~~~~~~~~~

It may be desirable to disable data race detection for specific accesses,
functions, compilation units, or entire subsystems. For static blacklisting,
the below options are available:

* KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that
  any data races due to accesses in ``expr`` should be ignored and resulting
  behaviour when encountering a data race is deemed safe.

* Disabling data race detection for entire functions can be accomplished by
  using the function attribute ``__no_kcsan``::

    __no_kcsan
    void foo(void) {
        ...

  To dynamically limit the functions for which reports are generated, see the
  `DebugFS interface`_ blacklist/whitelist feature.

  For ``__always_inline`` functions, replace ``__always_inline`` with
  ``__no_kcsan_or_inline`` (which implies ``__always_inline``)::

    static __no_kcsan_or_inline void foo(void) {
        ...

  Note: Older compiler versions (GCC < 9) also do not always honor the
  ``__no_kcsan`` attribute on regular ``inline`` functions. If false positives
  with these compilers cannot be tolerated, for small functions where
  ``__always_inline`` would be appropriate, ``__no_kcsan_or_inline`` should be
  preferred instead.

* To disable data race detection for a particular compilation unit, add to the
  ``Makefile``::

    KCSAN_SANITIZE_file.o := n

* To disable data race detection for all compilation units listed in a
  ``Makefile``, add to the respective ``Makefile``::

    KCSAN_SANITIZE := n
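
The source-level annotations in the list above can be sketched as follows. This is a user-space illustration only: ``data_race()`` and ``__no_kcsan`` are defined here as no-op stand-ins so that the snippet compiles outside the kernel, and ``struct demo_stats`` and both functions are hypothetical names that do not come from the kernel tree.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space stand-ins (assumption): in the kernel these come from kernel
 * headers and change how KCSAN treats the enclosed code. Here they are
 * no-ops so that the sketch builds anywhere.
 */
#ifndef data_race
#define data_race(expr) (expr)
#endif
#ifndef __no_kcsan
#define __no_kcsan
#endif

/* Hypothetical statistics counter that is updated without locking. */
struct demo_stats {
	unsigned long packets;
};

/* Approximate read: a stale value is acceptable, so wrap it in data_race(). */
static unsigned long demo_stats_snapshot(const struct demo_stats *s)
{
	assert(s != NULL);
	return data_race(s->packets);
}

/* The whole function is excluded from data race detection. */
__no_kcsan
static void demo_stats_bump(struct demo_stats *s)
{
	assert(s != NULL);
	s->packets++;
}
```

Whether ignoring a race is actually safe remains a judgement the developer has to make; the annotations only silence the report.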

Furthermore, it is possible to tell KCSAN to show or hide entire classes of
data races, depending on preferences. These can be changed via the following
Kconfig options:

* ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY``: If enabled and a conflicting write
  is observed via a watchpoint, but the data value of the memory location was
  observed to remain unchanged, do not report the data race.

* ``CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC``: Assume that plain aligned writes
  up to word size are atomic by default. Assumes that such writes are not
  subject to unsafe compiler optimizations resulting in data races. The option
  causes KCSAN to not report data races due to conflicts where the only plain
  accesses are aligned writes up to word size.

DebugFS interface
~~~~~~~~~~~~~~~~~

The file ``/sys/kernel/debug/kcsan`` provides the following interface:

* Reading ``/sys/kernel/debug/kcsan`` returns various runtime statistics.

* Writing ``on`` or ``off`` to ``/sys/kernel/debug/kcsan`` allows turning KCSAN
  on or off, respectively.

* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
  ``some_func_name`` to the report filter list, which (by default) blacklists
  reporting data races where either one of the top stackframes is a function
  in the list.

* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
  changes the report filtering behaviour. For example, the blacklist feature
  can be used to silence frequently occurring data races; the whitelist feature
  can help with reproduction and testing of fixes.
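
As a rough sketch of how a test harness might drive the interface above from user space, the helper below writes one command string per open of the file. The path and the command strings are the ones listed above; the function names ``kcsan_write_cmd`` and ``kcsan_only_report`` are hypothetical, and the code assumes that debugfs is mounted and that the caller has permission to write to the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Path taken from the text above; debugfs must be mounted there. */
#define KCSAN_DEBUGFS_PATH "/sys/kernel/debug/kcsan"

/*
 * Write one command string (e.g. "on", "off", "!some_func_name",
 * "blacklist" or "whitelist") to the given file.
 * Returns 0 on success and -1 on failure.
 */
static int kcsan_write_cmd(const char *path, const char *cmd)
{
	FILE *f;
	int ok;

	assert(path != NULL && cmd != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(cmd, f) >= 0;
	/* fclose() flushes the buffer; a failed flush means the write failed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: add one function to the filter list, then switch
 * the list to whitelist mode so that only races involving it are reported.
 */
static int kcsan_only_report(const char *path, const char *func)
{
	char cmd[256];

	if (strlen(func) + 2 > sizeof(cmd))
		return -1;
	snprintf(cmd, sizeof(cmd), "!%s", func);
	if (kcsan_write_cmd(path, cmd) != 0)
		return -1;
	return kcsan_write_cmd(path, "whitelist");
}
```

A caller would typically pass ``KCSAN_DEBUGFS_PATH`` as the first argument; taking the path as a parameter keeps the helpers usable against an ordinary file when trying them out.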

Tuning performance
~~~~~~~~~~~~~~~~~~

Core parameters that affect KCSAN's overall performance and bug detection
ability are exposed as kernel command-line arguments whose defaults can also be
changed via the corresponding Kconfig options.

* ``kcsan.skip_watch`` (``CONFIG_KCSAN_SKIP_WATCH``): Number of per-CPU memory
  operations to skip, before another watchpoint is set up. Setting up
  watchpoints more frequently increases the likelihood of observing races.
  This parameter has the most significant impact on overall system
  performance and race detection ability.

* ``kcsan.udelay_task`` (``CONFIG_KCSAN_UDELAY_TASK``): For tasks, the
  microsecond delay to stall execution after a watchpoint has been set up.
  Larger values widen the window in which a race may be observed.

* ``kcsan.udelay_interrupt`` (``CONFIG_KCSAN_UDELAY_INTERRUPT``): For
  interrupts, the microsecond delay to stall execution after a watchpoint has
  been set up. Interrupts have tighter latency requirements, and their delay
  should generally be smaller than the one chosen for tasks.

They may be tweaked at runtime via ``/sys/module/kcsan/parameters/``.
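
To make the command-line form of these parameters concrete, the sketch below formats the three arguments named above into one string. The parameter names come from the list; ``struct kcsan_tuning``, ``kcsan_cmdline`` and any values passed in are hypothetical and are not meant as recommended settings or defaults.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three tunables described above. */
struct kcsan_tuning {
	long skip_watch;               /* memory operations skipped per watchpoint */
	unsigned int udelay_task;      /* stall for tasks, in microseconds */
	unsigned int udelay_interrupt; /* stall for interrupts, in microseconds */
};

/*
 * Format the tunables as kernel command-line arguments.
 * Returns the string length, or -1 if the buffer is too small.
 */
static int kcsan_cmdline(const struct kcsan_tuning *t, char *buf, size_t len)
{
	int n;

	assert(t != NULL && buf != NULL);
	n = snprintf(buf, len,
		     "kcsan.skip_watch=%ld kcsan.udelay_task=%u kcsan.udelay_interrupt=%u",
		     t->skip_watch, t->udelay_task, t->udelay_interrupt);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

Following the guidance above, a caller would normally choose an interrupt delay smaller than the task delay.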

Data Races
----------

In an execution, two memory accesses form a *data race* if they *conflict*,
they happen concurrently in different threads, and at least one of them is a
*plain access*; they *conflict* if both access the same memory location, and at
least one is a write. For a more thorough discussion and definition, see `"Plain
Accesses and Data Races" in the LKMM`_.

.. _"Plain Accesses and Data Races" in the LKMM: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1922
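
The definition above can be written down as a small predicate. This is a simplified model rather than anything KCSAN itself executes: ``struct access`` and both functions are hypothetical, "same memory location" is reduced to equal addresses (access sizes are ignored), and whether two accesses happen concurrently is supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one memory access in an execution. */
struct access {
	uintptr_t addr;  /* memory location accessed */
	int thread;      /* id of the accessing thread */
	bool is_write;   /* write (true) or read (false) */
	bool is_plain;   /* false for marked accesses such as READ_ONCE() */
};

/* Two accesses conflict if they hit the same location and one is a write. */
static bool accesses_conflict(const struct access *a, const struct access *b)
{
	return a->addr == b->addr && (a->is_write || b->is_write);
}

/*
 * Data race: conflicting, concurrent, in different threads, and at least
 * one of the two accesses is plain.
 */
static bool is_data_race(const struct access *a, const struct access *b,
			 bool concurrent)
{
	assert(a != 0 && b != 0);
	return accesses_conflict(a, b) && concurrent &&
	       a->thread != b->thread && (a->is_plain || b->is_plain);
}
```

Under this model, two concurrent marked accesses to the same location still conflict but do not form a data race, which is the distinction the definition draws.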

Relationship with the Linux-Kernel Memory Consistency Model (LKMM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The LKMM defines the propagation and ordering rules of various memory
operations, which gives developers the ability to reason about concurrent code.
Ultimately, this makes it possible to determine the possible executions of
concurrent code, and whether that code is free from data races.

KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``,
``atomic_*``, etc.), but is oblivious of any ordering guarantees and simply
assumes that memory barriers are placed correctly. In other words, KCSAN
assumes that as long as a plain access is not observed to race with another
conflicting access, memory operations are correctly ordered.

This means that KCSAN will not report *potential* data races due to missing
memory ordering. Developers should therefore carefully consider the memory
ordering requirements that remain unchecked. If, however, missing
memory ordering (that is observable with a particular compiler and
architecture) leads to an observable data race (e.g. entering a critical
section erroneously), KCSAN would report the resulting data race.
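
The pattern below illustrates the kind of code this applies to: a flag is accessed only through marked operations, while the payload it guards is accessed plainly and no barrier orders the two. The ``READ_ONCE``/``WRITE_ONCE`` definitions are simplified user-space stand-ins (an assumption; the kernel's versions do more), they rely on the GCC/Clang ``__typeof__`` extension, and ``producer``/``consumer`` are hypothetical names.

```c
#include <assert.h>

/* Simplified user-space stand-ins (assumption), GCC/Clang only. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int payload; /* accessed with plain accesses */
static int ready;   /* accessed with marked accesses only */

static void producer(int value)
{
	payload = value; /* plain write */
	/* No release ordering here: nothing orders 'payload' before 'ready'. */
	WRITE_ONCE(ready, 1);
}

/* Returns 1 and stores the payload in *out if the flag was observed set. */
static int consumer(int *out)
{
	assert(out != 0);
	if (!READ_ONCE(ready))
		return 0;
	/* No acquire ordering either, so this plain read is not protected. */
	*out = payload;
	return 1;
}
```

The accesses to ``ready`` are marked, so they alone would not be reported. The missing ordering is a potential problem that, per the text above, KCSAN only reports if it happens to observe the two plain accesses to ``payload`` racing.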

Race Detection Beyond Data Races
--------------------------------

For code with complex concurrency design, race-condition bugs may not always
manifest as data races. Race conditions occur if concurrently executing
operations result in unexpected system behaviour. On the other hand, data races
are defined at the C-language level. The following macros can be used to check
properties of concurrent code where bugs would not manifest as data races.

.. kernel-doc:: include/linux/kcsan-checks.h
    :functions: ASSERT_EXCLUSIVE_WRITER ASSERT_EXCLUSIVE_WRITER_SCOPED
                ASSERT_EXCLUSIVE_ACCESS ASSERT_EXCLUSIVE_ACCESS_SCOPED
                ASSERT_EXCLUSIVE_BITS
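
A minimal sketch of how two of these macros might be placed in code follows. The macros are defined here as no-op stand-ins so that the snippet compiles in user space (an assumption; the real definitions live in ``include/linux/kcsan-checks.h``), and ``struct demo_obj`` with its two functions is hypothetical.

```c
#include <assert.h>

/* No-op user-space stand-ins (assumption) for the kernel macros. */
#ifndef ASSERT_EXCLUSIVE_WRITER
#define ASSERT_EXCLUSIVE_WRITER(var) ((void)sizeof(var))
#endif
#ifndef ASSERT_EXCLUSIVE_ACCESS
#define ASSERT_EXCLUSIVE_ACCESS(var) ((void)sizeof(var))
#endif

/* Hypothetical object with a single-writer field and a reference count. */
struct demo_obj {
	int refcount;
	int generation;
	int released;
};

/* Design rule being checked: only one thread ever updates 'generation'. */
static void demo_advance(struct demo_obj *o)
{
	assert(o != 0);
	ASSERT_EXCLUSIVE_WRITER(o->generation);
	/* Kernel code would mark this store if readers run concurrently. */
	o->generation = o->generation + 1;
}

/* Design rule being checked: after the last put, nobody touches the object. */
static void demo_put(struct demo_obj *o)
{
	assert(o != 0);
	/* Simplified: a real reference count would be updated atomically. */
	if (--o->refcount == 0) {
		ASSERT_EXCLUSIVE_ACCESS(*o);
		o->released = 1;
	}
}
```

In both cases the assertion states an intended concurrency property of the design, which is a different thing from marking an access to avoid a data race report.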

Implementation Details
----------------------

KCSAN relies on observing that two accesses happen concurrently. Crucially, we
want to (a) increase the chances of observing races (especially for races that
manifest rarely), and (b) be able to actually observe them. We can accomplish
(a) by injecting various delays, and (b) by using address watchpoints (or
breakpoints).

If we deliberately stall a memory access, while we have a watchpoint for its
address set up, and then observe the watchpoint to fire, two accesses to the
same address just raced. Using hardware watchpoints, this is the approach taken
in `DataCollider
<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
relies on compiler instrumentation and "soft watchpoints".

In KCSAN, watchpoints are implemented using an efficient encoding that stores
access type, size, and address in a long; the benefits of using "soft
watchpoints" are portability and greater flexibility. KCSAN then relies on the
compiler instrumenting plain accesses. For each instrumented plain access:

1. Check if a matching watchpoint exists; if yes, and at least one access is a
   write, then we encountered a racing access.

2. Periodically, if no matching watchpoint exists, set up a watchpoint and
   stall for a small randomized delay.

3. Also check the data value before the delay, and re-check the data value
   after delay; if the values mismatch, we infer a race of unknown origin.
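
The three steps above can be modelled in a few dozen lines of single-threaded C. This is an illustration of the idea only: the bit layout, the slot count, and every name below are assumptions made for the sketch and do not reproduce the kernel's actual encoding or runtime, which also has to handle concurrency between CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative layout (NOT the kernel's encoding):
 * bit 63 = write flag, bits 62..56 = size in bytes, bits 55..0 = address.
 */
#define WP_WRITE_BIT  (UINT64_C(1) << 63)
#define WP_SIZE_SHIFT 56
#define WP_SIZE_MASK  UINT64_C(0x7f)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_SIZE_SHIFT) - 1)
#define WP_SLOTS      8

static uint64_t watchpoints[WP_SLOTS]; /* 0 means "slot unused" */

static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size > 0 && size <= WP_SIZE_MASK);
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_SIZE_SHIFT) | (addr & WP_ADDR_MASK);
}

/* Step 1: does the access overlap a watched range, with at least one write? */
static bool wp_check_race(uint64_t addr, size_t size, bool is_write)
{
	for (int i = 0; i < WP_SLOTS; i++) {
		uint64_t wp = watchpoints[i];
		uint64_t waddr = wp & WP_ADDR_MASK;
		uint64_t wsize = (wp >> WP_SIZE_SHIFT) & WP_SIZE_MASK;
		bool wwrite = (wp & WP_WRITE_BIT) != 0;

		if (wp && addr < waddr + wsize && waddr < addr + size &&
		    (is_write || wwrite))
			return true;
	}
	return false;
}

/* Step 2: claim a free slot; returns the slot index or -1 if all are busy. */
static int wp_setup(uint64_t addr, size_t size, bool is_write)
{
	for (int i = 0; i < WP_SLOTS; i++) {
		if (!watchpoints[i]) {
			watchpoints[i] = wp_encode(addr, size, is_write);
			return i;
		}
	}
	return -1;
}

static void wp_remove(int slot)
{
	watchpoints[slot] = 0;
}

/* Step 3: compare the value before and after the stall. */
static bool value_changed(const volatile uint32_t *p,
			  void (*stall)(void *), void *arg)
{
	uint32_t before = *p;

	stall(arg);
	return *p != before;
}

/* Hypothetical stall callbacks: one simulates an uninstrumented writer. */
static void demo_device_write(void *arg) { *(volatile uint32_t *)arg += 1; }
static void demo_no_write(void *arg) { (void)arg; }
```

Because an access of size greater than zero never encodes to 0, the value 0 can double as the "unused slot" marker in this sketch.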

To detect data races between plain and marked accesses, KCSAN also annotates
marked accesses, but only to check if a watchpoint exists; i.e. KCSAN never
sets up a watchpoint on marked accesses. By never setting up watchpoints for
marked operations, if all accesses to a variable that is accessed concurrently
are properly marked, KCSAN will never trigger a watchpoint and therefore never
report the accesses.

Key Properties
~~~~~~~~~~~~~~

1. **Memory Overhead:** The overall memory overhead is only a few MiB
   depending on configuration. The current implementation uses a small array of
   longs to encode watchpoint information, which is negligible.

2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
   efficient watchpoint encoding that does not require acquiring any shared
   locks in the fast-path. For kernel boot on a system with 8 CPUs:

   - 5.0x slow-down with the default KCSAN config;
   - 2.8x slow-down from runtime fast-path overhead only (set very large
     ``KCSAN_SKIP_WATCH`` and unset ``KCSAN_SKIP_WATCH_RANDOMIZE``).

3. **Annotation Overheads:** Minimal annotations are required outside the KCSAN
   runtime. As a result, maintenance overheads are minimal as the kernel
   evolves.

4. **Detects Racy Writes from Devices:** Due to checking data values upon
   setting up watchpoints, racy writes from devices can also be detected.

5. **Memory Ordering:** KCSAN is *not* explicitly aware of the LKMM's ordering
   rules; this may result in missed data races (false negatives).

6. **Analysis Accuracy:** For observed executions, due to using a sampling
   strategy, the analysis is *unsound* (false negatives possible), but aims to
   be complete (no false positives).

Alternatives Considered
-----------------------

An alternative data race detection approach for the kernel can be found in the
`Kernel Thread Sanitizer (KTSAN) <https://github.com/google/ktsan/wiki>`_.
KTSAN is a happens-before data race detector, which explicitly establishes the
happens-before order between memory operations, which can then be used to
determine data races as defined in `Data Races`_.

To build a correct happens-before relation, KTSAN must be aware of all ordering
rules of the LKMM and synchronization primitives. Unfortunately, any omission
leads to large numbers of false positives, which is especially detrimental in
the context of the kernel which includes numerous custom synchronization
mechanisms. To track the happens-before relation, KTSAN's implementation
requires metadata for each memory location (shadow memory), which for each page
corresponds to 4 pages of shadow memory, and can translate into overhead of
tens of GiB on a large system.
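
As a rough worked estimate of that shadow-memory cost, assume that every page of physical memory is shadowed at the stated ratio of 4 shadow pages per page, and take a hypothetical machine with 16 GiB of RAM (the machine size is an assumption chosen only for illustration):

```latex
\[
  M_{\text{shadow}} = 4 \times M_{\text{RAM}}
                    = 4 \times 16\,\text{GiB}
                    = 64\,\text{GiB}
\]
```

This is consistent with the "tens of GiB" figure above, and contrasts with the few MiB quoted for KCSAN under Key Properties.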

MAINTAINERS

Lines changed: 11 additions & 0 deletions
@@ -9305,6 +9305,17 @@ F: Documentation/kbuild/kconfig*
 F: scripts/Kconfig.include
 F: scripts/kconfig/
 
+KCSAN
+M: Marco Elver <[email protected]>
+R: Dmitry Vyukov <[email protected]>
+
+S: Maintained
+F: Documentation/dev-tools/kcsan.rst
+F: include/linux/kcsan*.h
+F: kernel/kcsan/
+F: lib/Kconfig.kcsan
+F: scripts/Makefile.kcsan
+
 KDUMP
 M: Dave Young <[email protected]>
 M: Baoquan He <[email protected]>

Makefile

Lines changed: 2 additions & 1 deletion
@@ -531,7 +531,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
 
 export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
 export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
-export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
+export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
 export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
 export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
 export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
@@ -965,6 +965,7 @@ endif
 include scripts/Makefile.kasan
 include scripts/Makefile.extrawarn
 include scripts/Makefile.ubsan
+include scripts/Makefile.kcsan
 
 # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
 KBUILD_CPPFLAGS += $(KCPPFLAGS)

arch/x86/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -233,6 +233,7 @@ config X86
 	select THREAD_INFO_IN_TASK
 	select USER_STACKTRACE_SUPPORT
 	select VIRT_TO_BUS
+	select HAVE_ARCH_KCSAN if X86_64
 	select X86_FEATURE_NAMES if PROC_FS
 	select PROC_PID_ARCH_STATUS if PROC_FS
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI

arch/x86/boot/Makefile

Lines changed: 2 additions & 0 deletions
@@ -9,7 +9,9 @@
 # Changed by many, many contributors over the years.
 #
 
+# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
 KASAN_SANITIZE := n
+KCSAN_SANITIZE := n
 OBJECT_FILES_NON_STANDARD := y
 
 # Kernel does not boot with kcov instrumentation here.

arch/x86/boot/compressed/Makefile

Lines changed: 2 additions & 0 deletions
@@ -17,7 +17,9 @@
 # (see scripts/Makefile.lib size_append)
 # compressed vmlinux.bin.all + u32 size of vmlinux.bin.all
 
+# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
 KASAN_SANITIZE := n
+KCSAN_SANITIZE := n
 OBJECT_FILES_NON_STANDARD := y
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
