| 1 | +The Kernel Concurrency Sanitizer (KCSAN)
| 2 | +========================================
| 3 | + |
| 4 | +The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, which |
| 5 | +relies on compile-time instrumentation, and uses a watchpoint-based sampling |
| 6 | +approach to detect races. KCSAN's primary purpose is to detect `data races`_. |
| 7 | + |
| 8 | +Usage |
| 9 | +----- |
| 10 | + |
| 11 | +KCSAN requires Clang version 11 or later. |
| 12 | + |
| 13 | +To enable KCSAN, configure the kernel with::
| 14 | +
| 15 | + CONFIG_KCSAN=y
| 16 | + |
| 17 | +KCSAN provides several other configuration options to customize behaviour (see |
| 18 | +the respective help text in ``lib/Kconfig.kcsan`` for more info). |
| 19 | + |
| 20 | +Error reports |
| 21 | +~~~~~~~~~~~~~ |
| 22 | + |
| 23 | +A typical data race report looks like this:: |
| 24 | + |
| 25 | + ================================================================== |
| 26 | + BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode |
| 27 | + |
| 28 | + write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4: |
| 29 | + kernfs_refresh_inode+0x70/0x170 |
| 30 | + kernfs_iop_permission+0x4f/0x90 |
| 31 | + inode_permission+0x190/0x200 |
| 32 | + link_path_walk.part.0+0x503/0x8e0 |
| 33 | + path_lookupat.isra.0+0x69/0x4d0 |
| 34 | + filename_lookup+0x136/0x280 |
| 35 | + user_path_at_empty+0x47/0x60 |
| 36 | + vfs_statx+0x9b/0x130 |
| 37 | + __do_sys_newlstat+0x50/0xb0 |
| 38 | + __x64_sys_newlstat+0x37/0x50 |
| 39 | + do_syscall_64+0x85/0x260 |
| 40 | + entry_SYSCALL_64_after_hwframe+0x44/0xa9 |
| 41 | + |
| 42 | + read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6: |
| 43 | + generic_permission+0x5b/0x2a0 |
| 44 | + kernfs_iop_permission+0x66/0x90 |
| 45 | + inode_permission+0x190/0x200 |
| 46 | + link_path_walk.part.0+0x503/0x8e0 |
| 47 | + path_lookupat.isra.0+0x69/0x4d0 |
| 48 | + filename_lookup+0x136/0x280 |
| 49 | + user_path_at_empty+0x47/0x60 |
| 50 | + do_faccessat+0x11a/0x390 |
| 51 | + __x64_sys_access+0x3c/0x50 |
| 52 | + do_syscall_64+0x85/0x260 |
| 53 | + entry_SYSCALL_64_after_hwframe+0x44/0xa9 |
| 54 | + |
| 55 | + Reported by Kernel Concurrency Sanitizer on: |
| 56 | + CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1 |
| 57 | + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 |
| 58 | + ================================================================== |
| 59 | + |
| 60 | +The header of the report provides a short summary of the functions involved in |
| 61 | +the race. It is followed by the access types and stack traces of the two threads
| 62 | +involved in the data race. |
| 63 | + |
| 64 | +The other, less common type of data race report looks like this::
| 65 | + |
| 66 | + ================================================================== |
| 67 | + BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10 |
| 68 | + |
| 69 | + race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0: |
| 70 | + e1000_clean_rx_irq+0x551/0xb10 |
| 71 | + e1000_clean+0x533/0xda0 |
| 72 | + net_rx_action+0x329/0x900 |
| 73 | + __do_softirq+0xdb/0x2db |
| 74 | + irq_exit+0x9b/0xa0 |
| 75 | + do_IRQ+0x9c/0xf0 |
| 76 | + ret_from_intr+0x0/0x18 |
| 77 | + default_idle+0x3f/0x220 |
| 78 | + arch_cpu_idle+0x21/0x30 |
| 79 | + do_idle+0x1df/0x230 |
| 80 | + cpu_startup_entry+0x14/0x20 |
| 81 | + rest_init+0xc5/0xcb |
| 82 | + arch_call_rest_init+0x13/0x2b |
| 83 | + start_kernel+0x6db/0x700 |
| 84 | + |
| 85 | + Reported by Kernel Concurrency Sanitizer on: |
| 86 | + CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2 |
| 87 | + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 |
| 88 | + ================================================================== |
| 89 | + |
| 90 | +This report is generated when the other racing thread could not be
| 91 | +determined, but a race was inferred because the data value of the watched
| 92 | +memory location changed. Such races can be caused by missing instrumentation
| 93 | +or, for example, by DMA accesses. These reports are only generated if
| 94 | +``CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=y`` (selected by default).
| 95 | + |
| 96 | +Selective analysis |
| 97 | +~~~~~~~~~~~~~~~~~~ |
| 98 | + |
| 99 | +It may be desirable to disable data race detection for specific accesses, |
| 100 | +functions, compilation units, or entire subsystems. For static blacklisting, |
| 101 | +the below options are available: |
| 102 | + |
| 103 | +* KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that
| 104 | + any data races due to accesses in ``expr`` should be ignored, and that the
| 105 | + resulting behaviour when encountering a data race is deemed safe.
| 106 | + |
| 107 | +* Disabling data race detection for entire functions can be accomplished by |
| 108 | + using the function attribute ``__no_kcsan``:: |
| 109 | + |
| 110 | + __no_kcsan |
| 111 | + void foo(void) { |
| 112 | + ... |
| 113 | + |
| 114 | + To dynamically limit the functions for which reports are generated, see the
| 115 | + `DebugFS interface`_ blacklist/whitelist feature.
| 116 | + |
| 117 | + For ``__always_inline`` functions, replace ``__always_inline`` with |
| 118 | + ``__no_kcsan_or_inline`` (which implies ``__always_inline``):: |
| 119 | + |
| 120 | + static __no_kcsan_or_inline void foo(void) { |
| 121 | + ... |
| 122 | + |
| 123 | +* To disable data race detection for a particular compilation unit, add to the |
| 124 | + ``Makefile``:: |
| 125 | + |
| 126 | + KCSAN_SANITIZE_file.o := n |
| 127 | + |
| 128 | +* To disable data race detection for all compilation units listed in a |
| 129 | + ``Makefile``, add to the respective ``Makefile``:: |
| 130 | + |
| 131 | + KCSAN_SANITIZE := n |
| 132 | + |
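As an illustration of the first two options, the following is a minimal sketch rather than code from the kernel. The structure ``demo_stats`` and both functions are hypothetical, and the ``#ifndef __KERNEL__`` stand-ins exist only so that the sketch builds in userspace; in the kernel, ``data_race()`` and ``__no_kcsan`` are provided by the compiler headers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-ins (assumption of this sketch): inside the kernel,
 * data_race() and __no_kcsan are provided by the kernel's compiler headers.
 */
#ifndef __KERNEL__
#define data_race(expr) (expr)
#define __no_kcsan
#endif

/* Hypothetical statistics counter that other contexts update without locking. */
struct demo_stats {
	unsigned long rx_packets;
};

/*
 * A diagnostic read where a stale value is acceptable. data_race() documents
 * that intent and tells KCSAN to ignore data races on this access only.
 */
static unsigned long demo_stats_snapshot(const struct demo_stats *s)
{
	assert(s != NULL);
	return data_race(s->rx_packets);
}

/* Data race detection is disabled for every access in this function. */
static __no_kcsan void demo_stats_reset(struct demo_stats *s)
{
	assert(s != NULL);
	s->rx_packets = 0;
}
```

Annotating a single expression with ``data_race()`` keeps the rest of the function checked, whereas ``__no_kcsan`` silences the whole function, so the narrower annotation is usually preferable when only one access is intentionally racy.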
| 133 | +Furthermore, it is possible to tell KCSAN to show or hide entire classes of |
| 134 | +data races, depending on preferences. These can be changed via the following |
| 135 | +Kconfig options: |
| 136 | + |
| 137 | +* ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY``: If enabled and a conflicting write |
| 138 | + is observed via a watchpoint, but the data value of the memory location was |
| 139 | + observed to remain unchanged, do not report the data race. |
| 140 | + |
| 141 | +* ``CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC``: Assume that plain aligned writes |
| 142 | + up to word size are atomic by default. Assumes that such writes are not |
| 143 | + subject to unsafe compiler optimizations resulting in data races. The option |
| 144 | + causes KCSAN to not report data races due to conflicts where the only plain |
| 145 | + accesses are aligned writes up to word size. |
| 146 | + |
| 147 | +DebugFS interface |
| 148 | +~~~~~~~~~~~~~~~~~ |
| 149 | + |
| 150 | +The file ``/sys/kernel/debug/kcsan`` provides the following interface: |
| 151 | + |
| 152 | +* Reading ``/sys/kernel/debug/kcsan`` returns various runtime statistics. |
| 153 | + |
| 154 | +* Writing ``on`` or ``off`` to ``/sys/kernel/debug/kcsan`` allows turning KCSAN |
| 155 | + on or off, respectively. |
| 156 | + |
| 157 | +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds |
| 158 | + ``some_func_name`` to the report filter list, which (by default) blacklists |
| 159 | + reporting data races where either of the top stack frames is a function
| 160 | + in the list. |
| 161 | + |
| 162 | +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan`` |
| 163 | + changes the report filtering behaviour. For example, the blacklist feature |
| 164 | + can be used to silence frequently occurring data races; the whitelist feature |
| 165 | + can help with reproduction and testing of fixes. |
| 166 | + |
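The commands listed above are plain strings written to the debugfs file, so a test harness can drive them from a small helper. The sketch below is one possible userspace helper, not part of KCSAN; the function name ``kcsan_debugfs_write`` and the macro ``KCSAN_DEBUGFS_PATH`` are hypothetical, and writing to the real file assumes debugfs is mounted and the caller has sufficient privileges.

```c
#include <assert.h>
#include <stdio.h>

/* Path taken from the text above; debugfs must be mounted for it to exist. */
#define KCSAN_DEBUGFS_PATH "/sys/kernel/debug/kcsan"

/*
 * Hypothetical helper: write one command string (for example "on", "off",
 * "!some_func_name", "blacklist" or "whitelist") to the given file.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 */
static int kcsan_debugfs_write(const char *path, const char *cmd)
{
	FILE *f;
	int ret = 0;

	assert(path != NULL && cmd != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(cmd, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)	/* fclose() flushes the buffered command */
		ret = -1;
	return ret;
}
```

For example, ``kcsan_debugfs_write(KCSAN_DEBUGFS_PATH, "!some_func_name")`` followed by ``kcsan_debugfs_write(KCSAN_DEBUGFS_PATH, "whitelist")`` would restrict reports to races involving ``some_func_name``, following the filter semantics described above.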
| 167 | +Tuning performance |
| 168 | +~~~~~~~~~~~~~~~~~~ |
| 169 | + |
| 170 | +Core parameters that affect KCSAN's overall performance and bug detection |
| 171 | +ability are exposed as kernel command-line arguments whose defaults can also be |
| 172 | +changed via the corresponding Kconfig options. |
| 173 | + |
| 174 | +* ``kcsan.skip_watch`` (``CONFIG_KCSAN_SKIP_WATCH``): Number of per-CPU memory |
| 175 | + operations to skip before another watchpoint is set up. Setting up
| 176 | + watchpoints more frequently increases the likelihood of observing
| 177 | + races. This parameter has the most significant impact on
| 178 | + overall system performance and race detection ability. |
| 179 | + |
| 180 | +* ``kcsan.udelay_task`` (``CONFIG_KCSAN_UDELAY_TASK``): For tasks, the |
| 181 | + microsecond delay to stall execution after a watchpoint has been set up. |
| 182 | + Larger values widen the window in which a race may be
| 183 | + observed.
| 184 | + |
| 185 | +* ``kcsan.udelay_interrupt`` (``CONFIG_KCSAN_UDELAY_INTERRUPT``): For |
| 186 | + interrupts, the microsecond delay to stall execution after a watchpoint has |
| 187 | + been set up. Interrupts have tighter latency requirements, and their delay |
| 188 | + should generally be smaller than the one chosen for tasks. |
| 189 | + |
| 190 | +These parameters may be tweaked at runtime via ``/sys/module/kcsan/parameters/``.
| 191 | + |
| 192 | +Data Races |
| 193 | +---------- |
| 194 | + |
| 195 | +In an execution, two memory accesses form a *data race* if they *conflict*, |
| 196 | +they happen concurrently in different threads, and at least one of them is a |
| 197 | +*plain access*; they *conflict* if both access the same memory location, and at |
| 198 | +least one is a write. For a more thorough discussion and definition, see `"Plain |
| 199 | +Accesses and Data Races" in the LKMM`_. |
| 200 | + |
| 201 | +.. _"Plain Accesses and Data Races" in the LKMM: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1922 |
| 202 | + |
| 203 | +Relationship with the Linux-Kernel Memory Consistency Model (LKMM) |
| 204 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 205 | + |
| 206 | +The LKMM defines the propagation and ordering rules of various memory |
| 207 | +operations, which gives developers the ability to reason about concurrent code. |
| 208 | +Ultimately, this allows developers to determine the possible executions of
| 209 | +concurrent code, and whether that code is free from data races.
| 210 | + |
| 211 | +KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``, |
| 212 | +``atomic_*``, etc.), but is oblivious of any ordering guarantees and simply |
| 213 | +assumes that memory barriers are placed correctly. In other words, KCSAN |
| 214 | +assumes that as long as a plain access is not observed to race with another |
| 215 | +conflicting access, memory operations are correctly ordered. |
| 216 | + |
| 217 | +This means that KCSAN will not report *potential* data races due to missing |
| 218 | +memory ordering. Developers should therefore carefully consider the
| 219 | +memory ordering requirements that remain unchecked. If, however, missing
| 220 | +memory ordering (that is observable with a particular compiler and |
| 221 | +architecture) leads to an observable data race (e.g. entering a critical |
| 222 | +section erroneously), KCSAN would report the resulting data race. |
| 223 | + |
| 224 | +Race Detection Beyond Data Races |
| 225 | +-------------------------------- |
| 226 | + |
| 227 | +For code with complex concurrency design, race-condition bugs may not always |
| 228 | +manifest as data races. Race conditions occur if concurrently executing |
| 229 | +operations result in unexpected system behaviour. On the other hand, data races |
| 230 | +are defined at the C-language level. The following macros can be used to check |
| 231 | +properties of concurrent code where bugs would not manifest as data races. |
| 232 | + |
| 233 | +.. kernel-doc:: include/linux/kcsan-checks.h |
| 234 | + :functions: ASSERT_EXCLUSIVE_WRITER ASSERT_EXCLUSIVE_WRITER_SCOPED |
| 235 | + ASSERT_EXCLUSIVE_ACCESS ASSERT_EXCLUSIVE_ACCESS_SCOPED |
| 236 | + ASSERT_EXCLUSIVE_BITS |
| 237 | + |
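A typical use of these macros is a variable with a single designated writer and lockless readers. The sketch below is illustrative only: ``shared_counter`` and the three functions are hypothetical, and the ``#ifndef __KERNEL__`` stand-ins are simplified userspace substitutes for the real kernel macros, which only perform checking when KCSAN is enabled.

```c
#include <assert.h>

/*
 * Userspace stand-ins (assumption of this sketch). In the kernel the
 * assertions come from <linux/kcsan-checks.h> and READ_ONCE()/WRITE_ONCE()
 * from the kernel's own headers.
 */
#ifndef __KERNEL__
#define ASSERT_EXCLUSIVE_WRITER(var) ((void)(var))
#define ASSERT_EXCLUSIVE_ACCESS(var) ((void)(var))
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#endif

/* Hypothetical variable: one writer at a time, any number of readers. */
static long shared_counter;

/*
 * Marked writes do not form a data race with marked reads, so a second,
 * unintended writer would go unnoticed. The assertion asks KCSAN to report
 * any concurrent write to shared_counter.
 */
static void counter_add(long delta)
{
	assert(delta >= 0);	/* hypothetical invariant: the counter only grows */
	ASSERT_EXCLUSIVE_WRITER(shared_counter);
	WRITE_ONCE(shared_counter, READ_ONCE(shared_counter) + delta);
}

/* Readers may run concurrently with the single writer. */
static long counter_read(void)
{
	return READ_ONCE(shared_counter);
}

/* During teardown no other access, read or write, is expected at all. */
static void counter_reset(void)
{
	ASSERT_EXCLUSIVE_ACCESS(shared_counter);
	shared_counter = 0;
}
```

The writer assertion fits the steady state, where readers are allowed; the stricter access assertion fits phases such as initialization or teardown, where any concurrent access would indicate a bug.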
| 238 | +Implementation Details |
| 239 | +---------------------- |
| 240 | + |
| 241 | +KCSAN relies on observing that two accesses happen concurrently. Crucially, we |
| 242 | +want to (a) increase the chances of observing races (especially for races that |
| 243 | +manifest rarely), and (b) be able to actually observe them. We can accomplish |
| 244 | +(a) by injecting various delays, and (b) by using address watchpoints (or |
| 245 | +breakpoints). |
| 246 | + |
| 247 | +If we deliberately stall a memory access, while we have a watchpoint for its |
| 248 | +address set up, and then observe the watchpoint to fire, two accesses to the |
| 249 | +same address just raced. Using hardware watchpoints, this is the approach taken |
| 250 | +in `DataCollider |
| 251 | +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_. |
| 252 | +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead |
| 253 | +relies on compiler instrumentation and "soft watchpoints". |
| 254 | + |
| 255 | +In KCSAN, watchpoints are implemented using an efficient encoding that stores |
| 256 | +access type, size, and address in a long; the benefits of using "soft |
| 257 | +watchpoints" are portability and greater flexibility. KCSAN then relies on the |
| 258 | +compiler instrumenting plain accesses. For each instrumented plain access: |
| 259 | + |
| 260 | +1. Check if a matching watchpoint exists; if yes, and at least one access is a |
| 261 | + write, then we encountered a racing access. |
| 262 | + |
| 263 | +2. Periodically, if no matching watchpoint exists, set up a watchpoint and |
| 264 | + stall for a small randomized delay. |
| 265 | + |
| 266 | +3. Also check the data value before the delay, and re-check the data value |
| 267 | + after the delay; if the values mismatch, we infer a race of unknown origin.
| 268 | + |
| 269 | +To detect data races between plain and marked accesses, KCSAN also annotates |
| 270 | +marked accesses, but only to check if a watchpoint exists; i.e. KCSAN never |
| 271 | +sets up a watchpoint on marked accesses. By never setting up watchpoints for |
| 272 | +marked operations, if all accesses to a variable that is accessed concurrently |
| 273 | +are properly marked, KCSAN will never trigger a watchpoint and therefore never |
| 274 | +report the accesses. |
| 275 | + |
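The per-access steps and the treatment of marked accesses described above can be sketched in userspace C. This is a simplified, single-threaded model and not KCSAN's implementation: the bit layout, the slot hashing, and all names (``wp_encode``, ``wp_check``, ``wp_setup``, and so on) are assumptions made for illustration, and the real runtime additionally needs atomic updates, sampling, and randomized delays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout of one watchpoint word (an assumption of this sketch):
 * bit 63 = write flag, bits 62..56 = access size, bits 55..0 = address bits.
 */
#define WP_WRITE_BIT  (UINT64_C(1) << 63)
#define WP_SIZE_SHIFT 56
#define WP_SIZE_MASK  UINT64_C(0x7f)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_SIZE_SHIFT) - 1)
#define WP_NUM_SLOTS  64

static uint64_t watchpoints[WP_NUM_SLOTS];	/* 0 means the slot is free */

static uint64_t wp_encode(uintptr_t addr, size_t size, bool is_write)
{
	assert(size > 0 && size <= WP_SIZE_MASK);
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_SIZE_SHIFT) |
	       ((uint64_t)addr & WP_ADDR_MASK);
}

/* Simplification: one slot per 8-byte-aligned word, no neighbour probing. */
static size_t wp_slot(uintptr_t addr)
{
	return (size_t)((addr >> 3) % WP_NUM_SLOTS);
}

/* Step 1: report a race if an overlapping watchpoint exists and at least
 * one of the two accesses is a write. */
static bool wp_check(uintptr_t addr, size_t size, bool is_write)
{
	uint64_t wp = watchpoints[wp_slot(addr)];
	uint64_t a = (uint64_t)addr & WP_ADDR_MASK;
	uint64_t wp_addr = wp & WP_ADDR_MASK;
	uint64_t wp_size = (wp >> WP_SIZE_SHIFT) & WP_SIZE_MASK;

	if (wp == 0)
		return false;
	if (a + size <= wp_addr || wp_addr + wp_size <= a)
		return false;	/* the two ranges do not overlap */
	return is_write || (wp & WP_WRITE_BIT) != 0;
}

/* Step 2: a plain access publishes a watchpoint before stalling ... */
static void wp_setup(uintptr_t addr, size_t size, bool is_write)
{
	watchpoints[wp_slot(addr)] = wp_encode(addr, size, is_write);
}

/* ... and removes it again once the delay is over. */
static void wp_remove(uintptr_t addr)
{
	watchpoints[wp_slot(addr)] = 0;
}

/* Step 3: compare the value read before the delay with the current value. */
static bool value_changed(const volatile uint32_t *p, uint32_t before)
{
	return *p != before;
}

/* Marked accesses only check for a watchpoint; they never set one up. */
static bool marked_access(uintptr_t addr, size_t size, bool is_write)
{
	return wp_check(addr, size, is_write);
}
```

Because a whole watchpoint fits into one machine word in this model, publishing and clearing it is a single store, which is what makes a lock-free fast path plausible; the read-versus-read case returns ``false`` in ``wp_check`` because two reads never conflict.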
| 276 | +Key Properties |
| 277 | +~~~~~~~~~~~~~~ |
| 278 | + |
| 279 | +1. **Memory Overhead:** The overall memory overhead is only a few MiB |
| 280 | + depending on configuration. The current implementation uses a small array of |
| 281 | + longs to encode watchpoint information, which is negligible. |
| 282 | + |
| 283 | +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an |
| 284 | + efficient watchpoint encoding that does not require acquiring any shared |
| 285 | + locks in the fast-path. For kernel boot on a system with 8 CPUs: |
| 286 | + |
| 287 | + - 5.0x slow-down with the default KCSAN config; |
| 288 | + - 2.8x slow-down from runtime fast-path overhead only (set very large |
| 289 | + ``KCSAN_SKIP_WATCH`` and unset ``KCSAN_SKIP_WATCH_RANDOMIZE``). |
| 290 | + |
| 291 | +3. **Annotation Overheads:** Minimal annotations are required outside the KCSAN |
| 292 | + runtime. As a result, maintenance overheads are minimal as the kernel |
| 293 | + evolves. |
| 294 | + |
| 295 | +4. **Detects Racy Writes from Devices:** Due to checking data values upon |
| 296 | + setting up watchpoints, racy writes from devices can also be detected. |
| 297 | + |
| 298 | +5. **Memory Ordering:** KCSAN is *not* explicitly aware of the LKMM's ordering |
| 299 | + rules; this may result in missed data races (false negatives). |
| 300 | + |
| 301 | +6. **Analysis Accuracy:** For observed executions, due to using a sampling |
| 302 | + strategy, the analysis is *unsound* (false negatives possible), but aims to |
| 303 | + be complete (no false positives). |
| 304 | + |
| 305 | +Alternatives Considered |
| 306 | +----------------------- |
| 307 | + |
| 308 | +An alternative data race detection approach for the kernel can be found in the |
| 309 | +`Kernel Thread Sanitizer (KTSAN) <https://github.com/google/ktsan/wiki>`_. |
| 310 | +KTSAN is a happens-before data race detector, which explicitly establishes the |
| 311 | +happens-before order between memory operations, which can then be used to |
| 312 | +determine data races as defined in `Data Races`_. |
| 313 | + |
| 314 | +To build a correct happens-before relation, KTSAN must be aware of all ordering |
| 315 | +rules of the LKMM and synchronization primitives. Unfortunately, any omission |
| 316 | +leads to large numbers of false positives, which is especially detrimental in |
| 317 | +the context of the kernel which includes numerous custom synchronization |
| 318 | +mechanisms. To track the happens-before relation, KTSAN's implementation |
| 319 | +requires metadata for each memory location (shadow memory), which for each page |
| 320 | +corresponds to 4 pages of shadow memory, and can translate into overhead of |
| 321 | +tens of GiB on a large system. |