|
| 1 | +# Safety |
| 2 | + |
| 3 | +<!-- |
| 4 | +Part of the Carbon Language project, under the Apache License v2.0 with LLVM |
| 5 | +Exceptions. See /LICENSE for license information. |
| 6 | +SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception |
| 7 | +--> |
| 8 | + |
| 9 | +<!-- toc --> |
| 10 | + |
| 11 | +## Table of contents |
| 12 | + |
| 13 | +- [Overview](#overview) |
| 14 | +- [Safe and unsafe code](#safe-and-unsafe-code) |
| 15 | +- [Safety modes](#safety-modes) |
| 16 | +- [Memory safety model](#memory-safety-model) |
| 17 | + - [Data races versus unsynchronized temporal safety](#data-races-versus-unsynchronized-temporal-safety) |
| 18 | +- [Safe library ecosystem](#safe-library-ecosystem) |
| 19 | +- [Build modes](#build-modes) |
| 20 | +- [Constraints](#constraints) |
| 21 | + - [Probabilistic techniques likely cannot stop attacks](#probabilistic-techniques-likely-cannot-stop-attacks) |
| 22 | +- [References](#references) |
| 23 | + |
| 24 | +<!-- tocstop --> |
| 25 | + |
| 26 | +## Overview |
| 27 | + |
| 28 | +One of Carbon's core goals is [practical safety]. This is referring to _[code |
| 29 | +safety]_ |
| 30 | +as opposed to the larger space of [systems safety]. The largest aspect of code safety |
| 31 | +at the language level is [memory safety], but this also applies to other aspects |
| 32 | +of code safety such as avoiding undefined behavior in other forms. |
| 33 | + |
| 34 | +[practical safety]: |
| 35 | + /docs/project/goals.md#practical-safety-and-testing-mechanisms |
| 36 | +[code safety]: |
| 37 | + /docs/design/safety/terminology.md#code-software-or-program-safety |
| 38 | +[systems safety]: /docs/design/safety/terminology.md#safety |
| 39 | +[memory safety]: /docs/design/safety/terminology.md#memory-safety |
| 40 | + |
| 41 | +However, Carbon also has a goal of fine grained, smooth interop-with and |
| 42 | +migration-from existing C++ code, and C++ is a fundamentally unsafe language. It |
| 43 | +has pervasive sources of undefined behavior and minimal memory safety |
| 44 | +guarantees. Our safety strategy has to address how C++ code fits into it, and |
| 45 | +provide an incremental path from where the code is at today towards increasing |
| 46 | +levels of safety. |
| 47 | + |
| 48 | +Ultimately, Carbon will both provide a [memory-safe language], _and_ provide a language |
| 49 | +that is a target for mechanical migration from C++ and optimizes even further for |
| 50 | +interop with unsafe C++ with minimal friction. |
| 51 | + |
| 52 | +[memory-safe language]: /docs/design/safety/terminology.md#memory-safe-language |
| 53 | + |
| 54 | +## Safe and unsafe code |
| 55 | + |
| 56 | +Carbon will have both _safe_ and _unsafe_ code. Safe code provides limits on the |
| 57 | +potential behavior of the program even in the face of bugs in order to prevent |
| 58 | +[safety bugs] from becoming [vulnerabilities]. Unsafe code is any code or operation |
| 59 | +which lacks limits or guarantees on behavior, and as a consequence may have undefined |
| 60 | +behavior or be a safety bug. |
| 61 | + |
| 62 | +[safety bugs]: /docs/design/safety/terminology.md#safety-bugs |
| 63 | +[vulnerabilities]: |
| 64 | + /docs/design/safety/terminology.md#vulnerability-or-security-vulnerability |
| 65 | + |
| 66 | +All things being equal, safe code constructs are preferable to unsafe |
| 67 | +constructs, but many unsafe constructs are necessary. Where unsafe constructs |
| 68 | +are needed, Carbon follows three principles to incorporate them into the |
| 69 | +language: |
| 70 | + |
| 71 | +- The unsafe capabilities provided should be semantically narrow: only the |
| 72 | + minimal unsafe operation to achieve the desired result. |
| 73 | +- The unsafe code should be syntactically narrow and separable from |
| 74 | + surrounding safe code. |
| 75 | +- There should be a reasonable way of including the keyword `unsafe` in |
| 76 | + whatever syntax is used so that this keyword can be used as a common |
| 77 | + annotation in audits and review. |
| 78 | + |
| 79 | +The result is that we don't model large regions or sections of Carbon code as |
| 80 | +"unsafe" or have a parallel "unsafe" variant language. We instead focus on the |
| 81 | +narrow and specific unsafe operations and constructs. |
| 82 | + |
| 83 | +Note that when we're talking about the narrow semantics of an unsafe capability, |
| 84 | +these are the semantics of the specific unsafe operation. For example, an unsafe |
| 85 | +type conversion shouldn't also allow unsafe out-of-lifetime access. This is |
| 86 | +separate from the _soundness_ implications of an unsafe operation, which may not |
| 87 | +be as easily narrowed. |
| 88 | + |
| 89 | +> **Future work**: More fully expand on the soundness principles and model for |
| 90 | +> safe Carbon code. This is an interesting and important area of the design that |
| 91 | +> isn't currently fleshed out in detail. |
| 92 | +
|
| 93 | +## Safety modes |
| 94 | + |
| 95 | +The _safety mode_ of Carbon governs the extent to which unsafe code must include |
| 96 | +the local `unsafe` keyword in its syntax to delineate it from safe code. |
| 97 | + |
| 98 | +_Strict Carbon_ is the mode in which all unsafe code is marked with the `unsafe` |
| 99 | +keyword. This mode is designed to satisfy the requirements of a [memory |
| 100 | +safe language]. |
| 101 | + |
| 102 | +_Permissive Carbon_ is a mode optimized for interop with C++ and automated |
| 103 | +migration from C++. In this mode, some unsafe code does not require an `unsafe` |
| 104 | +keyword: specific aspects of C++ interop or pervasive patterns that occur when |
| 105 | +migrating from C++. However, not _all_ unsafe code will omit the keyword, the |
| 106 | +permissive mode is designed to be minimal in the unsafe code allowed. |
| 107 | + |
| 108 | +Modes can be configured on an individual file as part of the package |
| 109 | +declaration, or an a function body as part of the function definition. More |
| 110 | +options such as regions of declarations or regions of statements can be explored |
| 111 | +in the future based on demand in practice when working with mixed-strictness |
| 112 | +code. More fine grained than statements is not expected given that the same core |
| 113 | +expressivity is available at that finer granularity through explicitly marking |
| 114 | +`unsafe` operations. |
| 115 | + |
| 116 | +> **Future work**: Define the exact syntax for package declaration and function |
| 117 | +> definition control of strictness. The syntax for enabling the permissive mode |
| 118 | +> must include the `unsafe` keyword as it is standing in place of more granular |
| 119 | +> use of the `unsafe` keywords and needs to still be discovered when auditing |
| 120 | +> for safety. |
| 121 | +
|
| 122 | +> **Future work**: Define how to mark `import`s of permissive libraries in |
| 123 | +> various contexts, balancing ergonomic burden against the potential for |
| 124 | +> surprising amounts af unsafety in dependencies. |
| 125 | +
|
| 126 | +## Memory safety model |
| 127 | + |
| 128 | +Carbon will use a hybrid of different techniques to achieve memory safety in its |
| 129 | +safe code, largely broken down by the categories of memory safety: |
| 130 | + |
| 131 | +- [Type safety]: compile-time enforcement, the same as other statically typed languages |
| 132 | + with generic type systems. |
| 133 | +- [Initialization safety]: hybrid of run-time and compile-time enforcement. |
| 134 | +- [Spatial safety]: run-time enforcement. |
| 135 | +- [Temporal safety]: compile-time enforcement through its type system. |
| 136 | +- [Data-race safety]: compile-time enforcement through its type system. |
| 137 | + |
| 138 | +[type safety]: /docs/design/safety/terminology.md#type-safety |
| 139 | +[initialization safety]: |
| 140 | + /docs/design/safety/terminology.md#initialization-safety |
| 141 | +[spatial safety]: /docs/design/safety/terminology.md#spatial-safety |
| 142 | +[temporal safety]: /docs/design/safety/terminology.md#temporal-safety |
| 143 | +[data-race safety]: /docs/design/safety/terminology.md#data-race-safety |
| 144 | + |
| 145 | +**At this high level, this means Carbon's memory safety model will largely match |
| 146 | +Rust's.** The only minor deviation at this level from Rust is the use of |
| 147 | +run-time enforcement for initialization, where Carbon will more heavily leverage |
| 148 | +run-time techniques such as automatic initialization and dynamic "optional" |
| 149 | +semantics to improve the ergonomics in Carbon. |
| 150 | + |
| 151 | +However, Carbon and Rust will have substantial differences in the _details_ of |
| 152 | +how they approach both temporal and data-race safety. |
| 153 | + |
| 154 | +### Data races versus unsynchronized temporal safety |
| 155 | + |
| 156 | +Carbon has the option of distinguishing between two similar but importantly |
| 157 | +different classes of bugs: data races and unsynchronized temporal safety |
| 158 | +violations. Specifically, there is no evidence from security teams that there is |
| 159 | +any significant volume of vulnerabilities that involve a data race bug but don't |
| 160 | +also involve a temporal memory safety violation. For example, despite both Go |
| 161 | +and non-strict-concurrency Swift only providing temporal safety, the rate of |
| 162 | +memory safety vulnerabilities in software written in both matches the expected |
| 163 | +low rate for memory-safe languages. As a consequence, Carbon has some |
| 164 | +flexibility while still being a [memory-safe language] according to our definition: |
| 165 | + |
| 166 | +- Carbon might choose to _not_ prevent data race bugs that are not |
| 167 | + _themselves_ also temporal safety bugs, even though the data race may lead |
| 168 | + to corruption and cause the program to later violate various other forms of |
| 169 | + memory safety. So far, evidence has not shown this to be as significant and |
| 170 | + prevalent source of _vulnerabilities_ as other forms of memory safety bugs. |
| 171 | +- However, Carbon _must_ detect and prevent cases where a lack of |
| 172 | + synchronization directly allows temporal safety bugs, such as use after |
| 173 | + free. |
| 174 | + |
| 175 | +Despite having this flexibility, preventing data race bugs remains _highly |
| 176 | +valuable_ for correctness, debugging, and achieving [fearless concurrency]. If Carbon |
| 177 | +can, it should work to prevent data races as well. |
| 178 | + |
| 179 | +[fearless concurrency]: https://doc.rust-lang.org/book/ch16-00-concurrency.html |
| 180 | + |
| 181 | +## Safe library ecosystem |
| 182 | + |
| 183 | +Carbon will need access to memory-safe library ecosystems, both for |
| 184 | +general-purpose, multi-platform functionality and for platform-specific |
| 185 | +functionality. Currently, the industry is currently investing a massive amount |
| 186 | +of effort to build out a sufficient ecosystem of general, multi-platform |
| 187 | +software using Rust, and it is critical that Carbon does not impede, slow down, |
| 188 | +or require duplicating that ecosystem. Similarly, if any other cross-platform |
| 189 | +library ecosystems emerge in any viable memory-safe languages given our |
| 190 | +performance constraints, we should work to reuse them and avoid duplication. |
| 191 | + |
| 192 | +**Carbon's strategy for safe and generally reusable cross-platform libraries is |
| 193 | +to leverage Rust libraries through interop.** This is a major motivating reason |
| 194 | +for seamless and safe Rust interop. The Carbon project will work to avoid |
| 195 | +creating duplication between the growing Rust library ecosystem and any future |
| 196 | +Carbon library ecosystem. Carbon's ecosystem will be focused instead on |
| 197 | +libraries and functionality that would either be missing or only available in |
| 198 | +C++. |
| 199 | + |
| 200 | +Platform-specific functionality is typically developed in that platform's native |
| 201 | +language, whether that is Swift for Apple platforms or Kotlin for Android. |
| 202 | +Again, the goal of Carbon should be to avoid duplicating functionality, and |
| 203 | +instead to prioritize high quality interop with the existing platform libraries |
| 204 | +in the relevant languages on those platforms. |
| 205 | + |
| 206 | +## Build modes |
| 207 | + |
| 208 | +Where Carbon's safety mode governs the language rules applied to unsafe code, |
| 209 | +Carbon's build modes will change the _behavior_ of unsafe code. |
| 210 | + |
| 211 | +There are two primary build modes: |
| 212 | + |
| 213 | +- **Release**: the primary build mode for programs in production, focuses on |
| 214 | + giving the best possible performance with a practical baseline of safety. |
| 215 | +- **Debug**: the primary build mode during development, focuses on |
| 216 | + high-quality developer experience. |
| 217 | + |
| 218 | +The release build will include a baseline of hardening necessary to uphold the |
| 219 | +run-time enforcement components of our |
| 220 | +[memory safety model](#memory-safety-model) above. This means, for example, that |
| 221 | +bounds checking is enabled in the release build. There is [evidence] that the |
| 222 | +cost of these hardening steps is low. Following the specific guidance of our top |
| 223 | +priority for [performance _control_], Carbon will provide ways to write unsafe code |
| 224 | +that disables the run-time enforcement, enabling the control of any overhead incurred. |
| 225 | + |
| 226 | +[evidence]: https://chandlerc.blog/posts/2024/11/story-time-bounds-checking/ |
| 227 | +[performance control]: /docs/project/goals.md#performance-critical-software |
| 228 | + |
| 229 | +The debug build will change the actual behavior of both safe and unsafe code |
| 230 | +with detectable bugs or detectable undefined or erroneous behavior to have |
| 231 | +[fail-stop] behavior and provide detailed diagnostics to enable better |
| 232 | +debugging. This mode will at least provide similar bug [detection] capabilities |
| 233 | +to [AddressSanitizer] and [MemorySanitizer]. |
| 234 | + |
| 235 | +[fail-stop]: /docs/design/safety/terminology.md#fail-stop |
| 236 | +[detection]: /docs/design/safety/terminology.md#detecting |
| 237 | +[AddressSanitizer]: https://clang.llvm.org/docs/AddressSanitizer.html |
| 238 | +[MemorySanitizer]: https://clang.llvm.org/docs/MemorySanitizer.html |
| 239 | + |
| 240 | +Carbon will also have additional build modes to provide specific, narrow |
| 241 | +capabilities that cannot be reasonably combined into either of the above build |
| 242 | +modes. Each of these is expected to be an extension of either the debug or |
| 243 | +release build for that specific purpose. For example: |
| 244 | + |
| 245 | +- Debug + [ThreadSanitizer]: detection of [data-races][data-race safety]. |
| 246 | +- Release + specialized [hardening]: some users can afford significant |
| 247 | + run-time overhead in order to use additional hardening. Carbon will have |
| 248 | + several specialized build modes in this space. Hardening techniques in this |
| 249 | + space include [Control-Flow Integrity (CFI)][cfi] and hardware-accelerated |
| 250 | + address sanitizing ([HWASAN]). |
| 251 | + |
| 252 | +[ThreadSanitizer]: https://clang.llvm.org/docs/ThreadSanitizer.html |
| 253 | +[hardening]: /docs/design/safety/terminology.md#hardening |
| 254 | +[cfi]: https://clang.llvm.org/docs/ControlFlowIntegrity.html |
| 255 | +[hwasan]: |
| 256 | + https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html |
| 257 | + |
| 258 | +Carbon will provide a way to disable the default hardening in release builds, |
| 259 | +but not in a supported way. Its use is expected to be strictly for benchmarking |
| 260 | +purposes. |
| 261 | + |
| 262 | +## Constraints |
| 263 | + |
| 264 | +### Probabilistic techniques likely cannot stop attacks |
| 265 | + |
| 266 | +It's expected that non-cryptographic probabilistic techniques that can be |
| 267 | +applied at the language level are attackable through a variety of techniques: |
| 268 | + |
| 269 | +- The attacker might be able to attack repeatedly until it gets through. |
| 270 | +- The attacker may be able to determine when the attack would be detected and |
| 271 | + only run the attack when it would not be. |
| 272 | +- The attacker might be able control the test condition to make detection much |
| 273 | + less likely or avoid detection completely. For example, if detection is |
| 274 | + based on the last 4 bits of a memory address, an attacker may be able to |
| 275 | + generate memory allocations, viewing the address and only attacking when |
| 276 | + there's a collision. |
| 277 | + |
| 278 | +Hardware vulnerabilities may make these attacks easier than they might otherwise |
| 279 | +appear. Future hardware vulnerabilities are difficult to predict. In general, we |
| 280 | +do not expect non-cryptographic probabilistic techniques to be an effective |
| 281 | +approach to achieving safety. |
| 282 | + |
| 283 | +While _cryptographic_ probabilistic techniques can, and typically do, work |
| 284 | +carefully to not be subject to these weaknesses, they face a very different |
| 285 | +challenge. The overhead of a cryptographically secure hash is generally |
| 286 | +prohibitive for use in language level constructs. Further, some of the defenses |
| 287 | +against hardware vulnerabilities and improvements further exacerbate these |
| 288 | +overheads. However, when these can be applied usefully such as with [PKeys], |
| 289 | +they are robust. |
| 290 | + |
| 291 | +[PKeys]: https://docs.kernel.org/core-api/protection-keys.html |
| 292 | + |
| 293 | +## References |
| 294 | + |
| 295 | +- Proposal |
| 296 | + [#196: Language-level safety strategy](https://github.com/carbon-language/carbon-lang/pull/196). |
| 297 | +- Proposal |
| 298 | + [#5914: Updating Carbon's safety strategy](https://github.com/carbon-language/carbon-lang/pull/5914). |
0 commit comments