Skip to content

Commit 223d039

Browse files
chandlercdanakjjosh11bMForsterzygoloid
authored
Updating Carbon's safety strategy (#5914)
Carbon is accelerating and adjusting its safety strategy, specifically to flesh out its memory safety strategy and reflect simplifying developments in the safety space. This proposal replaces the previous directional safety strategy with a new concrete and updated framework for the safety design. It includes a specific framework for memory safety, simplified build modes, specific "safety modes", and terminology. This proposal also provides a _directional_ suggestion for temporal and data-race safety specifically. In addition to fully building out the above directional component, there are several other aspects of our safety design that will follow in subsequent proposals. The hope is to establish the initial framework here. --------- Co-authored-by: Dana Jansens <[email protected]> Co-authored-by: josh11b <[email protected]> Co-authored-by: Mike Forster <[email protected]> Co-authored-by: Richard Smith <[email protected]>
1 parent 30b8a93 commit 223d039

File tree

7 files changed

+810
-31
lines changed

7 files changed

+810
-31
lines changed

README.md

Lines changed: 18 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -244,27 +244,28 @@ and with a smooth evolutionary path.
244244

245245
Safety, and especially
246246
[memory safety](https://en.wikipedia.org/wiki/Memory_safety), remains a key
247-
challenge for C++ and something a successor language needs to address. Our
248-
initial priority and focus is on immediately addressing important, low-hanging
249-
fruit in the safety space:
247+
challenge for C++ and something a successor language needs to address.
248+
249+
We plan to support a two step migration process:
250+
251+
1. Highly automated, minimal supervision migration from C++ to a dialect of
252+
Carbon designed for C++ interop and migration.
253+
2. Incremental refactoring of the Carbon code to adopt memory-safe designs,
254+
patterns, and APIs.
255+
256+
We also want to address important, low-hanging fruit in the safety space
257+
immediately when migrating into Carbon:
250258

251259
- Tracking uninitialized states better, increased enforcement of
252-
initialization, and systematically providing hardening against
253-
initialization bugs when desired.
254-
- Designing fundamental APIs and idioms to support dynamic bounds checks in
255-
debug and hardened builds.
256-
- Having a default debug build mode that is both cheaper and more
257-
comprehensive than existing C++ build modes even when combined with
260+
initialization, and hardening against initialization bugs when needed.
261+
- Designing fundamental APIs and idioms to support dynamic bounds checking.
262+
- Switching from undefined behavior to erroneous behavior wherever possible,
263+
and marking the remaining undefined behavior with visible `unsafe` syntax.
264+
- Having a default debug build mode that has less runtime overhead while being
265+
more comprehensive than existing C++ debug build modes combined with
258266
[Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer).
259267

260-
Once we can migrate code into Carbon, we will have a simplified language with
261-
room in the design space to add any necessary annotations or features, and
262-
infrastructure like [generics](#generics) to support safer design patterns.
263-
Longer term, we will build on this to introduce **a safe Carbon subset**. This
264-
will be a large and complex undertaking, and won't be in the 0.1 design.
265-
Meanwhile, we are closely watching and learning from efforts to add memory safe
266-
semantics onto C++ such as Rust-inspired
267-
[lifetime annotations](https://discourse.llvm.org/t/rfc-lifetime-annotations-for-c/61377).
268+
For more details, see our [safety design](/docs/design/safety).
268269

269270
## Getting started
270271

docs/design/README.md

Lines changed: 4 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -372,14 +372,11 @@ required to be the only non-whitespace on the line.
372372

373373
The behavior of the Carbon compiler depends on the _build mode_:
374374

375-
- In a _development build_, the priority is diagnosing problems and fast build
376-
time.
377-
- In a _performance build_, the priority is fastest execution time and lowest
375+
- In a _debug build_, the priority is diagnosing problems and fast build time.
376+
- In a _release build_, the priority is fastest execution time and lowest
378377
memory usage.
379-
- In a _hardened build_, the first priority is safety and second is
380-
performance.
381378

382-
> References: [Safety strategy](/docs/project/principles/safety_strategy.md)
379+
> References: [Safety design](/docs/design/safety#build-modes)
383380
384381
## Types are values
385382

@@ -3746,7 +3743,7 @@ This leads to Carbon's incremental path to safety:
37463743
- Shift the Carbon code onto an incremental path towards memory safety over
37473744
the next decade.
37483745

3749-
> References: [Safety strategy](/docs/project/principles/safety_strategy.md)
3746+
> References: [Safety design](/docs/design/safety)
37503747
37513748
### Lifetime and move semantics
37523749

docs/design/safety/README.md

Lines changed: 298 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,298 @@
1+
# Safety
2+
3+
<!--
4+
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
5+
Exceptions. See /LICENSE for license information.
6+
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
7+
-->
8+
9+
<!-- toc -->
10+
11+
## Table of contents
12+
13+
- [Overview](#overview)
14+
- [Safe and unsafe code](#safe-and-unsafe-code)
15+
- [Safety modes](#safety-modes)
16+
- [Memory safety model](#memory-safety-model)
17+
- [Data races versus unsynchronized temporal safety](#data-races-versus-unsynchronized-temporal-safety)
18+
- [Safe library ecosystem](#safe-library-ecosystem)
19+
- [Build modes](#build-modes)
20+
- [Constraints](#constraints)
21+
- [Probabilistic techniques likely cannot stop attacks](#probabilistic-techniques-likely-cannot-stop-attacks)
22+
- [References](#references)
23+
24+
<!-- tocstop -->
25+
26+
## Overview
27+
28+
One of Carbon's core goals is [practical safety]. This is referring to _[code
29+
safety]_
30+
as opposed to the larger space of [systems safety]. The largest aspect of code safety
31+
at the language level is [memory safety], but this also applies to other aspects
32+
of code safety such as avoiding undefined behavior in other forms.
33+
34+
[practical safety]:
35+
/docs/project/goals.md#practical-safety-and-testing-mechanisms
36+
[code safety]:
37+
/docs/design/safety/terminology.md#code-software-or-program-safety
38+
[systems safety]: /docs/design/safety/terminology.md#safety
39+
[memory safety]: /docs/design/safety/terminology.md#memory-safety
40+
41+
However, Carbon also has a goal of fine grained, smooth interop-with and
42+
migration-from existing C++ code, and C++ is a fundamentally unsafe language. It
43+
has pervasive sources of undefined behavior and minimal memory safety
44+
guarantees. Our safety strategy has to address how C++ code fits into it, and
45+
provide an incremental path from where the code is at today towards increasing
46+
levels of safety.
47+
48+
Ultimately, Carbon will both provide a [memory-safe language], _and_ provide a language
49+
that is a target for mechanical migration from C++ and optimizes even further for
50+
interop with unsafe C++ with minimal friction.
51+
52+
[memory-safe language]: /docs/design/safety/terminology.md#memory-safe-language
53+
54+
## Safe and unsafe code
55+
56+
Carbon will have both _safe_ and _unsafe_ code. Safe code provides limits on the
57+
potential behavior of the program even in the face of bugs in order to prevent
58+
[safety bugs] from becoming [vulnerabilities]. Unsafe code is any code or operation
59+
which lacks limits or guarantees on behavior, and as a consequence may have undefined
60+
behavior or be a safety bug.
61+
62+
[safety bugs]: /docs/design/safety/terminology.md#safety-bugs
63+
[vulnerabilities]:
64+
/docs/design/safety/terminology.md#vulnerability-or-security-vulnerability
65+
66+
All things being equal, safe code constructs are preferable to unsafe
67+
constructs, but many unsafe constructs are necessary. Where unsafe constructs
68+
are needed, Carbon follows three principles to incorporate them into the
69+
language:
70+
71+
- The unsafe capabilities provided should be semantically narrow: only the
72+
minimal unsafe operation to achieve the desired result.
73+
- The unsafe code should be syntactically narrow and separable from
74+
surrounding safe code.
75+
- There should be a reasonable way of including the keyword `unsafe` in
76+
whatever syntax is used so that this keyword can be used as a common
77+
annotation in audits and review.
78+
79+
The result is that we don't model large regions or sections of Carbon code as
80+
"unsafe" or have a parallel "unsafe" variant language. We instead focus on the
81+
narrow and specific unsafe operations and constructs.
82+
83+
Note that when we're talking about the narrow semantics of an unsafe capability,
84+
these are the semantics of the specific unsafe operation. For example, an unsafe
85+
type conversion shouldn't also allow unsafe out-of-lifetime access. This is
86+
separate from the _soundness_ implications of an unsafe operation, which may not
87+
be as easily narrowed.
88+
89+
> **Future work**: More fully expand on the soundness principles and model for
90+
> safe Carbon code. This is an interesting and important area of the design that
91+
> isn't currently fleshed out in detail.
92+
93+
## Safety modes
94+
95+
The _safety mode_ of Carbon governs the extent to which unsafe code must include
96+
the local `unsafe` keyword in its syntax to delineate it from safe code.
97+
98+
_Strict Carbon_ is the mode in which all unsafe code is marked with the `unsafe`
99+
keyword. This mode is designed to satisfy the requirements of a [memory
100+
safe language].
101+
102+
_Permissive Carbon_ is a mode optimized for interop with C++ and automated
103+
migration from C++. In this mode, some unsafe code does not require an `unsafe`
104+
keyword: specific aspects of C++ interop or pervasive patterns that occur when
105+
migrating from C++. However, not _all_ unsafe code will omit the keyword, the
106+
permissive mode is designed to be minimal in the unsafe code allowed.
107+
108+
Modes can be configured on an individual file as part of the package
109+
declaration, or an a function body as part of the function definition. More
110+
options such as regions of declarations or regions of statements can be explored
111+
in the future based on demand in practice when working with mixed-strictness
112+
code. More fine grained than statements is not expected given that the same core
113+
expressivity is available at that finer granularity through explicitly marking
114+
`unsafe` operations.
115+
116+
> **Future work**: Define the exact syntax for package declaration and function
117+
> definition control of strictness. The syntax for enabling the permissive mode
118+
> must include the `unsafe` keyword as it is standing in place of more granular
119+
> use of the `unsafe` keywords and needs to still be discovered when auditing
120+
> for safety.
121+
122+
> **Future work**: Define how to mark `import`s of permissive libraries in
123+
> various contexts, balancing ergonomic burden against the potential for
124+
> surprising amounts af unsafety in dependencies.
125+
126+
## Memory safety model
127+
128+
Carbon will use a hybrid of different techniques to achieve memory safety in its
129+
safe code, largely broken down by the categories of memory safety:
130+
131+
- [Type safety]: compile-time enforcement, the same as other statically typed languages
132+
with generic type systems.
133+
- [Initialization safety]: hybrid of run-time and compile-time enforcement.
134+
- [Spatial safety]: run-time enforcement.
135+
- [Temporal safety]: compile-time enforcement through its type system.
136+
- [Data-race safety]: compile-time enforcement through its type system.
137+
138+
[type safety]: /docs/design/safety/terminology.md#type-safety
139+
[initialization safety]:
140+
/docs/design/safety/terminology.md#initialization-safety
141+
[spatial safety]: /docs/design/safety/terminology.md#spatial-safety
142+
[temporal safety]: /docs/design/safety/terminology.md#temporal-safety
143+
[data-race safety]: /docs/design/safety/terminology.md#data-race-safety
144+
145+
**At this high level, this means Carbon's memory safety model will largely match
146+
Rust's.** The only minor deviation at this level from Rust is the use of
147+
run-time enforcement for initialization, where Carbon will more heavily leverage
148+
run-time techniques such as automatic initialization and dynamic "optional"
149+
semantics to improve the ergonomics in Carbon.
150+
151+
However, Carbon and Rust will have substantial differences in the _details_ of
152+
how they approach both temporal and data-race safety.
153+
154+
### Data races versus unsynchronized temporal safety
155+
156+
Carbon has the option of distinguishing between two similar but importantly
157+
different classes of bugs: data races and unsynchronized temporal safety
158+
violations. Specifically, there is no evidence from security teams that there is
159+
any significant volume of vulnerabilities that involve a data race bug but don't
160+
also involve a temporal memory safety violation. For example, despite both Go
161+
and non-strict-concurrency Swift only providing temporal safety, the rate of
162+
memory safety vulnerabilities in software written in both matches the expected
163+
low rate for memory-safe languages. As a consequence, Carbon has some
164+
flexibility while still being a [memory-safe language] according to our definition:
165+
166+
- Carbon might choose to _not_ prevent data race bugs that are not
167+
_themselves_ also temporal safety bugs, even though the data race may lead
168+
to corruption and cause the program to later violate various other forms of
169+
memory safety. So far, evidence has not shown this to be as significant and
170+
prevalent source of _vulnerabilities_ as other forms of memory safety bugs.
171+
- However, Carbon _must_ detect and prevent cases where a lack of
172+
synchronization directly allows temporal safety bugs, such as use after
173+
free.
174+
175+
Despite having this flexibility, preventing data race bugs remains _highly
176+
valuable_ for correctness, debugging, and achieving [fearless concurrency]. If Carbon
177+
can, it should work to prevent data races as well.
178+
179+
[fearless concurrency]: https://doc.rust-lang.org/book/ch16-00-concurrency.html
180+
181+
## Safe library ecosystem
182+
183+
Carbon will need access to memory-safe library ecosystems, both for
184+
general-purpose, multi-platform functionality and for platform-specific
185+
functionality. Currently, the industry is currently investing a massive amount
186+
of effort to build out a sufficient ecosystem of general, multi-platform
187+
software using Rust, and it is critical that Carbon does not impede, slow down,
188+
or require duplicating that ecosystem. Similarly, if any other cross-platform
189+
library ecosystems emerge in any viable memory-safe languages given our
190+
performance constraints, we should work to reuse them and avoid duplication.
191+
192+
**Carbon's strategy for safe and generally reusable cross-platform libraries is
193+
to leverage Rust libraries through interop.** This is a major motivating reason
194+
for seamless and safe Rust interop. The Carbon project will work to avoid
195+
creating duplication between the growing Rust library ecosystem and any future
196+
Carbon library ecosystem. Carbon's ecosystem will be focused instead on
197+
libraries and functionality that would either be missing or only available in
198+
C++.
199+
200+
Platform-specific functionality is typically developed in that platform's native
201+
language, whether that is Swift for Apple platforms or Kotlin for Android.
202+
Again, the goal of Carbon should be to avoid duplicating functionality, and
203+
instead to prioritize high quality interop with the existing platform libraries
204+
in the relevant languages on those platforms.
205+
206+
## Build modes
207+
208+
Where Carbon's safety mode governs the language rules applied to unsafe code,
209+
Carbon's build modes will change the _behavior_ of unsafe code.
210+
211+
There are two primary build modes:
212+
213+
- **Release**: the primary build mode for programs in production, focuses on
214+
giving the best possible performance with a practical baseline of safety.
215+
- **Debug**: the primary build mode during development, focuses on
216+
high-quality developer experience.
217+
218+
The release build will include a baseline of hardening necessary to uphold the
219+
run-time enforcement components of our
220+
[memory safety model](#memory-safety-model) above. This means, for example, that
221+
bounds checking is enabled in the release build. There is [evidence] that the
222+
cost of these hardening steps is low. Following the specific guidance of our top
223+
priority for [performance _control_], Carbon will provide ways to write unsafe code
224+
that disables the run-time enforcement, enabling the control of any overhead incurred.
225+
226+
[evidence]: https://chandlerc.blog/posts/2024/11/story-time-bounds-checking/
227+
[performance control]: /docs/project/goals.md#performance-critical-software
228+
229+
The debug build will change the actual behavior of both safe and unsafe code
230+
with detectable bugs or detectable undefined or erroneous behavior to have
231+
[fail-stop] behavior and provide detailed diagnostics to enable better
232+
debugging. This mode will at least provide similar bug [detection] capabilities
233+
to [AddressSanitizer] and [MemorySanitizer].
234+
235+
[fail-stop]: /docs/design/safety/terminology.md#fail-stop
236+
[detection]: /docs/design/safety/terminology.md#detecting
237+
[AddressSanitizer]: https://clang.llvm.org/docs/AddressSanitizer.html
238+
[MemorySanitizer]: https://clang.llvm.org/docs/MemorySanitizer.html
239+
240+
Carbon will also have additional build modes to provide specific, narrow
241+
capabilities that cannot be reasonably combined into either of the above build
242+
modes. Each of these is expected to be an extension of either the debug or
243+
release build for that specific purpose. For example:
244+
245+
- Debug + [ThreadSanitizer]: detection of [data-races][data-race safety].
246+
- Release + specialized [hardening]: some users can afford significant
247+
run-time overhead in order to use additional hardening. Carbon will have
248+
several specialized build modes in this space. Hardening techniques in this
249+
space include [Control-Flow Integrity (CFI)][cfi] and hardware-accelerated
250+
address sanitizing ([HWASAN]).
251+
252+
[ThreadSanitizer]: https://clang.llvm.org/docs/ThreadSanitizer.html
253+
[hardening]: /docs/design/safety/terminology.md#hardening
254+
[cfi]: https://clang.llvm.org/docs/ControlFlowIntegrity.html
255+
[hwasan]:
256+
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
257+
258+
Carbon will provide a way to disable the default hardening in release builds,
259+
but not in a supported way. Its use is expected to be strictly for benchmarking
260+
purposes.
261+
262+
## Constraints
263+
264+
### Probabilistic techniques likely cannot stop attacks
265+
266+
It's expected that non-cryptographic probabilistic techniques that can be
267+
applied at the language level are attackable through a variety of techniques:
268+
269+
- The attacker might be able to attack repeatedly until it gets through.
270+
- The attacker may be able to determine when the attack would be detected and
271+
only run the attack when it would not be.
272+
- The attacker might be able control the test condition to make detection much
273+
less likely or avoid detection completely. For example, if detection is
274+
based on the last 4 bits of a memory address, an attacker may be able to
275+
generate memory allocations, viewing the address and only attacking when
276+
there's a collision.
277+
278+
Hardware vulnerabilities may make these attacks easier than they might otherwise
279+
appear. Future hardware vulnerabilities are difficult to predict. In general, we
280+
do not expect non-cryptographic probabilistic techniques to be an effective
281+
approach to achieving safety.
282+
283+
While _cryptographic_ probabilistic techniques can, and typically do, work
284+
carefully to not be subject to these weaknesses, they face a very different
285+
challenge. The overhead of a cryptographically secure hash is generally
286+
prohibitive for use in language level constructs. Further, some of the defenses
287+
against hardware vulnerabilities and improvements further exacerbate these
288+
overheads. However, when these can be applied usefully such as with [PKeys],
289+
they are robust.
290+
291+
[PKeys]: https://docs.kernel.org/core-api/protection-keys.html
292+
293+
## References
294+
295+
- Proposal
296+
[#196: Language-level safety strategy](https://github.com/carbon-language/carbon-lang/pull/196).
297+
- Proposal
298+
[#5914: Updating Carbon's safety strategy](https://github.com/carbon-language/carbon-lang/pull/5914).

0 commit comments

Comments
 (0)