diff --git a/README.md b/README.md index 3693e77f87854..8c74e34099c88 100644 --- a/README.md +++ b/README.md @@ -244,27 +244,28 @@ and with a smooth evolutionary path. Safety, and especially [memory safety](https://en.wikipedia.org/wiki/Memory_safety), remains a key -challenge for C++ and something a successor language needs to address. Our -initial priority and focus is on immediately addressing important, low-hanging -fruit in the safety space: +challenge for C++ and something a successor language needs to address. + +We plan to support a two step migration process: + +1. Highly automated, minimal supervision migration from C++ to a dialect of + Carbon designed for C++ interop and migration. +2. Incremental refactoring of the Carbon code to adopt memory-safe designs, + patterns, and APIs. + +We also want to address important, low-hanging fruit in the safety space +immediately when migrating into Carbon: - Tracking uninitialized states better, increased enforcement of - initialization, and systematically providing hardening against - initialization bugs when desired. -- Designing fundamental APIs and idioms to support dynamic bounds checks in - debug and hardened builds. -- Having a default debug build mode that is both cheaper and more - comprehensive than existing C++ build modes even when combined with + initialization, and hardening against initialization bugs when needed. +- Designing fundamental APIs and idioms to support dynamic bounds checking. +- Switching from undefined behavior to erroneous behavior wherever possible, + and marking the remaining undefined behavior with visible `unsafe` syntax. +- Having a default debug build mode that has less runtime overhead while being + more comprehensive than existing C++ debug build modes combined with [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer). -Once we can migrate code into Carbon, we will have a simplified language with -room in the design space to add any necessary annotations or features, and -infrastructure like [generics](#generics) to support safer design patterns. -Longer term, we will build on this to introduce **a safe Carbon subset**. This -will be a large and complex undertaking, and won't be in the 0.1 design. -Meanwhile, we are closely watching and learning from efforts to add memory safe -semantics onto C++ such as Rust-inspired -[lifetime annotations](https://discourse.llvm.org/t/rfc-lifetime-annotations-for-c/61377). +For more details, see our [safety design](/docs/design/safety). ## Getting started diff --git a/docs/design/README.md b/docs/design/README.md index 670a96924888b..8a101f8a813b6 100644 --- a/docs/design/README.md +++ b/docs/design/README.md @@ -372,14 +372,11 @@ required to be the only non-whitespace on the line. The behavior of the Carbon compiler depends on the _build mode_: -- In a _development build_, the priority is diagnosing problems and fast build - time. -- In a _performance build_, the priority is fastest execution time and lowest +- In a _debug build_, the priority is diagnosing problems and fast build time. +- In a _release build_, the priority is fastest execution time and lowest memory usage. -- In a _hardened build_, the first priority is safety and second is - performance. -> References: [Safety strategy](/docs/project/principles/safety_strategy.md) +> References: [Safety design](/docs/design/safety#build-modes) ## Types are values @@ -3746,7 +3743,7 @@ This leads to Carbon's incremental path to safety: - Shift the Carbon code onto an incremental path towards memory safety over the next decade. -> References: [Safety strategy](/docs/project/principles/safety_strategy.md) +> References: [Safety design](/docs/design/safety) ### Lifetime and move semantics diff --git a/docs/design/safety/README.md b/docs/design/safety/README.md new file mode 100644 index 0000000000000..498fa7de7eef3 --- /dev/null +++ b/docs/design/safety/README.md @@ -0,0 +1,298 @@ +# Safety + + + + + +## Table of contents + +- [Overview](#overview) +- [Safe and unsafe code](#safe-and-unsafe-code) +- [Safety modes](#safety-modes) +- [Memory safety model](#memory-safety-model) + - [Data races versus unsynchronized temporal safety](#data-races-versus-unsynchronized-temporal-safety) +- [Safe library ecosystem](#safe-library-ecosystem) +- [Build modes](#build-modes) +- [Constraints](#constraints) + - [Probabilistic techniques likely cannot stop attacks](#probabilistic-techniques-likely-cannot-stop-attacks) +- [References](#references) + + + +## Overview + +One of Carbon's core goals is [practical safety]. This is referring to _[code +safety]_ +as opposed to the larger space of [systems safety]. The largest aspect of code safety +at the language level is [memory safety], but this also applies to other aspects +of code safety such as avoiding undefined behavior in other forms. + +[practical safety]: + /docs/project/goals.md#practical-safety-and-testing-mechanisms +[code safety]: + /docs/design/safety/terminology.md#code-software-or-program-safety +[systems safety]: /docs/design/safety/terminology.md#safety +[memory safety]: /docs/design/safety/terminology.md#memory-safety + +However, Carbon also has a goal of fine grained, smooth interop-with and +migration-from existing C++ code, and C++ is a fundamentally unsafe language. It +has pervasive sources of undefined behavior and minimal memory safety +guarantees. Our safety strategy has to address how C++ code fits into it, and +provide an incremental path from where the code is at today towards increasing +levels of safety. + +Ultimately, Carbon will both provide a [memory-safe language], _and_ provide a language +that is a target for mechanical migration from C++ and optimizes even further for +interop with unsafe C++ with minimal friction. + +[memory-safe language]: /docs/design/safety/terminology.md#memory-safe-language + +## Safe and unsafe code + +Carbon will have both _safe_ and _unsafe_ code. Safe code provides limits on the +potential behavior of the program even in the face of bugs in order to prevent +[safety bugs] from becoming [vulnerabilities]. Unsafe code is any code or operation +which lacks limits or guarantees on behavior, and as a consequence may have undefined +behavior or be a safety bug. + +[safety bugs]: /docs/design/safety/terminology.md#safety-bugs +[vulnerabilities]: + /docs/design/safety/terminology.md#vulnerability-or-security-vulnerability + +All things being equal, safe code constructs are preferable to unsafe +constructs, but many unsafe constructs are necessary. Where unsafe constructs +are needed, Carbon follows three principles to incorporate them into the +language: + +- The unsafe capabilities provided should be semantically narrow: only the + minimal unsafe operation to achieve the desired result. +- The unsafe code should be syntactically narrow and separable from + surrounding safe code. +- There should be a reasonable way of including the keyword `unsafe` in + whatever syntax is used so that this keyword can be used as a common + annotation in audits and review. + +The result is that we don't model large regions or sections of Carbon code as +"unsafe" or have a parallel "unsafe" variant language. We instead focus on the +narrow and specific unsafe operations and constructs. + +Note that when we're talking about the narrow semantics of an unsafe capability, +these are the semantics of the specific unsafe operation. For example, an unsafe +type conversion shouldn't also allow unsafe out-of-lifetime access. This is +separate from the _soundness_ implications of an unsafe operation, which may not +be as easily narrowed. + +> **Future work**: More fully expand on the soundness principles and model for +> safe Carbon code. This is an interesting and important area of the design that +> isn't currently fleshed out in detail. + +## Safety modes + +The _safety mode_ of Carbon governs the extent to which unsafe code must include +the local `unsafe` keyword in its syntax to delineate it from safe code. + +_Strict Carbon_ is the mode in which all unsafe code is marked with the `unsafe` +keyword. This mode is designed to satisfy the requirements of a [memory +safe language]. + +_Permissive Carbon_ is a mode optimized for interop with C++ and automated +migration from C++. In this mode, some unsafe code does not require an `unsafe` +keyword: specific aspects of C++ interop or pervasive patterns that occur when +migrating from C++. However, not _all_ unsafe code will omit the keyword, the +permissive mode is designed to be minimal in the unsafe code allowed. + +Modes can be configured on an individual file as part of the package +declaration, or an a function body as part of the function definition. More +options such as regions of declarations or regions of statements can be explored +in the future based on demand in practice when working with mixed-strictness +code. More fine grained than statements is not expected given that the same core +expressivity is available at that finer granularity through explicitly marking +`unsafe` operations. + +> **Future work**: Define the exact syntax for package declaration and function +> definition control of strictness. The syntax for enabling the permissive mode +> must include the `unsafe` keyword as it is standing in place of more granular +> use of the `unsafe` keywords and needs to still be discovered when auditing +> for safety. + +> **Future work**: Define how to mark `import`s of permissive libraries in +> various contexts, balancing ergonomic burden against the potential for +> surprising amounts af unsafety in dependencies. + +## Memory safety model + +Carbon will use a hybrid of different techniques to achieve memory safety in its +safe code, largely broken down by the categories of memory safety: + +- [Type safety]: compile-time enforcement, the same as other statically typed languages + with generic type systems. +- [Initialization safety]: hybrid of run-time and compile-time enforcement. +- [Spatial safety]: run-time enforcement. +- [Temporal safety]: compile-time enforcement through its type system. +- [Data-race safety]: compile-time enforcement through its type system. + +[type safety]: /docs/design/safety/terminology.md#type-safety +[initialization safety]: + /docs/design/safety/terminology.md#initialization-safety +[spatial safety]: /docs/design/safety/terminology.md#spatial-safety +[temporal safety]: /docs/design/safety/terminology.md#temporal-safety +[data-race safety]: /docs/design/safety/terminology.md#data-race-safety + +**At this high level, this means Carbon's memory safety model will largely match +Rust's.** The only minor deviation at this level from Rust is the use of +run-time enforcement for initialization, where Carbon will more heavily leverage +run-time techniques such as automatic initialization and dynamic "optional" +semantics to improve the ergonomics in Carbon. + +However, Carbon and Rust will have substantial differences in the _details_ of +how they approach both temporal and data-race safety. + +### Data races versus unsynchronized temporal safety + +Carbon has the option of distinguishing between two similar but importantly +different classes of bugs: data races and unsynchronized temporal safety +violations. Specifically, there is no evidence from security teams that there is +any significant volume of vulnerabilities that involve a data race bug but don't +also involve a temporal memory safety violation. For example, despite both Go +and non-strict-concurrency Swift only providing temporal safety, the rate of +memory safety vulnerabilities in software written in both matches the expected +low rate for memory-safe languages. As a consequence, Carbon has some +flexibility while still being a [memory-safe language] according to our definition: + +- Carbon might choose to _not_ prevent data race bugs that are not + _themselves_ also temporal safety bugs, even though the data race may lead + to corruption and cause the program to later violate various other forms of + memory safety. So far, evidence has not shown this to be as significant and + prevalent source of _vulnerabilities_ as other forms of memory safety bugs. +- However, Carbon _must_ detect and prevent cases where a lack of + synchronization directly allows temporal safety bugs, such as use after + free. + +Despite having this flexibility, preventing data race bugs remains _highly +valuable_ for correctness, debugging, and achieving [fearless concurrency]. If Carbon +can, it should work to prevent data races as well. + +[fearless concurrency]: https://doc.rust-lang.org/book/ch16-00-concurrency.html + +## Safe library ecosystem + +Carbon will need access to memory-safe library ecosystems, both for +general-purpose, multi-platform functionality and for platform-specific +functionality. Currently, the industry is currently investing a massive amount +of effort to build out a sufficient ecosystem of general, multi-platform +software using Rust, and it is critical that Carbon does not impede, slow down, +or require duplicating that ecosystem. Similarly, if any other cross-platform +library ecosystems emerge in any viable memory-safe languages given our +performance constraints, we should work to reuse them and avoid duplication. + +**Carbon's strategy for safe and generally reusable cross-platform libraries is +to leverage Rust libraries through interop.** This is a major motivating reason +for seamless and safe Rust interop. The Carbon project will work to avoid +creating duplication between the growing Rust library ecosystem and any future +Carbon library ecosystem. Carbon's ecosystem will be focused instead on +libraries and functionality that would either be missing or only available in +C++. + +Platform-specific functionality is typically developed in that platform's native +language, whether that is Swift for Apple platforms or Kotlin for Android. +Again, the goal of Carbon should be to avoid duplicating functionality, and +instead to prioritize high quality interop with the existing platform libraries +in the relevant languages on those platforms. + +## Build modes + +Where Carbon's safety mode governs the language rules applied to unsafe code, +Carbon's build modes will change the _behavior_ of unsafe code. + +There are two primary build modes: + +- **Release**: the primary build mode for programs in production, focuses on + giving the best possible performance with a practical baseline of safety. +- **Debug**: the primary build mode during development, focuses on + high-quality developer experience. + +The release build will include a baseline of hardening necessary to uphold the +run-time enforcement components of our +[memory safety model](#memory-safety-model) above. This means, for example, that +bounds checking is enabled in the release build. There is [evidence] that the +cost of these hardening steps is low. Following the specific guidance of our top +priority for [performance _control_], Carbon will provide ways to write unsafe code +that disables the run-time enforcement, enabling the control of any overhead incurred. + +[evidence]: https://chandlerc.blog/posts/2024/11/story-time-bounds-checking/ +[performance control]: /docs/project/goals.md#performance-critical-software + +The debug build will change the actual behavior of both safe and unsafe code +with detectable bugs or detectable undefined or erroneous behavior to have +[fail-stop] behavior and provide detailed diagnostics to enable better +debugging. This mode will at least provide similar bug [detection] capabilities +to [AddressSanitizer] and [MemorySanitizer]. + +[fail-stop]: /docs/design/safety/terminology.md#fail-stop +[detection]: /docs/design/safety/terminology.md#detecting +[AddressSanitizer]: https://clang.llvm.org/docs/AddressSanitizer.html +[MemorySanitizer]: https://clang.llvm.org/docs/MemorySanitizer.html + +Carbon will also have additional build modes to provide specific, narrow +capabilities that cannot be reasonably combined into either of the above build +modes. Each of these is expected to be an extension of either the debug or +release build for that specific purpose. For example: + +- Debug + [ThreadSanitizer]: detection of [data-races][data-race safety]. +- Release + specialized [hardening]: some users can afford significant + run-time overhead in order to use additional hardening. Carbon will have + several specialized build modes in this space. Hardening techniques in this + space include [Control-Flow Integrity (CFI)][cfi] and hardware-accelerated + address sanitizing ([HWASAN]). + +[ThreadSanitizer]: https://clang.llvm.org/docs/ThreadSanitizer.html +[hardening]: /docs/design/safety/terminology.md#hardening +[cfi]: https://clang.llvm.org/docs/ControlFlowIntegrity.html +[hwasan]: + https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html + +Carbon will provide a way to disable the default hardening in release builds, +but not in a supported way. Its use is expected to be strictly for benchmarking +purposes. + +## Constraints + +### Probabilistic techniques likely cannot stop attacks + +It's expected that non-cryptographic probabilistic techniques that can be +applied at the language level are attackable through a variety of techniques: + +- The attacker might be able to attack repeatedly until it gets through. +- The attacker may be able to determine when the attack would be detected and + only run the attack when it would not be. +- The attacker might be able control the test condition to make detection much + less likely or avoid detection completely. For example, if detection is + based on the last 4 bits of a memory address, an attacker may be able to + generate memory allocations, viewing the address and only attacking when + there's a collision. + +Hardware vulnerabilities may make these attacks easier than they might otherwise +appear. Future hardware vulnerabilities are difficult to predict. In general, we +do not expect non-cryptographic probabilistic techniques to be an effective +approach to achieving safety. + +While _cryptographic_ probabilistic techniques can, and typically do, work +carefully to not be subject to these weaknesses, they face a very different +challenge. The overhead of a cryptographically secure hash is generally +prohibitive for use in language level constructs. Further, some of the defenses +against hardware vulnerabilities and improvements further exacerbate these +overheads. However, when these can be applied usefully such as with [PKeys], +they are robust. + +[PKeys]: https://docs.kernel.org/core-api/protection-keys.html + +## References + +- Proposal + [#196: Language-level safety strategy](https://github.com/carbon-language/carbon-lang/pull/196). +- Proposal + [#5914: Updating Carbon's safety strategy](https://github.com/carbon-language/carbon-lang/pull/5914). diff --git a/docs/design/safety/terminology.md b/docs/design/safety/terminology.md new file mode 100644 index 0000000000000..beffb9a19df3d --- /dev/null +++ b/docs/design/safety/terminology.md @@ -0,0 +1,209 @@ +# Safety: Terminology + + + + + +## Table of contents + +- [Core Terminology](#core-terminology) + - [_Hazard_](#hazard) + - [_Bug_ or _defect_](#bug-or-defect) + - [_Active bug_](#active-bug) + - [_Latent bug_](#latent-bug) + - [_Safety_](#safety) + - [_Code_, _software_, or _program safety_](#code-software-or-program-safety) + - [_Safety bugs_](#safety-bugs) + - [_Initial bug_](#initial-bug) + - [_Fail-stop_](#fail-stop) +- [Vulnerability Terminology](#vulnerability-terminology) + - [_Vulnerability_ or _security vulnerability_](#vulnerability-or-security-vulnerability) + - [_Vulnerability defense_](#vulnerability-defense) + - [_Detecting_](#detecting) + - [_Mitigating_](#mitigating) + - [_Preventing_ vulnerabilities](#preventing-vulnerabilities) + - [_Ensuring_ correctness](#ensuring-correctness) + - [_Hardening_](#hardening) +- [Memory Safety Specifics](#memory-safety-specifics) + - [_Memory safety_](#memory-safety) + - [_Temporal safety_](#temporal-safety) + - [_Spatial safety_](#spatial-safety) + - [_Type safety_](#type-safety) + - [_Initialization safety_](#initialization-safety) + - [_Data-race safety_](#data-race-safety) + - [_Memory safety bug_](#memory-safety-bug) + - [_Memory-safe platform_ or _environment_](#memory-safe-platform-or-environment) + - [_Memory-safe language_](#memory-safe-language) + + + +## Core Terminology + +### _Hazard_ + +Unsafe coding construct that may lead to a bug or vulnerability. For example, +indexing an array with a user-supplied and unvalidated index is a hazard. + +### _Bug_ or _defect_ + +Reachable program behavior contrary to the author's intent. + +#### _Active bug_ + +Buggy behavior that is actively occurring for users of the program. + +#### _Latent bug_ + +Buggy behavior that does not currently occur for users, but is reachable. +Behaviors that are reachable, and so _can_ happen, but don't happen today in +practice _are always still bugs_! + +### _Safety_ + +Absent a qualifier or narrow context, refers to _[system safety]_, and _[safety +engineering]_. +Always a property of a system or product as a whole, including human factors, +etc. + +[system safety]: https://en.wikipedia.org/wiki/System_safety +[safety engineering]: https://en.wikipedia.org/wiki/Safety_engineering + +### _Code_, _software_, or _program safety_ + +Invariants or limits on program behavior in the face of bugs. + +- Very narrow and specific meaning. +- Often necessary but not sufficient for system safety. + +This is a specific subset of [safety](#safety) concerns, and the ones we are +most often focused on with programming language and library design. + +### _Safety bugs_ + +Bugs where some aspect of program behavior has insufficient (often none) +invariants or limits. + +For example, _undefined behavior_ definitionally has no invariant or limit, and +reaching it is always a safety bug. + +### _Initial bug_ + +The first behavior contrary to the author's intent, distinct from subsequent +deviations. + +### _Fail-stop_ + +The behavior of immediately terminating the program, minimizing any further +business logic. This is in contrast to any form of "correct" program +termination, continuing execution, or unwinding. + +## Vulnerability Terminology + +### _Vulnerability_ or _security vulnerability_ + +A bug that creates the possibility for a malicious actor to subvert a program's +intended behavior in a way that violates a security policy (for example, +confidentiality, integrity, availability). Vulnerabilities are often exploitable +manifestations of underlying bugs. + +### _Vulnerability defense_ + +The set of strategies and techniques employed to reduce the risks posed by +vulnerabilities arising from bugs. These strategies operate at different levels +and have varying degrees of effectiveness. + +#### _Detecting_ + +While still leaving the code vulnerable, a defense that attempts to recognize +and potentially track when a specific bug has occurred dynamically. Requires +_some_ invariant or limit, but very minimal. + +#### _Mitigating_ + +Making a vulnerability significantly more expensive, difficult, or improbable to +be exploited. + +#### _Preventing_ vulnerabilities + +Making it impossible for a bug to be exploited as a vulnerability without +resolving the underlying bug -- the program still doesn't behave as intended, it +just cannot be exploited. Often this is done by defining behavior to +[fail-stop](#fail-stop). + +#### _Ensuring_ correctness + +Ensures that if the program compiles successfully, it behaves as intended. This +typically prevents a bug being written and compiled into a program in the first +place. For example, statically typed languages typically ensure that the types +used in the program are correct. + +#### _Hardening_ + +Combinations of mitigation, prevention, and ensured correctness to reduce the +practical risk of vulnerabilities due to bugs. + +## Memory Safety Specifics + +### _Memory safety_ + +Having well-defined and predictable behavior regarding memory access, even in +the face of bugs. Memory safety encompasses several key aspects: + +#### _Temporal safety_ + +Memory accesses occur only within the valid lifetime of the intended memory +object. + +#### _Spatial safety_ + +Memory accesses remain within the intended bounds of memory regions. + +#### _Type safety_ + +Memory is accessed and interpreted according to its intended type, preventing +type confusion. + +#### _Initialization safety_ + +Memory is properly initialized before being read, avoiding the use of +uninitialized data. + +#### _Data-race safety_ + +Memory writes are synchronized with reads or writes on other threads. + +### _Memory safety bug_ + +A safety bug that violates memory safety. + +### _Memory-safe platform_ or _environment_ + +A computing platform or execution environment that provides mechanisms to +prevent memory safety bugs in programs running on it from becoming +vulnerabilities. This is a _systems_ path to achieving memory safety by +providing the well-defined and predictable behavior by way of the execution +environment. For example, a strongly sandboxed WebAssembly runtime environment +can allow a program that is itself unsafe to be executed safely + +### _Memory-safe language_ + +A programming language with sufficient defenses against memory safety bugs for +them to not be a significant source of security vulnerabilities. This requires +_preventing_ vulnerabilities or _ensuring_ correctness; mitigation is not +sufficient to provide an adequate level of memory safety. + +We identify several key requirements for a language to be memory-safe: + +- The default mode or subset of the language must provide guaranteed spatial, + temporal, type, and initialization memory safety. +- Any unsafe subset must only be needed and only be used in rare, exceptional + cases. Any use of the unsafe subset must also be well delineated and + auditable. +- Currently, security evidence doesn't _require_ providing guaranteed + data-race safety for + [data-race bugs that are not _also_ temporal memory safety bugs](/docs/design/safety#data-races-versus-unsynchronized-temporal-safety). + However, the temporal memory safety guarantee must still hold. diff --git a/docs/project/faq.md b/docs/project/faq.md index 8498e64fce3fc..b10470105ef9d 100644 --- a/docs/project/faq.md +++ b/docs/project/faq.md @@ -458,12 +458,7 @@ space. ### How will Carbon achieve memory safety? -See [memory safety in the project README](/README.md#memory-safety). - -References: - -- [Lifetime annotations for C++](https://discourse.llvm.org/t/rfc-lifetime-annotations-for-c/61377) -- [Carbon principle: Safety strategy](principles/safety_strategy.md) +See our [safety design](/docs/design/safety). ### How will language version upgrades work? diff --git a/docs/project/principles/README.md b/docs/project/principles/README.md index f1e1b4e83a71c..173afd5f88859 100644 --- a/docs/project/principles/README.md +++ b/docs/project/principles/README.md @@ -25,6 +25,5 @@ wants to pursue, as well as those we want to exclude. - [Information accumulation](information_accumulation.md) - [Low context-sensitivity](low_context_sensitivity.md) - [Prefer providing only one way to do a given thing](one_way.md) -- [Safety strategy](safety_strategy.md) - [One static open extension mechanism](static_open_extension.md) - [Success criteria](success_criteria.md) diff --git a/proposals/p5914.md b/proposals/p5914.md new file mode 100644 index 0000000000000..dfbf44f85c165 --- /dev/null +++ b/proposals/p5914.md @@ -0,0 +1,280 @@ +# Updating Carbon's safety strategy + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/5914) + + + +## Table of contents + +- [Abstract](#abstract) +- [Problem](#problem) +- [Background](#background) +- [Goals and requirements](#goals-and-requirements) +- [Proposal](#proposal) +- [Direction for temporal and data-race memory-safe type system model](#direction-for-temporal-and-data-race-memory-safe-type-system-model) +- [Rationale](#rationale) +- [Alternatives considered](#alternatives-considered) + - [Defer beginning the concrete safety design](#defer-beginning-the-concrete-safety-design) + - [Pursue an alternative strategy towards memory safety specifically](#pursue-an-alternative-strategy-towards-memory-safety-specifically) + - [Adopt a memory safety strategy more closely based on Rust](#adopt-a-memory-safety-strategy-more-closely-based-on-rust) + + + +## Abstract + +Carbon is accelerating and adjusting its safety strategy, specifically to flesh +out its memory safety strategy and reflect simplifying developments in the +safety space. + +This proposal replaces the previous directional safety strategy with a new +concrete and updated framework for the safety design. It includes a specific +framework for memory safety, simplified build modes, specific "safety modes", +and terminology. + +This proposal also provides a _directional_ suggestion for temporal and +data-race safety specifically. + +In addition to fully building out the above directional component, there are +several other aspects of our safety design that will follow in subsequent +proposals. The hope is to establish the initial framework here. + +## Problem + +Carbon's safety strategy pre-dates the increased importance and urgency of +having a memory safety strategy, and our subsequent efforts to accelerate this +part of Carbon's design. It also was written at a time with deep uncertainty +about the performance costs of both safety and hardening efforts. And it left +major areas as future work that are now needed such as a specific model for +delineating when Carbon code has memory safety enforced. + +We need an updated, more concrete, and more precise strategy now that we're +beginning to directly develop safety designs. + +## Background + +- [Basic Concepts and Taxonomy of Dependable and Secure Computing](https://doi.org/10.1109%2FTDSC.2004.2) +- [Secure by Design: Google's Perspective on Memory Safety](https://research.google/pubs/secure-by-design-googles-perspective-on-memory-safety/) +- [Story-time: C++, bounds checking, performance, and compilers](https://chandlerc.blog/posts/2024/11/story-time-bounds-checking/) + +## Goals and requirements + +Our end goal for Carbon's memory safety strategy is to allow large-scale, +existing C++ codebases to _incrementally and scalably_ begin writing new code in +a memory-safe language without loss of their existing legacy codebase. Handling +real-world C++ codebases also means handling large-scale dependencies on C++, +but also on C and even Rust. + +We plan to support a two step migration process: + +1. Highly automated, minimal supervision migration from C++ to a dialect of + Carbon designed for C++ interop and migration. +2. Incremental refactoring of the Carbon code to adopt memory safe designs, + patterns, and APIs. + +From this framing of our problem statement and end-goal, we can extract detailed +requirements on the memory safety design in Carbon: + +- The ability to write Carbon as a "rigorously memory safe programming + language" (defined in the "Secure by Design" paper). + - Requires temporal, spatial, and type errors to be rejected at compile + time or detected at runtime with fail-stop behavior. + - Requires initialization errors to not have undefined behavior, but + allows, for example, zero-initialization rather than fail-stop behavior. + - This does not require complete elimination of data races, but does + require + [addressing temporal memory safety even in the face of concurrency](/docs/design/safety#data-races-versus-unsynchronized-temporal-safety). + - We directly refer to this as simply a + ["memory-safe language"](/docs/design/safety/terminology.md#memory-safe-language) + in Carbon. +- Seamless integration between Carbon's memory safe code and the memory unsafe + code resulting from migrating existing C++. + - Safe and unsafe code should be freely mixed, with incremental checking + of the strictly safe aspects of the code. +- A smooth, incremental path for this memory unsafe Carbon code to become more + memory safe over time. + - Must resonate with users as _significantly_ more incremental than + migrating directly to Rust. That is, the granularity of increasing + safety needs to be significantly smaller than moving an entire file from + C++ to Rust. +- Similar level of runtime spatial safety checks as Rust with similar runtime + overhead. +- No reference counting or garbage collection by default -- we want to + preserve C++ (and Rust) precise and explicit control over the lifetime of + resources. +- Minimal temporal safety runtime overhead: in aggregate and in real-world + application benchmarks, any performance overhead would need to be under 1%. + - We expect to spend 1-2% of runtime overhead achieving spatial safety, + and these would be cumulative. + - There has been strong, persistent resistance to over 2% runtime overhead + in broad contexts. + - Rust shows that this is achievable using primarily compile time + techniques. + +We also have some less critical but still _highly desirable_ goals: + +- Seamless _safe_ Rust interop -- able to reliably and consistently avoid + unsafe on the boundary with Rust code + - Should be no worse than the best Rust/C++ interop which uses extra + safety annotations for the C++ API +- Data race prevention + +Whether we fully achieve these secondary goals or not, we need to clearly +demonstrate where we end up and what any remaining gap there is towards these +goals. + +## Proposal + +See the new [proposed safety design](/docs/design/safety/README.md) and +[safety terminology](/docs/design/safety/terminology.md) for the core proposed +strategy and model. + +## Direction for temporal and data-race memory-safe type system model + +> Note: this is a _directional_ component of the proposal, and should be treated +> as provisional until dedicated proposals fully establish our model here. + +We expect Carbon to tackle temporal and data-race safety through its type +system, much as Rust and strict Swift do. However, we believe there are +substantial opportunities to select a model that: + +- provides comparable and compatible levels of safety; +- improves the ergonomics of interop with existing C++ APIs; and +- enables significantly more incremental adoption when starting from C++ code. + +However, these benefits aren't free: they come at the cost of added complexity +in the type system itself. Fundamentally, while we will strive to keep Carbon as +simple as we can within its design constraints, we expect Rust to be a simpler +language than Carbon, and for Carbon's improved interop model and incremental +adoption to always come at the cost of complexity. We also don't expect this +complexity to be something that can be fully hidden in implementation details of +standard libraries; some of it will be fundamental and visible to user code in +order to allow the greater degree of flexibility. + +There are at least five relevant places where we expect Carbon to differ from +Rust in this category: + +- Including constructs parallel to C++ constructs that are not present in + Rust. +- More precise modeling using more pointer types, to allow _safe_ and + _ergonomic_ mutable aliasing. +- More incremental enforcement of data-race safety. +- Aligning concepts and idioms with C++. +- Having fewer assumptions about unsafe code. + +First, Carbon is going to include language constructs to match C++ that Rust +avoids completely, and this comes at a complexity cost. A canonical example is +inheritance: Carbon will have inheritance and virtual dispatch built into the +language, allowing straightforward C++ interop and migration for APIs that use +it. These features increase the size of the language overall, and through +interaction increase the complexity of safety features. + +The second of these is the most significant change to safety: supporting safe, +shared mutation. We think this will open up the door to significantly more +flexible code while still providing memory safety. With Rust, there are two +kinds of safe pointers: shared borrows (`&`), and mutable borrows (`&mut`). The +latter are required to be exclusive -- no other safe pointers alias the mutable +object. We expect to have more distinguished types of pointers in Carbon to be +able to model more C++ coding patterns safely, including both shared mutable +pointers, and exclusive mutable pointers -- this is where the complexity +increases. While the exact design of this needs to be carefully fleshed out, our +current idea is to build on the concepts of +[safe, shared mutability in the Ante Language](https://antelang.org/blog/safe_shared_mutability/). +Building illustrative examples of how this impacts C++ code migrating towards +memory safety is an important part of our planned deliverables for 2025. + +Third, we expect to have more incremental enforcement of data-race safety in +Carbon, similar to how +[Swift allowed code to incrementally adopt the necessary patterns and restrictions](https://www.swift.org/documentation/concurrency/) +to be data-race safe. The necessary infrastructure was provided as independently +adoptable APIs, and enforcement was then available optionally prior to Swift 6 +making this a requirement. + +Fourth, we will express Carbon's core concepts around mutation in terms of C++ +idioms and concepts. For example, in a container, we expect iterators can use +shared mutable pointers, and the C++ concept of "iterator invalidation" will be +captured by methods that require exclusive mutable access, precluding any +outstanding shared mutable pointers. + +Lastly, Carbon's safety model will make fewer assumptions about what unsafe code +is allowed to do than Rust. To avoid undefined behavior, the optimization of +safe code will not rely on properties that are not established locally. For +example, there won't be an assumption that a call to C++ code won't save a copy +of pointers to passed-in objects. When there are restrictions on what unsafe +code can do, we will endeavor to make violations into +[erroneous behavior](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2795r2.html#bigpic), +where the behavior is well-defined while still considered a diagnosable +programming mistake. This reduces the risks of unsafe code during its +incremental migration to memory safe code. It also leaves unsafe Carbon code as +reasonable to work in and maintain over a longer period, which can enable more +widespread migration of C++ to unsafe Carbon. The amount of change and effort +required to convert a piece of code from unsafe Carbon to safe Carbon is much +less than the amount of change required to also change languages, introduce an +interop boundary, and make any API changes needed to enable memory safety all at +once. We believe this will significantly reduce churn effort along the +boundaries of memory safety during incremental migration from unsafe Carbon to +memory safe Carbon code as opposed to moving from C++ directly to a memory safe +language. + +Some of these differences may eventually be compelling directions for Rust to +improve its C++ interop, and we plan to develop these in active collaboration +with the Rust community. For these areas of potential overlap, Carbon largely +provides an open field for experimentation and proving out ideas. However, we +expect the majority of the differences we can achieve here would introduce +complexity and might change the nature of Rust making it difficult to retrofit. +As a consequence, there are likely to remain durable tradeoff differences +between Carbon and Rust going forward. + +## Rationale + +- [Performance-critical software](/docs/project/goals.md#performance-critical-software) + - Without memory safety, it will not be reasonably secure to continue to + write high performance software in low-level languages that provide the + desired performance control. +- [Practical safety and testing mechanisms](/docs/project/goals.md#practical-safety-and-testing-mechanisms) + - Our understanding of practical safety techniques, especially new details + of this uncovered over the past several years, directly form the need + for an update to our safety strategy. +- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) + - A primary motivation and constraint across the surface of safety + features is to better enable C++ code interop and migration. + +## Alternatives considered + +### Defer beginning the concrete safety design + +The current directional strategy has served us well, and it would be technically +possible to continue to defer turning this into a concrete design. However, we +have had strong feedback from multiple interested users that they need the +safety component to be concrete to do any further evaluation of Carbon, and we +are accelerating our work to meet this need. + +### Pursue an alternative strategy towards memory safety specifically + +There are many different approaches to memory safety that we could pursue. We +haven't exhaustively listed them, however most of the major candidates are +excluded by our specific goals. For example, a GC-based strategy for solving +temporal memory safety is a non-starter for users who specifically need non-GC +memory management. Rather than exhaustively list the alternatives that are +excluded, we have tried to list the specific [goals](#goals-and-requirements) +that led to the proposed direction. + +### Adopt a memory safety strategy more closely based on Rust + +An especially appealing alternative is to base our memory safety model +_precisely_ on Rust's. It would give us a strong existence proof, a good basis +to understand soundness, etc. Many of the most difficult questions have already +been answered. This was a seriously considered direction. + +However, Rust already exists and is an excellent language. Replicating Rust is +_not_ a goal of Carbon. Our goal is to specifically address use cases where Rust +is not viable at the moment, specifically due to the need for pervasive C++ +interop or automated migration from a large, existing C++ codebase. This +directly motivates pursuing a memory safety model that attempts to further +optimize the ergonomics and incrementality of adopting memory safety when +starting in this position.