Adds extracted principles underlying Carbon safety #6142
@@ -0,0 +1,81 @@
# Safety

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

<!-- toc -->

## Static Checking
For any sufficiently expressive programming language, it is undecidable
whether the execution of programs in that language has any given
non-trivial semantic property (Rice's theorem).

*Static checking* is a compile-time method to ensure program properties
through analysis and annotations that describe intended behavior.

Carbon aims to provide rigorous memory safety guarantees through static
checking. This requires navigating the tension between safety and
expressivity: Carbon's statically checked safety guarantees should not come
at the price of losing the ability to cover realistic programming use
cases.

Static checking cannot fully eliminate the need for dynamic safety
techniques (e.g. bounds checks), but it can constrain runtime program
behavior so that such checks are minimized.
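
A sketch of this principle, written in Rust as a stand-in because Carbon's
own syntax is still evolving (both function names are illustrative):

```rust
fn sum_fixed(values: &[i64; 4]) -> i64 {
    // The length is part of the static type, so the compiler can prove
    // every index below is in bounds; no runtime checks are needed.
    values[0] + values[1] + values[2] + values[3]
}

fn element_at(values: &[i64], index: usize) -> i64 {
    // The slice length is unknown at compile time, so safety is preserved
    // by a dynamic bounds check: the program traps rather than reading
    // out of bounds.
    values[index]
}
```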
## Strict and Permissive Carbon

The goals of Carbon include incremental migration from C++, which does not
come with statically checked safety guarantees. Further, interactions with
system components not written in Carbon place a limit on the safety
guarantees Carbon can make.

For this reason, Carbon's design distinguishes between a *strict* and a
*permissive* variant of the language and clarifies the interaction and
boundary between these variants.

A strict-Carbon program fragment is only accepted by the compiler if its
execution is guaranteed to be entirely free of safety-related execution
errors and its behavior with respect to safety is predictable.
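
Safe Rust offers an existing analogy for this rule. In the following
sketch (illustrative names, not actual Carbon semantics), the first
fragment is provably safe and accepted, while the second cannot be proven
safe and is rejected:

```rust
fn accepted(values: &[i32]) -> i32 {
    // The borrow is provably valid for the duration of the call.
    values.iter().sum()
}

// A fragment like the following is rejected at compile time, because the
// returned reference would outlive the local variable it points to:
//
//     fn rejected() -> &'static i32 {
//         let local = 42;
//         &local // error[E0515]: cannot return reference to local variable
//     }
```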
A permissive-Carbon program fragment is accepted without any guarantee
that its execution is free of safety errors. In particular, a
permissive-Carbon program fragment can call C++ code. The programmer does
not have to give the compiler the additional information that would be
necessary for checking safety.
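
Rust's foreign-function interface gives a rough picture of what such a
call might look like; in this sketch, `cpp_process_buffer` is a
hypothetical C++ function, not a real API:

```rust
extern "C" {
    // Declared but never checked: the compiler cannot see what the C++
    // implementation actually does.
    fn cpp_process_buffer(ptr: *mut u8, len: usize);
}

fn process(buffer: &mut [u8]) {
    // No safety guarantee here: the programmer alone is responsible for
    // upholding whatever contract the C++ side requires.
    unsafe { cpp_process_buffer(buffer.as_mut_ptr(), buffer.len()) }
}
```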
## Partial Safety and Gradual Ramp-up

While the absence of safety-related errors is a property of a program as a
whole, the design must specify the boundary and interaction between
strict-Carbon and permissive-Carbon fragments in a way that benefits from
partial safety guarantees.

A strict-Carbon program fragment may safely interact with a
permissive-Carbon fragment if the conditions for safe execution are met.

It will be possible, and necessary, to *assume* such safety conditions. In
permissive code, the responsibility for safety lies with the programmer.
Carbon's safety model may support communicating part or all of the assumed
information to the compiler, which can make use of it for safety checking.
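
A sketch of such a boundary, again in Rust terms with illustrative names:
the unchecked read plays the role of permissive code, and the checked
wrapper establishes the assumed condition before crossing over:

```rust
/// Permissive side: correct only if `index < values.len()`, a condition
/// the compiler does not verify.
unsafe fn read_unchecked(values: &[i32], index: usize) -> i32 {
    debug_assert!(index < values.len()); // the assumption, made explicit
    *values.get_unchecked(index)
}

/// Strict side: establishes the safety condition, so the interaction
/// with the permissive side is safe.
fn read(values: &[i32], index: usize) -> Option<i32> {
    if index < values.len() {
        // SAFETY: the bound was checked on the line above.
        Some(unsafe { read_unchecked(values, index) })
    } else {
        None
    }
}
```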
It must further be possible to turn permissive-Carbon code into
strict-Carbon code by gradual adoption of safety-related annotations.
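
As an illustration of what such a ramp-up might look like, sketched in
Rust with hypothetical functions, where the "before" form mirrors freshly
migrated, permissive code:

```rust
// Before: a raw pointer with no statically checked guarantees; the
// caller must ensure the pointer is valid and the buffer is non-empty.
unsafe fn first_before(data: *const i32, _len: usize) -> i32 {
    *data
}

// After annotating with a borrowed, sized type: validity and emptiness
// are now handled by the language itself.
fn first_after(data: &[i32]) -> Option<i32> {
    data.first().copied()
}
```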
## Simplicity and Ease of Understanding

Annotations for static checking need to be
[easy to read and write](../goals.md#code-that-is-easy-to-read-understand-and-write),
and program safety must be attainable without a full rewrite of migrated
C++ and without performance cost.

The rules that determine whether a strict-Carbon program is accepted
should be documented and easy to understand.

Different build modes never change the rules that make strict-Carbon safe
(strict-Carbon is safe in any build mode). They may affect the behavior
and performance characteristics of Carbon programs as a whole.
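
Rust's integer overflow handling is one concrete instance of this
principle and may serve as a sketch of the intent: build modes change
behavior, never safety.

```rust
fn add(a: i32, b: i32) -> i32 {
    // Debug builds trap on overflow; release builds wrap to two's
    // complement. Both outcomes are defined behavior, so neither build
    // mode admits a safety violation.
    a + b
}

fn main() {
    // In-range arithmetic behaves identically in every build mode.
    println!("{}", add(2, 3));
}
```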
I think this is a recurring source of misunderstanding where I may need to adapt my writing. I see "safety" as a global property (like "type soundness"): a language either has it or not, whereas your suggestion indicates that there may be multiple different safety guarantees.

I believe words like "rigorous memory safety" and "memory-safe language" suggest an overall guarantee that all memory-safety bugs are prevented. C++ does not come with any guarantee here. Would it help if I changed to the singular?

I've tried to capture the terminology that we are using here:
https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/safety/terminology.md#code-software-or-program-safety
IMO, the problem with viewing "safety" as a global property like "type soundness" is twofold. First, we expect there to be meaningful unsafe code and constructs in all realistic software. That means the global property won't ever really hold. For something like "type soundness", the analogous operations are instead modeled as happening "outside" of the type system, and so don't erode the global property of soundness within the type system. But that model doesn't seem to generalize well to safety, as it isn't clear what the system boundary would have to be to achieve this. The "safe subset" model doesn't (IMO) work very well either, given that safe constructs often work as abstractions over unsafe ones.
The second issue is that we don't achieve safety entirely through static checking. Instead, we use a variety of runtime techniques as a complement. And this further blurs, for me, the line of where the "system" boundary is.
Altogether, this has led me to prefer the model of safety being about the presence of invariants or limits on program behavior (even in the face of bugs), which isn't a binary/global property, but something we can talk about having in different degrees. For example, we're approaching the ability to have a spatially safe implementation of C++, even though it is not fully memory safe.
I am not attached to the unqualified word "safety", but we need a word for the all-or-nothing property. Could you suggest what we should use here?
We live in a post-Rust world. The Rust community took a standard technical definition of language safety, the compiler- and language-semantics-ensured absence of untrapped errors (aka undefined behavior), and elevated it into pragmatic, everyday use. "Unsafe" now means that the compiler could not ensure the absence of untrapped errors, and the implicit assumption is that programmers in "unsafe" code must take care to avoid untrapped errors and are on their own.
The concept is useful, but this use of language is not universally accepted, and it is unlikely that it ever will be. Many people miss the technical nature of the statement, and understandably so, because "safe" and "unsafe" have connotations and scope that in everyday use extend far beyond the technical meaning. "C++ is and will always be unsafe" provokes an emotional reaction and is perceived as offensive by some, even though it is a factual statement.
The difference between "safe" in everyday use and "safe" in the technical sense of language safety remains problematic, because the discourse has now moved well beyond researchers and language designers.
Rigorous memory safety at the language level is one and the same thing as the provable absence of undefined behavior from memory accesses. Call it "execution safety" or whatever. It is a global and binary property of programs and of languages. The presence of "unsafe" code is not a problem in idiomatic Rust use, because the programmer supplies, in // SAFETY comments, the arguments and reasoning that the language semantics and compiler checks could not provide. As long as the unsafe code plays by the rules, with the understanding that the rules are not checkable by the compiler but are the programmer's responsibility, the program is going to be safe to execute, and that is what matters.
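
For example, the discipline looks like this in practice (a small,
self-contained sketch):

```rust
fn split_first(bytes: &[u8]) -> Option<(u8, &[u8])> {
    if bytes.is_empty() {
        return None;
    }
    // SAFETY: `bytes` is non-empty, so index 0 is in bounds. This comment
    // carries the reasoning the compiler cannot check.
    let head = unsafe { *bytes.get_unchecked(0) };
    Some((head, &bytes[1..]))
}
```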
We absolutely do need to talk about global properties/invariants though, because in everyday programming use we need to be able to articulate them.
The argument for Rust being a memory-safe language then looks something like this:
- Rust has a global "safe Rust" type discipline and global invariants,
- AND unsafe code operates outside that discipline, but unsafe and safe
  code together adhere to additional rules, not enforceable by the
  compiler, that uphold the global invariants,
- THEREFORE the program is safe to execute and will not have untrapped
  errors.
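
A condensed version of the classic `split_at_mut` pattern from the Rust
standard library illustrates the last two steps: a safe abstraction whose
body uses unsafe code, but whose public API cannot be misused from safe
code to break the global invariants.

```rust
use std::slice;

/// Splits a mutable slice into two non-overlapping halves.
fn split_halves(data: &mut [i32]) -> (&mut [i32], &mut [i32]) {
    let len = data.len();
    let mid = len / 2;
    let ptr = data.as_mut_ptr();
    // SAFETY: the ranges [0, mid) and [mid, len) do not overlap, and both
    // lie within the original allocation, so the aliasing invariant holds.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```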
People have indeed proven language safety, even stronger notions of safety than "absence of untrapped errors", which involve modeling the semantics (https://dl.acm.org/doi/10.1145/3676954). This is of course technical academic work, and a different notion of proof than "the program has passed checks". Yet it shows that the language lets us write programs where the global absence of untrapped errors, and even stronger properties, are provable.
The practical impact of memory safety is of course that stuff just works when the compiler accepts the code (and one has avoided unsafe or used it correctly). Temporal safety, specifically reference validity/integrity at all times, seems to be a global property. So we still need a way to talk about the desired state of absolute certainty that untrapped errors such as use-after-free or access of uninitialized memory are absent. Otherwise, we won't be able to have things like safe abstractions over unsafe code, and we will remain at "everything is unsafe".
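
As a final sketch, Rust's borrow checker makes exactly this kind of
temporal guarantee statically; the rejected fragment below is shown as a
comment because it does not compile:

```rust
fn main() {
    // Accepted: the reference never outlives its referent.
    let owned = String::from("x");
    let r = &owned;
    println!("{r}");

    // Rejected at compile time (a use-after-free caught statically):
    //
    //     let dangling;
    //     {
    //         let scoped = String::from("y");
    //         dangling = &scoped; // error[E0597]: `scoped` does not live long enough
    //     }
    //     println!("{dangling}");
}
```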