Skip to content
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 7 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -250,19 +250,19 @@ We plan to support a two step migration process:

1. Highly automated, minimal supervision migration from C++ to a dialect of
Carbon designed for C++ interop and migration.
2. Incremental refactoring of the Carbon code to adopt memory safe designs,
2. Incremental refactoring of the Carbon code to adopt memory-safe designs,
patterns, and APIs.

We also want to address important, low-hanging fruit in the safety space
immediately when migrating into Carbon:

- Tracking uninitialized states better, increased enforcement of
initialization, and systematically providing hardening against
initialization bugs when desired.
- Designing fundamental APIs and idioms to support dynamic bounds checks in
debug and hardened builds.
- Having a default debug build mode that is both cheaper and more
comprehensive than existing C++ build modes even when combined with
initialization, and hardening against initialization bugs when needed.
- Designing fundamental APIs and idioms to support dynamic bounds checking.
- Switching from undefined behavior to erroneous behavior wherever possible,
and marking the remaining undefined behavior with visible `unsafe` syntax.
- Having a default debug build mode that has less runtime overhead while being
more comprehensive than existing C++ debug build modes combined with
[Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer).

For more details, see our [safety design](/docs/design/safety).
Expand Down
136 changes: 107 additions & 29 deletions docs/design/safety/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
- [Data races versus unsynchronized temporal safety](#data-races-versus-unsynchronized-temporal-safety)
- [Safe library ecosystem](#safe-library-ecosystem)
- [Build modes](#build-modes)
- [Constraints](#constraints)
- [Probabilistic techniques likely cannot stop attacks](#probabilistic-techniques-likely-cannot-stop-attacks)
- [References](#references)

<!-- tocstop -->
Expand All @@ -34,6 +36,7 @@ of code safety such as avoiding undefined behavior in other forms.
[code safety]:
/docs/design/safety/terminology.md#code-software-or-program-safety
[systems safety]: /docs/design/safety/terminology.md#safety
[memory safety]: /docs/design/safety/terminology.md#memory-safety

However, Carbon also has a goal of fine grained, smooth interop-with and
migration-from existing C++ code, and C++ is a fundamentally unsafe language. It
Expand All @@ -42,19 +45,19 @@ guarantees. Our safety strategy has to address how C++ code fits into it, and
provide an incremental path from where the code is at today towards increasing
levels of safety.

Ultimately, Carbon will both provide a [memory safe language], _and_ provide a
language that can interop with C++ and be targeted for mechanical migration from
C++.
Ultimately, Carbon will both provide a [memory-safe language], _and_ provide a language
that is a target for mechanical migration from C++ and optimizes even further for
interop with unsafe C++ with minimal friction.

[memory safe language]: /docs/design/safety/terminology.md#memory-safe-language
[memory-safe language]: /docs/design/safety/terminology.md#memory-safe-language

## Safe and unsafe code

Carbon will have both _safe_ and _unsafe_ code. Safe code provides limits on the
potential behavior of the program even in the face of bugs in order to prevent
[safety bugs] from becoming [vulnerabilities]. Unsafe code is any code or operation
which may lack limits or guarantees on behavior, and as a result may result in undefined
behavior and a safety bug.
which lacks limits or guarantees on behavior, and as a consequence may have undefined
behavior or be a safety bug.

[safety bugs]: /docs/design/safety/terminology.md#safety-bugs
[vulnerabilities]:
Expand All @@ -73,8 +76,19 @@ language:
whatever syntax is used so that this keyword can be used as a common
annotation in audits and review.

The result is that Carbon does not have a "safe" mode and an "unsafe" mode, but
rather narrow and specific unsafe operations and constructs.
The result is that we don't model large regions or sections of Carbon code as
"unsafe" or have a parallel "unsafe" variant language. We instead focus on the
narrow and specific unsafe operations and constructs.

Note that when we're talking about the narrow semantics of an unsafe capability,
these are the semantics of the specific unsafe operation. For example, an unsafe
type conversion shouldn't also allow unsafe out-of-lifetime access. This is
separate from the _soundness_ implications of an unsafe operation, which may not
be as easily narrowed.

> **Future work**: More fully expand on the soundness principles and model for
> safe Carbon code. This is an interesting and important area of the design that
> isn't currently fleshed out in detail.

## Safety modes

Expand All @@ -86,11 +100,29 @@ keyword. This mode is designed to satisfy the requirements of a [memory
safe language].

_Permissive Carbon_ is a mode optimized for interop with C++ and automated
migration from C++. In this mode, some unsafe code to not require an `unsafe`
migration from C++. In this mode, some unsafe code does not require an `unsafe`
keyword: specific aspects of C++ interop or pervasive patterns that occur when
migrating from C++. However, not _all_ unsafe code will omit the keyword, the
permissive mode is designed to be minimal in the unsafe code allowed.

Modes can be configured on an individual file as part of the package
declaration, or an a function body as part of the function definition. More
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or an a function body

"an" -> "on"? or "in"?

options such as regions of declarations or regions of statements can be explored
in the future based on demand in practice when working with mixed-strictness
code. More fine grained than statements is not expected given that the same core
expressivity is available at that finer granularity through explicitly marking
`unsafe` operations.

> **Future work**: Define the exact syntax for package declaration and function
> definition control of strictness. The syntax for enabling the permissive mode
> must include the `unsafe` keyword as it is standing in place of more granular
> use of the `unsafe` keywords and needs to still be discovered when auditing
> for safety.

> **Future work**: Define how to mark `import`s of permissive libraries in
> various contexts, balancing ergonomic burden against the potential for
> surprising amounts af unsafety in dependencies.

## Memory safety model

Carbon will use a hybrid of different techniques to achieve memory safety in its
Expand Down Expand Up @@ -123,10 +155,13 @@ how they approach both temporal and data-race safety.

Carbon has the option of distinguishing between two similar but importantly
different classes of bugs: data races and unsynchronized temporal safety
violations. Specifically, there evidence from security teams that the second of
these has been a greater source of exploitation than first. As a consequence,
Carbon has some flexibility while still being a [memory safe language] according
to our definition:
violations. Specifically, there is no evidence from security teams that there is
any significant volume of vulnerabilities that involve a data race bug but don't
also involve a temporal memory safety violation. For example, despite both Go
and non-strict-concurrency Swift only providing temporal safety, the rate of
memory safety vulnerabilities in software written in both matches the expected
low rate for memory-safe languages. As a consequence, Carbon has some
flexibility while still being a [memory-safe language] according to our definition:

- Carbon might choose to _not_ prevent data race bugs that are not
_themselves_ also temporal safety bugs, even though the data race may lead
Expand All @@ -137,25 +172,36 @@ to our definition:
synchronization directly allows temporal safety bugs, such as use after
free.

However, preventing these data race bugs remains _highly valuable_ for
correctness, debugging, and achieving [fearless concurrency]. But it is one aspect
where Carbon can in theory afford a compromise based on the current security information.
Despite having this flexibility, preventing data race bugs remains _highly
valuable_ for correctness, debugging, and achieving [fearless concurrency]. If Carbon
can, it should work to prevent data races as well.

[fearless concurrency]: https://doc.rust-lang.org/book/ch16-00-concurrency.html

## Safe library ecosystem

Carbon will need access to a memory safe library ecosystem. The industry is
currently investing a massive amount of effort to build out a sufficient
ecosystem of such software using Rust, and it is critical that Carbon does not
impede, slow down, or require duplicating that ecosystem.
Carbon will need access to memory-safe library ecosystems, both for
general-purpose, multi-platform functionality and for platform-specific
functionality. Currently, the industry is currently investing a massive amount
of effort to build out a sufficient ecosystem of general, multi-platform
software using Rust, and it is critical that Carbon does not impede, slow down,
or require duplicating that ecosystem. Similarly, if any other cross-platform
library ecosystems emerge in any viable memory-safe languages given our
performance constraints, we should work to reuse them and avoid duplication.

**Carbon's strategy for safe and generally reusable cross-platform libraries is
to leverage Rust libraries through interop.** This is a major motivating reason
for seamless and safe Rust interop. The Carbon project will work to avoid
creating duplication between the growing Rust library ecosystem and any future
Carbon library ecosystem. Carbon's ecosystem will be focused instead on
libraries and functionality that would either be missing or only available in
C++.

**Carbon's strategy for safe and generally reusable libraries is to leverage
Rust libraries through interop.** This is a major motivating reason for seamless
and safe Rust interop. The Carbon project will work to avoid creating
duplication between the growing Rust library ecosystem and any future Carbon
library ecosystem. Carbon's ecosystem will be focused instead on libraries and
functionality that would either be missing or only available in C++.
Platform-specific functionality is typically developed in that platform's native
language, whether that is Swift for Apple platforms or Kotlin for Android.
Again, the goal of Carbon should be to avoid duplicating functionality, and
instead to prioritize high quality interop with the existing platform libraries
in the relevant languages on those platforms.

## Build modes

Expand All @@ -180,8 +226,9 @@ that disables the run-time enforcement, enabling the control of any overhead inc
[evidence]: https://chandlerc.blog/posts/2024/11/story-time-bounds-checking/
[performance control]: /docs/project/goals.md#performance-critical-software

The debug build will work to cause bugs and any detectable undefined behavior to
have [fail-stop] behavior and even detailed diagnostics to enable better
The debug build will change the actual behavior of both safe and unsafe code
with detectable bugs or detectable undefined or erroneous behavior to have
[fail-stop] behavior and provide detailed diagnostics to enable better
debugging. This mode will at least provide similar bug [detection] capabilities
to [AddressSanitizer] and [MemorySanitizer].

Expand Down Expand Up @@ -209,9 +256,40 @@ release build for that specific purpose. For example:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

Carbon will provide a way to disable the default hardening in release builds,
but not in a supported way. It's use is expected to be strictly for benchmarking
but not in a supported way. Its use is expected to be strictly for benchmarking
purposes.

## Constraints

### Probabilistic techniques likely cannot stop attacks

It's expected that non-cryptographic probabilistic techniques that can be
applied at the language level are attackable through a variety of techniques:

- The attacker might be able to attack repeatedly until it gets through.
- The attacker may be able to determine when the attack would be detected and
only run the attack when it would not be.
- The attacker might be able control the test condition to make detection much
less likely or avoid detection completely. For example, if detection is
based on the last 4 bits of a memory address, an attacker may be able to
generate memory allocations, viewing the address and only attacking when
there's a collision.

Hardware vulnerabilities may make these attacks easier than they might otherwise
appear. Future hardware vulnerabilities are difficult to predict. In general, we
do not expect non-cryptographic probabilistic techniques to be an effective
approach to achieving safety.

While _cryptographic_ probabilistic techniques can, and typically do, work
carefully to not be subject to these weaknesses, they face a very different
challenge. The overhead of a cryptographically secure hash is generally
prohibitive for use in language level constructs. Further, some of the defenses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

language level -> language-level

against hardware vulnerabilities and improvements further exacerbate these
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete "and improvements"

overheads. However, when these can be applied usefully such as with [PKeys],
they are robust.

[PKeys]: https://docs.kernel.org/core-api/protection-keys.html

## References

- Proposal
Expand Down
Loading
Loading