Commit 2bbdb39

Adjustments to Rust design decisions wording.
PiperOrigin-RevId: 816717751
1 parent ae0c43c commit 2bbdb39

File tree

1 file changed

+42
-26
lines changed

content/reference/rust/rust-design-decisions.md

Lines changed: 42 additions & 26 deletions
@@ -30,6 +30,13 @@ the byte array across the boundary, and deserialize in the other language. This
3030
also reduces binary size for these use cases by avoiding having redundant schema
3131
information embedded in the binary for the same messages for each language.
3232

33+
Google sees Rust as an opportunity to incrementally get memory safety to key
34+
portions of preexisting brownfield C++ servers; the cost of serialization at the
35+
language boundaries would prevent adoption of Rust to replace C++ in many of
36+
these important and performance-sensitive cases. If we pursued a greenfield Rust
37+
Protobuf implementation that did not have this support, it would end up blocking
38+
Rust adoption and require that these important cases stay on C++ instead.
39+
3340
Protobuf Rust currently supports three kernels:
3441

3542
* C++ kernel - the generated code is backed by C++ Protocol Buffers (the
@@ -47,9 +54,12 @@ Protobuf Rust currently supports three kernels:
4754
other languages. This is the default in open source builds where we expect
4855
static linking with code already using C++ Protobuf to be more rare.
4956

50-
The decision to support multiple non-Rust kernels significantly influences our
51-
public API decisions, including the types used on getters (discussed later in
52-
this document).
57+
Rust Protobuf is designed to support multiple alternate implementations
58+
(including multiple different memory layouts) while exposing exactly the same
59+
API, allowing the same application code to be recompiled against a
60+
different backing implementation. This design constraint significantly
61+
influences our public API decisions, including the types used on getters
62+
(discussed later in this document).
5363

5464
### No Pure Rust Kernel {#no-pure-rust}
5565

@@ -61,18 +71,22 @@ While Rust being a memory-safe language can significantly reduce exposure to
6171
critical security issues, no language is immune to security issues. The Protobuf
6272
implementations that we support as kernels have been scrutinized and fuzzed to
6373
the extent that Google is comfortable using those implementations to perform
64-
unsandboxed parsing of untrusted inputs in our own servers and apps. A
65-
greenfield binary parser written in Rust at this time would be understood to be
66-
much more likely to contain critical vulnerabilities than the preexisting C++
67-
Protobuf parser.
74+
unsandboxed parsing of untrusted inputs in our own servers and apps.
75+
76+
A greenfield binary parser written in Rust at this time would be understood to
77+
be much more likely to contain critical vulnerabilities than our preexisting C++
78+
Protobuf or upb parsers, which have been extensively fuzzed, tested, and
79+
reviewed.
6880

69-
There are legitimate arguments for long-term supporting a pure Rust
70-
implementation, including toolchain difficulties for developers using our
71-
implementation in open source.
81+
There are legitimate arguments for supporting a pure Rust kernel implementation
82+
long-term, including the ability for developers to avoid needing to have Clang
83+
available to compile C code at build time.
7284

73-
It is a reasonable assumption that Google will support a pure Rust
74-
implementation at some later date, but we are not investing in it today and have
75-
no concrete roadmap for it at this time.
85+
We expect that Google will support a pure Rust implementation with the same
86+
exposed API at some later date, but we have no concrete roadmap for it at this
87+
time. A second official Rust Protobuf implementation that has a 'better' API by
88+
avoiding the constraints that come from being backed by C++ Proto and upb is not
89+
planned, as we wouldn't want to fragment Google's own Protobuf usage.
7690

7791
## View/Mut Proxy Types {#view-mut-proxy-types}
7892

@@ -164,21 +178,23 @@ than Rust's std UTF-8 validation.
164178
### ProtoString {#proto-string}
165179

166180
Rust's `str` and `std::string::String` types maintain a strict invariant that
167-
they only contain valid UTF-8, but C++ Protobuf and C++'s `std::string` type
168-
generally do not enforce any such guarantee. `string` typed Protobuf fields are
169-
intended to only ever contain valid UTF-8, and C++ Protobuf uses a correct and
170-
highly optimized UTF8 validator. C++ Protobuf's API surface is not set up to
171-
strictly enforce a runtime invariant that `string` fields always contain valid
172-
UTF-8 (instead, it defers any validation to serialize or subsequent parse time).
181+
they only contain valid UTF-8, but C++'s `std::string` type does not enforce any
182+
such guarantee. `string` typed Protobuf fields are intended to only ever contain
183+
valid UTF-8, and C++ Protobuf does use a correct and highly optimized UTF-8
184+
validator. However, C++ Protobuf's API surface is not set up to strictly enforce
185+
as a runtime invariant that its `string` fields always contain valid UTF-8;
186+
instead, in some cases it allows non-UTF-8 data to be set into a `string` field,
187+
and validation only occurs later, at serialization time.
173188

174189
To enable integrating Rust into preexisting codebases that use C++ Protobuf
175-
while minimizing unnecessary validations or risk of undefined behavior in Rust,
176-
we chose not to use the `str`/`String` types for `string` field getters. We
177-
introduced the types `ProtoStr` and `ProtoString` instead, which are equivalent
178-
types, except that they may contain invalid UTF-8 in rare situations. Those
179-
types let the application code choose if they wish to perform the validation
180-
on-demand to observe the fields as a `Result<&str>`, or operate on the raw bytes
181-
to avoid any runtime validation.
190+
while allowing for zero-cost boundary crossings with no risk of undefined
191+
behavior in Rust, we unfortunately have to avoid the `str`/`String` types for
192+
`string` field getters. Instead, the types `ProtoStr` and `ProtoString` are
193+
used, which are equivalent types, except that they may contain invalid UTF-8 in
194+
rare situations. Those types let application code choose whether to
195+
perform validation on demand to observe the fields as a `Result<&str>`, or
196+
operate on the raw bytes to avoid any runtime validation. All of the setter
197+
paths are still designed to allow you to pass `&str` or `String` types.
182198

183199
We are aware that vocabulary types like `str` are very important to idiomatic
184200
usage, and intend to keep an eye on if this decision is the right one as usage
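
To make the `ProtoStr` trade-off described in the ProtoString hunk concrete, here is a minimal sketch of a view over bytes that are usually, but not always, valid UTF-8. `SketchProtoStr` and its methods are hypothetical stand-ins written for this illustration, not the actual Protobuf Rust API. The caller either pays for validation on demand and gets a `Result<&str, _>`, or works on the raw bytes with no validation at all.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical stand-in for a ProtoStr-like view (NOT the real Protobuf type):
/// a borrowed byte slice that is expected, but not guaranteed, to be UTF-8.
#[derive(Debug, Clone, Copy)]
pub struct SketchProtoStr<'a> {
    bytes: &'a [u8],
}

impl<'a> SketchProtoStr<'a> {
    /// Wraps bytes without validating them, so crossing the boundary is free.
    pub fn from_bytes(bytes: &'a [u8]) -> Self {
        Self { bytes }
    }

    /// Raw access: no validation cost, no UTF-8 guarantee.
    pub fn as_bytes(&self) -> &'a [u8] {
        self.bytes
    }

    /// On-demand validation: the caller decides when to pay for the check.
    pub fn to_str(&self) -> Result<&'a str, Utf8Error> {
        std::str::from_utf8(self.bytes)
    }

    /// Lossy conversion: invalid sequences become U+FFFD.
    pub fn to_string_lossy(&self) -> Cow<'a, str> {
        String::from_utf8_lossy(self.bytes)
    }
}

fn main() {
    let valid = SketchProtoStr::from_bytes("héllo".as_bytes());
    // 0xff can never appear in well-formed UTF-8.
    let raw = [0x66u8, 0x6f, 0xff];
    let invalid = SketchProtoStr::from_bytes(&raw);

    for field in [valid, invalid] {
        match field.to_str() {
            Ok(s) => println!("valid UTF-8: {s}"),
            Err(e) => println!(
                "invalid UTF-8 ({e}); raw bytes: {:?}; lossy: {}",
                field.as_bytes(),
                field.to_string_lossy()
            ),
        }
    }
}
```

Because the wrapper never promises valid UTF-8, it can be built over C++-owned memory without a validation pass, and safe Rust code can never observe an invalid `&str`.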
