diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 1dca1f14df59..e09226bd216c 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -437,6 +437,10 @@ - [Semantic Confusion](idiomatic/leveraging-the-type-system/newtype-pattern/semantic-confusion.md) - [Parse, Don't Validate](idiomatic/leveraging-the-type-system/newtype-pattern/parse-don-t-validate.md) - [Is It Encapsulated?](idiomatic/leveraging-the-type-system/newtype-pattern/is-it-encapsulated.md) + - [Typestate Pattern](idiomatic/leveraging-the-type-system/typestate-pattern.md) + - [Typestate Pattern Example](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-example.md) + - [Beyond Simple Typestate](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-advanced.md) + - [Typestate Pattern with Generics](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics.md) --- diff --git a/src/idiomatic/leveraging-the-type-system/typestate-pattern.md b/src/idiomatic/leveraging-the-type-system/typestate-pattern.md new file mode 100644 index 000000000000..bfe864528a5b --- /dev/null +++ b/src/idiomatic/leveraging-the-type-system/typestate-pattern.md @@ -0,0 +1,89 @@ +--- +minutes: 30 +--- + +## Typestate Pattern: Problem + +How can we ensure that only valid operations are allowed on a value based on its +current state? 
+ +```rust,editable +use std::fmt::Write as _; + +#[derive(Default)] +struct Serializer { + output: String, +} + +impl Serializer { + fn serialize_struct_start(&mut self, name: &str) { + let _ = writeln!(&mut self.output, "{name} {{"); + } + + fn serialize_struct_field(&mut self, key: &str, value: &str) { + let _ = writeln!(&mut self.output, " {key}={value};"); + } + + fn serialize_struct_end(&mut self) { + self.output.push_str("}\n"); + } + + fn finish(self) -> String { + self.output + } +} + +fn main() { + let mut serializer = Serializer::default(); + serializer.serialize_struct_start("User"); + serializer.serialize_struct_field("id", "42"); + serializer.serialize_struct_field("name", "Alice"); + + // serializer.serialize_struct_end(); // ← Oops! Forgotten + + println!("{}", serializer.finish()); +} +``` + +
+ +- This `Serializer` is meant to write a structured value. The expected usage + follows this sequence: + +```bob +serialize struct start +-+--------------------- + | + +--> serialize struct field + -+--------------------- + | + +--> serialize struct field + -+--------------------- + | + +--> serialize struct end +``` + +- However, in this example we forgot to call `serialize_struct_end()` before + `finish()`. As a result, the serialized output is incomplete or syntactically + incorrect. + +- One approach to fix this would be to track internal state manually, and return + a `Result` from methods like `serialize_struct_field()` or `finish()` if the + current state is invalid. + +- But this has downsides: + + - It is easy to get wrong as an implementer. Rust’s type system cannot help + enforce the correctness of our state transitions. + + - It also adds unnecessary burden on the user, who must handle `Result` values + for operations that are misused in source code rather than at runtime. + +- A better solution is to model the valid state transitions directly in the type + system. + + In the next slide, we will apply the **typestate pattern** to enforce correct + usage at compile time and make it impossible to call incompatible methods or + forget to do a required action. + +
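The runtime-checked alternative mentioned in the notes above can be sketched as follows. This is only an illustration under stated assumptions: the `State` and `SerializeError` types, their variants, and the two helper functions are hypothetical names introduced here and are not part of the slide's code. The sketch shows both downsides at once: every method has to re-check the state by hand, and every call site has to deal with a `Result`.

```rust
use std::fmt::Write as _;

// Hypothetical runtime state, tracked by hand instead of by the type system.
#[derive(Debug, PartialEq)]
enum State {
    Root,
    InStruct,
}

// Hypothetical error type for calls made in the wrong state.
#[derive(Debug, PartialEq)]
enum SerializeError {
    AlreadyInStruct,
    NotInStruct,
    UnfinishedStruct,
}

struct Serializer {
    output: String,
    state: State,
}

impl Serializer {
    fn new() -> Self {
        Serializer { output: String::new(), state: State::Root }
    }

    fn serialize_struct_start(&mut self, name: &str) -> Result<(), SerializeError> {
        if self.state != State::Root {
            return Err(SerializeError::AlreadyInStruct);
        }
        let _ = writeln!(self.output, "{name} {{");
        self.state = State::InStruct;
        Ok(())
    }

    fn serialize_struct_field(&mut self, key: &str, value: &str) -> Result<(), SerializeError> {
        if self.state != State::InStruct {
            return Err(SerializeError::NotInStruct);
        }
        let _ = writeln!(self.output, "  {key}={value};");
        Ok(())
    }

    fn serialize_struct_end(&mut self) -> Result<(), SerializeError> {
        if self.state != State::InStruct {
            return Err(SerializeError::NotInStruct);
        }
        self.output.push_str("}\n");
        self.state = State::Root;
        Ok(())
    }

    fn finish(self) -> Result<String, SerializeError> {
        // The mistake is only detected here, at runtime.
        if self.state != State::Root {
            return Err(SerializeError::UnfinishedStruct);
        }
        Ok(self.output)
    }
}

// Same mistake as on the slide: `serialize_struct_end()` is never called.
fn forgot_to_end() -> Result<String, SerializeError> {
    let mut serializer = Serializer::new();
    serializer.serialize_struct_start("User")?;
    serializer.serialize_struct_field("id", "42")?;
    serializer.finish()
}

// Correct usage, for comparison.
fn complete() -> Result<String, SerializeError> {
    let mut serializer = Serializer::new();
    serializer.serialize_struct_start("User")?;
    serializer.serialize_struct_field("id", "42")?;
    serializer.serialize_struct_end()?;
    serializer.finish()
}

fn main() {
    println!("{:?}", forgot_to_end());
    println!("{:?}", complete());
}
```

The program still compiles with the bug in `forgot_to_end()`; the error only surfaces when that code path runs.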
diff --git a/src/idiomatic/leveraging-the-type-system/typestate-pattern/typestate-advanced.md b/src/idiomatic/leveraging-the-type-system/typestate-pattern/typestate-advanced.md new file mode 100644 index 000000000000..fd10ef5e163f --- /dev/null +++ b/src/idiomatic/leveraging-the-type-system/typestate-pattern/typestate-advanced.md @@ -0,0 +1,94 @@ +## Beyond Simple Typestate + +How do we manage increasingly complex configuration flows with many possible +states and transitions, while still preventing incompatible operations? + +```rust +struct Serializer {/* [...] */} +struct SerializeStruct {/* [...] */} +struct SerializeStructProperty {/* [...] */} +struct SerializeList {/* [...] */} + +impl Serializer { + // TODO, implement: + // + // fn serialize_struct(self, name: &str) -> SerializeStruct + // fn finish(self) -> String +} + +impl SerializeStruct { + // TODO, implement: + // + // fn serialize_property(mut self, name: &str) -> SerializeStructProperty + + // TODO, + // How should we finish this struct? This depends on where it appears: + // - At the root level: return `Serializer` + // - As a property inside another struct: return `SerializeStruct` + // - As a value inside a list: return `SerializeList` + // + // fn finish(self) -> ??? +} + +impl SerializeStructProperty { + // TODO, implement: + // + // fn serialize_string(self, value: &str) -> SerializeStruct + // fn serialize_struct(self, name: &str) -> SerializeStruct + // fn serialize_list(self) -> SerializeList + // fn finish(self) -> SerializeStruct +} + +impl SerializeList { + // TODO, implement: + // + // fn serialize_string(mut self, value: &str) -> Self + // fn serialize_struct(mut self, value: &str) -> SerializeStruct + // fn serialize_list(mut self) -> SerializeList + + // TODO: + // Like `SerializeStruct::finish`, the return type depends on nesting. + // + // fn finish(mut self) -> ??? +} +``` + +
+ +- Building on our previous serializer, we now want to support **nested + structures** and **lists**. + +- However, this introduces both **duplication** and **structural complexity**. + +- Even more critically, we now hit a **type system limitation**: we cannot + cleanly express what `finish()` should return without duplicating variants for + every nesting context (e.g. root, struct, list). + +- To better understand this limitation, let’s map the valid transitions: + +```bob + +-----------+ +---------+------------+-----+ + | | | | | | + V | V | V | + + | +serializer --> structure --> property --> list +-+ + + | | ^ | ^ + V | | | | + | +-----------+ | + String | | + +--------------------------+ +``` + +- From this diagram, we can observe: + - The transitions are recursive + - The return types depend on _where_ a substructure or list appears + - Each context requires a return path to its parent + +- With only concrete types, this becomes unmanageable. Our current approach + leads to an explosion of types and manual wiring. + +- In the next chapter, we’ll see how **generics** let us model recursive flows + with less boilerplate, while still enforcing valid operations at compile time. + +
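To make the "explosion of types" concrete, here is a minimal sketch that supports only structs, nested exactly one level deep, using concrete types. The names `RootStruct`, `NestedStruct`, and `serialize_nested_struct` are hypothetical and chosen for this illustration only. Each nesting context needs its own type so that `finish_struct()` can return the right parent, and the field-writing logic is duplicated in each of them.

```rust
use std::fmt::Write as _;

#[derive(Default)]
struct Serializer {
    output: String,
}

// Hypothetical: a struct opened at the root. Finishing it returns `Serializer`.
struct RootStruct {
    output: String,
}

// Hypothetical: a struct opened inside a `RootStruct`. Finishing it returns
// `RootStruct`. A third nesting level would need yet another type.
struct NestedStruct {
    output: String,
}

impl Serializer {
    fn serialize_struct(mut self, name: &str) -> RootStruct {
        writeln!(self.output, "{name} {{").unwrap();
        RootStruct { output: self.output }
    }

    fn finish(self) -> String {
        self.output
    }
}

impl RootStruct {
    fn serialize_field(mut self, key: &str, value: &str) -> Self {
        writeln!(self.output, "  {key}={value};").unwrap();
        self
    }

    fn serialize_nested_struct(mut self, key: &str, name: &str) -> NestedStruct {
        writeln!(self.output, "  {key}={name} {{").unwrap();
        NestedStruct { output: self.output }
    }

    fn finish_struct(mut self) -> Serializer {
        self.output.push_str("}\n");
        Serializer { output: self.output }
    }
}

impl NestedStruct {
    // Duplicates `RootStruct::serialize_field`, apart from the indentation.
    fn serialize_field(mut self, key: &str, value: &str) -> Self {
        writeln!(self.output, "    {key}={value};").unwrap();
        self
    }

    // Same operation as `RootStruct::finish_struct`, different return type.
    fn finish_struct(mut self) -> RootStruct {
        self.output.push_str("  };\n");
        RootStruct { output: self.output }
    }
}

fn demo() -> String {
    Serializer::default()
        .serialize_struct("User")
        .serialize_field("id", "42")
        .serialize_nested_struct("address", "Address")
        .serialize_field("city", "Oslo")
        .finish_struct() // NestedStruct -> RootStruct
        .finish_struct() // RootStruct -> Serializer
        .finish()
}

fn main() {
    print!("{}", demo());
}
```

Adding lists, or a second level of nesting, would multiply these types again, which is the problem generics address.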
diff --git a/src/idiomatic/leveraging-the-type-system/typestate-pattern/typestate-example.md b/src/idiomatic/leveraging-the-type-system/typestate-pattern/typestate-example.md new file mode 100644 index 000000000000..9a22e3ab83fb --- /dev/null +++ b/src/idiomatic/leveraging-the-type-system/typestate-pattern/typestate-example.md @@ -0,0 +1,99 @@ +## Typestate Pattern: Example + +The typestate pattern encodes part of a value’s runtime state into its type. +This allows us to prevent invalid or inapplicable operations at compile time. + +```rust,editable +use std::fmt::Write as _; + +#[derive(Default)] +struct Serializer { + output: String, +} + +struct SerializeStruct { + serializer: Serializer, +} + +impl Serializer { + fn serialize_struct(mut self, name: &str) -> SerializeStruct { + writeln!(&mut self.output, "{name} {{").unwrap(); + SerializeStruct { serializer: self } + } + + fn finish(self) -> String { + self.output + } +} + +impl SerializeStruct { + fn serialize_field(mut self, key: &str, value: &str) -> Self { + writeln!(&mut self.serializer.output, " {key}={value};").unwrap(); + self + } + + fn finish_struct(mut self) -> Serializer { + self.serializer.output.push_str("}\n"); + self.serializer + } +} + +fn main() { + let serializer = Serializer::default() + .serialize_struct("User") + .serialize_field("id", "42") + .serialize_field("name", "Alice") + .finish_struct(); + + println!("{}", serializer.finish()); +} +``` + +
+
+- This example is inspired by Serde’s
+  [`Serializer` trait](https://docs.rs/serde/latest/serde/ser/trait.Serializer.html).
+  Serde uses typestates to ensure serialization follows a valid structure.
+
+- The key idea behind typestate is that state transitions happen by consuming a
+  value and producing a new one. At each step, only operations valid for that
+  state are available.
+
+```bob
++------------+  serialize struct   +-----------------+
+| Serializer | ------------------> | SerializeStruct | <------+
++------------+                     +-----------------+        |
+                                                              |
+   |    ^                             |    |                  |
+   |    |        finish struct        |    | serialize field  |
+   |    +-----------------------------+    +------------------+
+   |
+   +---> finish
+```
+
+- In this example:
+
+  - We begin with a `Serializer`, which only allows us to start serializing a
+    struct.
+
+  - Once we call `.serialize_struct(...)`, ownership moves into a
+    `SerializeStruct` value. From that point on, we can only call methods
+    related to serializing struct fields.
+
+  - The original `Serializer` is no longer accessible, which prevents us from
+    mixing modes (such as starting another _struct_ mid-struct) or calling
+    `finish()` too early.
+
+  - Only after calling `.finish_struct()` do we receive the `Serializer` back.
+    At that point, the output can be finalized or reused.
+
+- If we forget to call `finish_struct()` and drop the `SerializeStruct` early,
+  the `Serializer` is also dropped. This ensures incomplete output cannot leak
+  into the system.
+
+- By contrast, if we had implemented everything on `Serializer` directly, as
+  seen on the previous slide, nothing would stop someone from skipping important
+  steps or mixing serialization flows.
+
+
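Dropping a `SerializeStruct` early is safe, but it is also silent. One possible refinement, shown here as a sketch and not as part of the slide's code, is to mark the intermediate state with Rust's `#[must_use]` attribute. The compiler then warns when a method returns a `SerializeStruct` and the result is discarded as a statement. The message text and the `serialize_user` helper are assumptions made for this example.

```rust
use std::fmt::Write as _;

#[derive(Default)]
struct Serializer {
    output: String,
}

// Assumption for this sketch: the attribute and its message are additions.
#[must_use = "call `finish_struct()` to get the `Serializer` back"]
struct SerializeStruct {
    serializer: Serializer,
}

impl Serializer {
    fn serialize_struct(mut self, name: &str) -> SerializeStruct {
        writeln!(&mut self.output, "{name} {{").unwrap();
        SerializeStruct { serializer: self }
    }

    fn finish(self) -> String {
        self.output
    }
}

impl SerializeStruct {
    fn serialize_field(mut self, key: &str, value: &str) -> Self {
        writeln!(&mut self.serializer.output, "  {key}={value};").unwrap();
        self
    }

    fn finish_struct(mut self) -> Serializer {
        self.serializer.output.push_str("}\n");
        self.serializer
    }
}

// Hypothetical helper that runs the whole flow.
fn serialize_user() -> String {
    // Uncommenting the next line should trigger an `unused_must_use` warning,
    // because the returned `SerializeStruct` is discarded immediately:
    //
    // Serializer::default().serialize_struct("User");

    Serializer::default()
        .serialize_struct("User")
        .serialize_field("id", "42")
        .finish_struct()
        .finish()
}

fn main() {
    print!("{}", serialize_user());
}
```

This is a lint, not a guarantee: binding the value to a variable and never finishing it does not produce the warning.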
diff --git a/src/idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics.md b/src/idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics.md
new file mode 100644
index 000000000000..9a83957802dd
--- /dev/null
+++ b/src/idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics.md
@@ -0,0 +1,268 @@
+## Typestate Pattern with Generics
+
+By combining typestate modeling with generics, we can express a wider range of
+valid states and transitions without duplicating logic. This approach is
+especially useful when the number of states grows or when multiple states share
+behavior but differ in structure.
+
+```rust
+# use std::fmt::Write as _;
+#
+struct Serializer<S> {
+    // [...]
+    # indent: usize,
+    # buffer: String,
+    # state: S,
+}
+
+struct Root;
+struct Struct<S>(S);
+struct List<S>(S);
+struct Property<S>(S);
+
+impl Serializer<Root> {
+    fn new() -> Self {
+        // [...]
+        # Self {
+        #     indent: 0,
+        #     buffer: String::new(),
+        #     state: Root,
+        # }
+    }
+
+    fn serialize_struct(mut self, name: &str) -> Serializer<Struct<Root>> {
+        // [...]
+        # writeln!(self.buffer, "{name} {{").unwrap();
+        # Serializer {
+        #     indent: self.indent + 1,
+        #     buffer: self.buffer,
+        #     state: Struct(self.state),
+        # }
+    }
+
+    fn finish(self) -> String {
+        // [...]
+        # self.buffer
+    }
+}
+
+impl<S> Serializer<S> {
+    fn buffer_size(&self) -> usize {
+        // [...]
+        # self.buffer.len()
+    }
+}
+
+impl<S> Serializer<Struct<S>> {
+    fn serialize_property(mut self, name: &str) -> Serializer<Property<Struct<S>>> {
+        // [...]
+        # write!(self.buffer, "{}{name}: ", " ".repeat(self.indent * 2)).unwrap();
+        # Serializer {
+        #     indent: self.indent,
+        #     buffer: self.buffer,
+        #     state: Property(self.state),
+        # }
+    }
+
+    fn finish_struct(mut self) -> Serializer<S> {
+        // [...]
+        # self.indent -= 1;
+        # writeln!(self.buffer, "{}}}", " ".repeat(self.indent * 2)).unwrap();
+        # Serializer {
+        #     indent: self.indent,
+        #     buffer: self.buffer,
+        #     state: self.state.0,
+        # }
+    }
+}
+
+impl<S> Serializer<Property<Struct<S>>> {
+    fn serialize_struct(mut self, name: &str) -> Serializer<Struct<Struct<S>>> {
+        // [...]
+        # writeln!(self.buffer, "{name} {{").unwrap();
+        # Serializer {
+        #     indent: self.indent + 1,
+        #     buffer: self.buffer,
+        #     state: Struct(self.state.0),
+        # }
+    }
+
+    fn serialize_list(mut self) -> Serializer<List<Struct<S>>> {
+        // [...]
+        # writeln!(self.buffer, "[").unwrap();
+        # Serializer {
+        #     indent: self.indent + 1,
+        #     buffer: self.buffer,
+        #     state: List(self.state.0),
+        # }
+    }
+
+    fn serialize_string(mut self, value: &str) -> Serializer<Struct<S>> {
+        // [...]
+        # writeln!(self.buffer, "{value},").unwrap();
+        # Serializer {
+        #     indent: self.indent,
+        #     buffer: self.buffer,
+        #     state: self.state.0,
+        # }
+    }
+}
+
+impl<S> Serializer<List<S>> {
+    fn serialize_struct(mut self, name: &str) -> Serializer<Struct<List<S>>> {
+        // [...]
+        # writeln!(self.buffer, "{}{name} {{", " ".repeat(self.indent * 2)).unwrap();
+        # Serializer {
+        #     indent: self.indent + 1,
+        #     buffer: self.buffer,
+        #     state: Struct(self.state),
+        # }
+    }
+
+    fn serialize_string(mut self, value: &str) -> Self {
+        // [...]
+        # writeln!(self.buffer, "{}{value},", " ".repeat(self.indent * 2)).unwrap();
+        # self
+    }
+
+    fn finish_list(mut self) -> Serializer<S> {
+        // [...]
+        # self.indent -= 1;
+        # writeln!(self.buffer, "{}]", " ".repeat(self.indent * 2)).unwrap();
+        # Serializer {
+        #     indent: self.indent,
+        #     buffer: self.buffer,
+        #     state: self.state.0,
+        # }
+    }
+}
+
+fn main() {
+    # #[rustfmt::skip]
+    let serializer = Serializer::new()
+        .serialize_struct("Foo")
+            .serialize_property("bar")
+            .serialize_struct("Bar")
+                .serialize_property("baz")
+                .serialize_list()
+                    .serialize_string("abc")
+                    .serialize_struct("Baz")
+                        .serialize_property("partial")
+                        .serialize_string("def")
+                        .serialize_property("empty")
+                        .serialize_struct("Empty")
+                        .finish_struct()
+                    .finish_struct()
+                .finish_list()
+            .finish_struct()
+        .finish_struct();
+
+    # let buffer_size = serializer.buffer_size();
+    let output = serializer.finish();
+
+    # println!("buffer size = {buffer_size}\n---");
+    println!("{output}");
+
+    // These will all fail at compile time:
+
+    // Serializer::new().serialize_list();
+    // Serializer::new().serialize_string("foo");
+    // Serializer::new().serialize_struct("Foo").serialize_string("bar");
+    // Serializer::new().serialize_struct("Foo").serialize_list();
+    // Serializer::new().serialize_property("foo");
+}
+```
+
+
+ +- The full code for this example is available + [in the playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=48b106089ca600453f3ed00a0a31af26). + +- By using generics to track the parent context, we can construct arbitrarily + nested serializers that enforce valid transitions between struct, list, and + property states. + +- This lets us build a recursive structure while preserving control over what + methods are accessible in each state. + +- Here's how the flow maps to a state machine: + +```bob + +-----------+ +---------+------------+-----+ + | | | | | | + V | V | V | + + | +serializer --> structure --> property --> list +-+ + + | | ^ | ^ + V | | | | + | +-----------+ | + String | | + +--------------------------+ +``` + +- And this is reflected directly in the types of our serializer: + +```bob + +------+ + finish | | + serialize struct V | + struct ++---------------------+ --------------> +-----------------------------+ <---------------+ +| Serializer [ Root ] | | Serializer [ Struct [ S ] ] | | ++---------------------+ <-------------- +-----------------------------+ <-----------+ | + finish struct | | + | | serialize | | | + | +----------+ property V serialize | | + | | string or | | +finish | | +-------------------------------+ struct | | + V | | Serializer [ Property [ S ] ] | ------------+ | + finish | +-------------------------------+ | + +--------+ struct | | + | String | | serialize | | + +--------+ | list V | + | finish | + | +---------------------------+ list | + +------> | Serializer [ List [ S ] ] | ----------------+ + +---------------------------+ + serialize + list or string ^ + | or finish list | + +-------------------+ +``` + +- Of course, this pattern isn't a silver bullet. 
It still allows issues like:
+  - Empty or invalid property names (which can be fixed using
+    [the newtype pattern](../newtype-pattern.md))
+  - Duplicate property names (which could be tracked in `Struct<S>` and handled
+    via `Result`)
+
+- To report validation failures, we can change method signatures to return a
+  `Result`. Returning the serializer inside the error value lets the caller
+  recover and continue:
+
+  ```rust,compile_fail
+  struct PropertySerializeError<S> {
+      kind: PropertyError,
+      serializer: Serializer<Struct<S>>,
+  }
+
+  impl<S> Serializer<Struct<S>> {
+      fn serialize_property(
+          self,
+          name: &str,
+      ) -> Result<Serializer<Property<Struct<S>>>, PropertySerializeError<S>> {
+          /* ... */
+      }
+  }
+  ```
+
+- While this API is powerful, it’s not always ergonomic. Production serializers
+  typically favor simpler APIs and reserve the typestate pattern for enforcing
+  critical invariants.
+
+- One excellent real-world example is
+  [`rustls::ClientConfig`](https://docs.rs/rustls/latest/rustls/client/struct.ClientConfig.html#method.builder),
+  which uses typestate with generics to guide the user through safe and correct
+  configuration steps.
+
+
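The `Result`-returning signature above is marked `compile_fail` because its body is elided. A reduced, runnable sketch of the same idea could look like the following. It makes several assumptions that are not in the slide's code: `Struct<S>` becomes a struct with a hypothetical `seen` list for duplicate detection, `PropertyError` has a single hypothetical `Duplicate` variant, and indentation tracking and lists are left out to keep the example short.

```rust
use std::fmt::Write as _;

struct Serializer<S> {
    buffer: String,
    state: S,
}

struct Root;
// Assumption: `Struct` also records the property names written so far.
struct Struct<S> {
    parent: S,
    seen: Vec<String>,
}
struct Property<S>(S);

// Hypothetical error kind; the slide does not define `PropertyError`.
#[derive(Debug, PartialEq)]
enum PropertyError {
    Duplicate(String),
}

struct PropertySerializeError<S> {
    kind: PropertyError,
    // The serializer is handed back so the caller can recover.
    serializer: Serializer<Struct<S>>,
}

impl Serializer<Root> {
    fn new() -> Self {
        Serializer { buffer: String::new(), state: Root }
    }

    fn serialize_struct(mut self, name: &str) -> Serializer<Struct<Root>> {
        writeln!(self.buffer, "{name} {{").unwrap();
        Serializer {
            buffer: self.buffer,
            state: Struct { parent: self.state, seen: Vec::new() },
        }
    }

    fn finish(self) -> String {
        self.buffer
    }
}

impl<S> Serializer<Struct<S>> {
    fn serialize_property(
        mut self,
        name: &str,
    ) -> Result<Serializer<Property<Struct<S>>>, PropertySerializeError<S>> {
        if self.state.seen.iter().any(|n| n == name) {
            return Err(PropertySerializeError {
                kind: PropertyError::Duplicate(name.to_string()),
                serializer: self,
            });
        }
        self.state.seen.push(name.to_string());
        write!(self.buffer, "  {name}: ").unwrap();
        Ok(Serializer { buffer: self.buffer, state: Property(self.state) })
    }

    fn finish_struct(mut self) -> Serializer<S> {
        self.buffer.push_str("}\n");
        Serializer { buffer: self.buffer, state: self.state.parent }
    }
}

impl<S> Serializer<Property<Struct<S>>> {
    fn serialize_string(mut self, value: &str) -> Serializer<Struct<S>> {
        writeln!(self.buffer, "{value},").unwrap();
        Serializer { buffer: self.buffer, state: self.state.0 }
    }
}

// Writes `id` twice; the second attempt is rejected and we keep going.
fn demo() -> (String, Option<PropertyError>) {
    let s = Serializer::new().serialize_struct("User");
    let s = match s.serialize_property("id") {
        Ok(property) => property.serialize_string("42"),
        Err(error) => error.serializer,
    };
    let (s, error) = match s.serialize_property("id") {
        Ok(property) => (property.serialize_string("43"), None),
        Err(error) => (error.serializer, Some(error.kind)),
    };
    (s.finish_struct().finish(), error)
}

fn main() {
    let (output, error) = demo();
    print!("{output}");
    println!("error: {error:?}");
}
```

Because `serialize_property` takes `self` by value, returning the serializer in the error is what makes recovery possible; without it, a failed call would consume the whole serializer.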