Commit 680509a

Add Generics as Type Classes (#249)
1 parent 22de99a commit 680509a

2 files changed, +287 -0 lines changed

SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@

  - [Functional Programming](./functional/index.md)
  - [Programming paradigms](./functional/paradigms.md)
+ - [Generics as Type Classes](./functional/generics-type-classes.md)

  - [Additional Resources](./additional_resources/index.md)
  - [Design principles](./additional_resources/design-principles.md)

functional/generics-type-classes.md

Lines changed: 286 additions & 0 deletions
@@ -0,0 +1,286 @@
# Generics as Type Classes

## Description

Rust's type system is designed more like those of functional languages (such as
Haskell) than those of imperative languages (such as Java and C++). As a result,
Rust can turn many kinds of programming problems into "static typing" problems.
This is one of the biggest wins of choosing a functional language, and it is
critical to many of Rust's compile-time guarantees.

A key part of this idea is the way generic types work. In C++ and Java, for
example, generic types are a meta-programming construct for the compiler.
`vector<int>` and `vector<char>` in C++ are just two different copies of the
same boilerplate code for a `vector` type (known as a `template`) with two
different types filled in.

In Rust, a generic type parameter creates what is known in functional languages
as a "type class constraint", and each different parameter filled in by an end
user *actually changes the type*. In other words, `Vec<isize>` and `Vec<char>`
*are two different types*, which are recognized as distinct by all parts of the
type system.

This is called **monomorphization**, where different types are created from
**polymorphic** code. This special behavior requires `impl` blocks to specify
generic parameters. Different values for the generic type produce different
types, and different types can have different `impl` blocks.

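As a minimal sketch of that last point, the following uses a hypothetical `Wrapper<T>` type (not part of the original example; every name here is made up for illustration) to show one generic `impl` block shared by all instantiations and two `impl` blocks that apply to a single concrete type each:

```rust
// Hypothetical wrapper type, used only to illustrate the point.
struct Wrapper<T> {
    value: T,
}

// Methods in this block exist for every `Wrapper<T>`.
impl<T> Wrapper<T> {
    fn new(value: T) -> Self {
        Wrapper { value }
    }

    fn get(&self) -> &T {
        &self.value
    }
}

// This method exists only on `Wrapper<u8>`.
impl Wrapper<u8> {
    fn is_ascii_digit(&self) -> bool {
        self.value.is_ascii_digit()
    }
}

// This method exists only on `Wrapper<String>`.
impl Wrapper<String> {
    fn shout(&self) -> String {
        self.value.to_uppercase()
    }
}

fn main() {
    let byte = Wrapper::new(b'7');
    let text = Wrapper::new(String::from("hello"));

    assert_eq!(*byte.get(), b'7');
    assert!(byte.is_ascii_digit());
    assert_eq!(text.shout(), "HELLO");

    // Does not compile: `shout` is not defined for `Wrapper<u8>`.
    // byte.shout();
}
```

Because `Wrapper<u8>` and `Wrapper<String>` are distinct types, the compiler rejects a call to a method that belongs to the other one.
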
In object-oriented languages, classes can inherit behavior from their parents.
Rust's generics go further: they allow attaching not only shared behavior to
every member of a type class, but also extra behavior to particular members.

The nearest equivalent is the runtime polymorphism in JavaScript and Python,
where new members can be added to objects willy-nilly by any constructor.
Unlike in those languages, however, all of Rust's additional methods can be type
checked when they are used, because their generics are statically defined. That
makes them more usable while remaining safe.

## Example

Suppose you are designing a storage server for a series of lab machines.
Because of the software involved, there are two different protocols you need
to support: BOOTP (for PXE network boot), and NFS (for remote mount storage).

Your goal is to have one program, written in Rust, which can handle both of
them. It will have protocol handlers and listen for both kinds of requests. The
main application logic will then allow a lab administrator to configure storage
and security controls for the actual files.

The requests from machines in the lab for files contain the same basic
information, no matter what protocol they came from: an authentication method,
and a file name to retrieve. A straightforward implementation would look
something like this:

```rust,ignore
use std::path::PathBuf;

enum AuthInfo {
    Nfs(crate::nfs::AuthInfo),
    Bootp(crate::bootp::AuthInfo),
}

struct FileDownloadRequest {
    file_name: PathBuf,
    authentication: AuthInfo,
}
```

This design might work well enough. But now suppose you need to support
metadata that is *protocol-specific*. For example, with NFS, you want to know
the client's mount point in order to enforce additional security rules.

The way the current struct is designed leaves the protocol decision until
runtime. That means any method that applies to one protocol and not the other
requires the programmer to do a runtime check.

Here is how getting an NFS mount point would look:

```rust,ignore
struct FileDownloadRequest {
    file_name: PathBuf,
    authentication: AuthInfo,
    mount_point: Option<PathBuf>,
}

impl FileDownloadRequest {
    // ... other methods ...

    /// Gets an NFS mount point if this is an NFS request. Otherwise,
    /// returns None.
    pub fn mount_point(&self) -> Option<&Path> {
        // `as_deref` turns `&Option<PathBuf>` into `Option<&Path>`.
        self.mount_point.as_deref()
    }
}
```

Every caller of `mount_point()` must check for `None` and write code to handle
it. This is true even if they know only NFS requests are ever used in a given
code path!

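To make that cost concrete, here is a small sketch of a caller that only ever sees NFS requests. The request type is a simplified stand-in for the struct above (authentication is left out), and `is_allowed` with its `/secure` rule is a hypothetical policy invented for illustration:

```rust
use std::path::{Path, PathBuf};

// Simplified stand-in for the request type above; authentication is omitted.
struct FileDownloadRequest {
    file_name: PathBuf,
    mount_point: Option<PathBuf>,
}

impl FileDownloadRequest {
    fn mount_point(&self) -> Option<&Path> {
        self.mount_point.as_deref()
    }
}

// Hypothetical policy check that is only ever called for NFS requests.
fn is_allowed(request: &FileDownloadRequest) -> bool {
    match request.mount_point() {
        Some(mount) => !mount.starts_with("/secure"),
        // Never expected on an NFS-only path, but the type system cannot
        // express that, so this arm still has to be written.
        None => false,
    }
}

fn main() {
    let request = FileDownloadRequest {
        file_name: PathBuf::from("images/boot.img"),
        mount_point: Some(PathBuf::from("/secure/lab1")),
    };
    assert!(!is_allowed(&request));
    println!("{} allowed: {}", request.file_name.display(), is_allowed(&request));
}
```

The `None` arm is dead code in practice, yet nothing stops a BOOTP request from reaching this function at runtime.
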
It would be far better to cause a compile-time error if the different request
types were confused. After all, the user's entire code path, including the
library functions it calls, is written knowing whether a request is an NFS
request or a BOOTP request.

In Rust, this is actually possible! The solution is to *add a generic type* in
order to split the API.

Here is what that looks like:

```rust
use std::path::{Path, PathBuf};

mod nfs {
    #[derive(Clone)]
    pub(crate) struct AuthInfo(String); // NFS session management omitted
}

mod bootp {
    pub(crate) struct AuthInfo(); // no authentication in bootp
}

// private module, lest outside users invent their own protocol kinds!
mod proto_trait {
    use std::path::{Path, PathBuf};
    use super::{bootp, nfs};

    pub(crate) trait ProtoKind {
        type AuthInfo;
        fn auth_info(&self) -> Self::AuthInfo;
    }

    pub struct Nfs {
        auth: nfs::AuthInfo,
        mount_point: PathBuf,
    }

    impl Nfs {
        pub(crate) fn mount_point(&self) -> &Path {
            &self.mount_point
        }
    }

    impl ProtoKind for Nfs {
        type AuthInfo = nfs::AuthInfo;
        fn auth_info(&self) -> Self::AuthInfo {
            self.auth.clone()
        }
    }

    pub struct Bootp(); // no additional metadata

    impl ProtoKind for Bootp {
        type AuthInfo = bootp::AuthInfo;
        fn auth_info(&self) -> Self::AuthInfo {
            bootp::AuthInfo()
        }
    }
}

use proto_trait::ProtoKind; // keep internal to prevent impls
pub use proto_trait::{Nfs, Bootp}; // re-export so callers can see them

struct FileDownloadRequest<P: ProtoKind> {
    file_name: PathBuf,
    protocol: P,
}

// all common API parts go into a generic impl block
impl<P: ProtoKind> FileDownloadRequest<P> {
    fn file_path(&self) -> &Path {
        &self.file_name
    }

    fn auth_info(&self) -> P::AuthInfo {
        self.protocol.auth_info()
    }
}

// all protocol-specific impls go into their own block
impl FileDownloadRequest<Nfs> {
    fn mount_point(&self) -> &Path {
        self.protocol.mount_point()
    }
}

fn main() {
    // your code here
}
```

With this approach, if the user were to make a mistake and use the wrong
type:

```rust,ignore
fn main() {
    let mut socket = crate::bootp::listen()?;
    while let Some(request) = socket.next_request()? {
        // `request` is a BOOTP request, so `mount_point()` does not exist
        match request.mount_point().to_str() {
            Some("/secure") => socket.send("Access denied"),
            _ => {} // continue on...
        }
        // Rest of the code here
    }
}
```

They would get a compile-time type error. The type `FileDownloadRequest<Bootp>`
does not implement `mount_point()`; only the type `FileDownloadRequest<Nfs>`
does. And that type is created by the NFS module, not the BOOTP module, of
course!

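The same effect can be seen in a compact, runnable form. This sketch deliberately simplifies the design above: the `ProtoKind` bound and the authentication types are dropped, and `is_allowed` is a hypothetical NFS-only policy, so treat it as an illustration rather than the chapter's full implementation:

```rust
use std::path::{Path, PathBuf};

// Simplified protocol markers; authentication is left out of this sketch.
struct Nfs {
    mount_point: PathBuf,
}

struct Bootp;

struct FileDownloadRequest<P> {
    file_name: PathBuf,
    protocol: P,
}

// Shared API: available for every protocol.
impl<P> FileDownloadRequest<P> {
    fn file_path(&self) -> &Path {
        &self.file_name
    }
}

// NFS-only API.
impl FileDownloadRequest<Nfs> {
    fn mount_point(&self) -> &Path {
        &self.protocol.mount_point
    }
}

// Hypothetical policy; its signature accepts NFS requests only.
fn is_allowed(request: &FileDownloadRequest<Nfs>) -> bool {
    !request.mount_point().starts_with("/secure")
}

fn main() {
    let nfs = FileDownloadRequest {
        file_name: PathBuf::from("data.bin"),
        protocol: Nfs { mount_point: PathBuf::from("/secure/lab1") },
    };
    let bootp = FileDownloadRequest {
        file_name: PathBuf::from("pxelinux.0"),
        protocol: Bootp,
    };

    assert!(!is_allowed(&nfs));
    assert_eq!(nfs.file_path(), Path::new("data.bin"));
    assert_eq!(bootp.file_path(), Path::new("pxelinux.0"));

    // Neither line compiles: `FileDownloadRequest<Bootp>` has no
    // `mount_point` method and is not a `FileDownloadRequest<Nfs>`.
    // bootp.mount_point();
    // is_allowed(&bootp);
}
```

Here the caller needs no `Option` and no unreachable branch: the NFS-only function states its requirement in its signature.
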
## Advantages

First, it allows fields that are common to multiple states to be de-duplicated.
By making the non-shared fields generic, they are implemented once.

Second, it makes the `impl` blocks easier to read, because they are broken down
by state. Methods common to all states are typed once in one block, and methods
unique to one state are in a separate block.

Both of these mean there are fewer lines of code, and they are better organized.

## Disadvantages

This currently increases the size of the binary, because monomorphization
emits a separate copy of the generic code for each type it is instantiated
with. Hopefully the implementation will be able to improve in the future.

## Alternatives

* If a type seems to need a "split API" due to construction or partial
  initialization, consider the
  [Builder Pattern](../patterns/creational/builder.md) instead.

* If the API between types does not change -- only the behavior does -- then
  the [Strategy Pattern](../patterns/behavioural/strategy.md) is a better fit.

## See also

This pattern is used throughout the standard library:

* `Vec<u8>` can be created from a `CString`, unlike every other type of
  `Vec<T>`.[^1]
* A `Vec<T>` can also be converted into a binary heap, but only if `T`
  implements the `Ord` trait.[^2]
* The `to_string` method is specialized for `Cow` only when it wraps a
  `str`.[^3]

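A brief sketch of those three standard-library cases, using only `std` (the sample values are arbitrary):

```rust
use std::borrow::Cow;
use std::collections::BinaryHeap;
use std::ffi::CString;

fn main() {
    // `From<CString>` is implemented for `Vec<u8>` specifically.
    // The conversion drops the trailing NUL byte.
    let c_string = CString::new("hi").expect("no interior NUL bytes");
    let bytes: Vec<u8> = Vec::from(c_string);
    assert_eq!(bytes, b"hi");

    // `From<Vec<T>>` for `BinaryHeap<T>` requires `T: Ord`.
    let mut heap = BinaryHeap::from(vec![3, 1, 4]);
    assert_eq!(heap.pop(), Some(4));

    // `to_string` on a `Cow<str>`.
    let cow: Cow<str> = Cow::Borrowed("text");
    assert_eq!(cow.to_string(), "text");
}
```

Replacing the `Vec<i32>` with a vector of `f64` would fail to compile, because `f64` does not implement `Ord`.
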
It is also used by several popular crates to allow API flexibility:

* The `embedded-hal` ecosystem used for embedded devices makes extensive use of
  this pattern. For example, it allows statically verifying the configuration of
  device registers used to control embedded pins. When a pin is put into a mode,
  it returns a `Pin<MODE>` struct, whose generic determines the functions
  usable in that mode, which are not on the `Pin` itself. [^4]

* The `hyper` HTTP client library uses this to expose rich APIs for different
  pluggable requests. Clients with different connectors have different methods
  on them as well as different trait implementations, while a core set of
  methods apply to any connector. [^5]

* The "type state" pattern -- where an object gains and loses API based on an
  internal state or invariant -- is implemented in Rust using the same basic
  concept, and a slightly different technique. [^6]

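The `Pin<MODE>` idea can be sketched in a few lines. This is not the API of any real HAL crate: `Input`, `Output`, and every method name below are hypothetical, and a `bool` field stands in for a hardware register:

```rust
use std::marker::PhantomData;

// Hypothetical mode markers; real HAL crates define their own.
struct Input;
struct Output;

// `MODE` exists only at the type level, so `PhantomData` carries it.
struct Pin<MODE> {
    level: bool, // stand-in for a hardware register
    _mode: PhantomData<MODE>,
}

impl Pin<Input> {
    fn new() -> Self {
        Pin { level: false, _mode: PhantomData }
    }

    fn is_high(&self) -> bool {
        self.level
    }

    // Changing mode consumes the pin and returns a differently typed one.
    fn into_output(self) -> Pin<Output> {
        Pin { level: self.level, _mode: PhantomData }
    }
}

impl Pin<Output> {
    fn set_high(&mut self) {
        self.level = true;
    }

    fn into_input(self) -> Pin<Input> {
        Pin { level: self.level, _mode: PhantomData }
    }
}

fn main() {
    let input = Pin::<Input>::new();
    assert!(!input.is_high());

    let mut output = input.into_output();
    output.set_high();
    // Does not compile: `is_high` is only defined for `Pin<Input>`.
    // output.is_high();

    let input = output.into_input();
    assert!(input.is_high());
}
```

Each mode change moves the old value, so a pin cannot be used in a mode it has already left.
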
[^1]: See: [impl From\<CString\> for Vec\<u8\>](
https://doc.rust-lang.org/stable/src/std/ffi/c_str.rs.html#799-801)

[^2]: See: [impl\<T\> From\<Vec\<T, Global\>\> for BinaryHeap\<T\>](
https://doc.rust-lang.org/stable/src/alloc/collections/binary_heap.rs.html#1345-1354)

[^3]: See: [impl\<'_\> ToString for Cow\<'_, str>](
https://doc.rust-lang.org/stable/src/alloc/string.rs.html#2235-2240)

[^4]: Example:
[https://docs.rs/stm32f30x-hal/0.1.0/stm32f30x_hal/gpio/gpioa/struct.PA0.html](
https://docs.rs/stm32f30x-hal/0.1.0/stm32f30x_hal/gpio/gpioa/struct.PA0.html)

[^5]: See:
[https://docs.rs/hyper/0.14.5/hyper/client/struct.Client.html](
https://docs.rs/hyper/0.14.5/hyper/client/struct.Client.html)

[^6]: See:
[The Case for the Type State Pattern](
https://web.archive.org/web/20210325065112/https://www.novatec-gmbh.de/en/blog/the-case-for-the-typestate-pattern-the-typestate-pattern-itself/)
and
[Rusty Typestate Series (an extensive thesis)](
https://web.archive.org/web/20210328164854/https://rustype.github.io/notes/notes/rust-typestate-series/rust-typestate-index)
