Commit 90a6cae

docs: clean up architecture guide
1 parent 63abefd commit 90a6cae

5 files changed

+180
-146
lines changed

README.md

Lines changed: 10 additions & 9 deletions
@@ -2,9 +2,9 @@
22

33
Distributed SQL database in Rust, built from scratch as an educational project. Main features:
44

5-
* [Raft distributed consensus engine][raft] for linearizable state machine replication.
5+
* [Raft distributed consensus][raft] for linearizable state machine replication.
66

7-
* [ACID transaction engine][txn] with MVCC-based snapshot isolation.
7+
* [ACID transactions][txn] with MVCC-based snapshot isolation.
88

99
* [Pluggable storage engine][storage] with [BitCask][bitcask] and [in-memory][memory] backends.
1010

@@ -31,17 +31,17 @@ been taken where possible.
3131
[memory]: https://github.com/erikgrinaker/toydb/blob/main/src/storage/memory.rs
3232
[query]: https://github.com/erikgrinaker/toydb/blob/main/src/sql/execution/executor.rs
3333
[optimizer]: https://github.com/erikgrinaker/toydb/blob/main/src/sql/planner/optimizer.rs
34-
[sql]: https://github.com/erikgrinaker/toydb/blob/main/src/sql/mod.rs
34+
[sql]: https://github.com/erikgrinaker/toydb/blob/main/src/sql/parser.rs
3535

3636
## Documentation
3737

3838
* [Architecture guide](docs/architecture/index.md): a guided tour of toyDB's code and architecture.
3939

4040
* [SQL examples](docs/examples.md): walkthrough of toyDB's SQL features.
4141

42-
* [SQL reference](docs/sql.md): toyDB's SQL reference documentation.
42+
* [SQL reference](docs/sql.md): reference documentation for toyDB's SQL dialect.
4343

44-
* [References](docs/references.md): research material used while building toyDB.
44+
* [References](docs/references.md): research materials used while building toyDB.
4545

4646
## Usage
4747

@@ -161,9 +161,10 @@ The available workloads are:
161161

162162
For more information about workloads and parameters, run `cargo run --bin workload -- --help`.
163163

164-
Example workload results are listed below. Write performance is pretty atrocious, due to fsyncs
165-
and a lack of write batching at the Raft level. Disabling fsyncs, or using the in-memory engine,
166-
significantly improves write performance.
164+
Example workload results are listed below. Write performance is atrocious, due to
165+
[fsync](https://en.wikipedia.org/wiki/Sync_(Unix)) and a lack of write batching in the Raft layer.
166+
Disabling fsync, or using the in-memory engine, significantly improves write performance (at the
167+
expense of durability).
167168

168169
| Workload | BitCask | BitCask w/o fsync | Memory |
169170
|----------|-------------|-------------------|-------------|
@@ -181,4 +182,4 @@ library 'toydb'".
181182

182183
## Credits
183184

184-
toyDB logo is courtesy of [@jonasmerlin](https://github.com/jonasmerlin).
185+
The toyDB logo is courtesy of [@jonasmerlin](https://github.com/jonasmerlin).

docs/architecture/encoding.md

Lines changed: 34 additions & 35 deletions
@@ -1,7 +1,7 @@
11
# Key/Value Encoding
22

33
The key/value store uses binary `Vec<u8>` keys and values, so we need an encoding scheme to
4-
translate between Rust in-memory data structures and the on-disk binary data. This is provided by
4+
translate between in-memory Rust data structures and the on-disk binary data. This is provided by
55
the [`encoding`](https://github.com/erikgrinaker/toydb/tree/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/encoding)
66
module, with separate schemes for key and value encoding.
77

@@ -15,18 +15,19 @@ data type. But we could also have chosen e.g. [JSON](https://en.wikipedia.org/wi
1515
We won't dwell on the actual binary format here, see the [Bincode specification](https://github.com/bincode-org/bincode/blob/trunk/docs/spec.md)
1616
for details.
1717

18-
To use a consistent configuration for all encoding and decoding, we provide helper functions using
19-
`bincode::config::standard()` in the [`encoding::bincode`](https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/encoding/bincode.rs)
20-
module:
18+
To use a consistent configuration for all encoding and decoding, we provide helper functions in
19+
the [`encoding::bincode`](https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/encoding/bincode.rs)
20+
module which use `bincode::config::standard()`.
2121

2222
https://github.com/erikgrinaker/toydb/blob/0ce1fb34349fda043cb9905135f103bceb4395b4/src/encoding/bincode.rs#L15-L27
2323

24-
Bincode uses the very common [Serde](https://serde.rs) framework for its API. toyDB also provides
25-
an `encoding::Value` helper trait for value types with automatic `encode()` and `decode()` methods:
24+
Bincode uses the very common [Serde](https://serde.rs) framework for its API. toyDB also provides an
25+
`encoding::Value` helper trait for value types which adds automatic `encode()` and `decode()`
26+
methods:
2627

2728
https://github.com/erikgrinaker/toydb/blob/b57ae6502e93ea06df00d94946a7304b7d60b977/src/encoding/mod.rs#L39-L68
2829

29-
Here's an example of how this is used to encode and decode an arbitrary `Dog` data type:
30+
Here's an example of how this can be used to encode and decode an arbitrary `Dog` data type:
3031

3132
```rust
3233
#[derive(serde::Serialize, serde::Deserialize)]
@@ -42,7 +43,7 @@ let pluto = Dog { name: "Pluto".into(), age: 4, good_boy: true };
4243
let bytes = pluto.encode();
4344
println!("{bytes:02x?}");
4445

45-
// Outputs [05, 50, 6c, 75, 74, 6f, 04, 01].
46+
// Outputs [05, 50, 6c, 75, 74, 6f, 04, 01]:
4647
//
4748
// * Length of string "Pluto": 05.
4849
// * String "Pluto": 50 6c 75 74 6f.
@@ -54,37 +55,37 @@ let pluto = Dog::decode(&bytes)?; // gives us back Pluto
5455

5556
## `Keycode` Key Encoding
5657

57-
Unlike values, keys can't just use any binary encoding like Bincode. As mentioned before, the
58-
storage engine sorts data by key to enable range scans, which will be used e.g. for SQL table scans,
59-
limited SQL index scans, Raft log scans, etc. Because of this, the encoding needs to preserve the
60-
[lexicographical order](https://en.wikipedia.org/wiki/Lexicographic_order) of the encoded values:
61-
the binary byte slices must sort in the same order as the original values.
58+
Unlike values, keys can't just use any binary encoding like Bincode. As mentioned in the storage
59+
section, the storage engine sorts data by key to enable range scans. The key encoding must therefore
60+
preserve the [lexicographical order](https://en.wikipedia.org/wiki/Lexicographic_order) of the
61+
encoded values: the binary byte slices must sort in the same order as the original values.
6262

63-
As an example of why we can't just use Bincode, let's consider two strings: "house" should be
64-
sorted before "key", alphabetically. However, Bincode encodes strings prefixed by their length, so
65-
"key" would be sorted before "house" in binary form:
63+
As an example of why we can't just use Bincode, consider the strings "house" and "key". These should
64+
be sorted in alphabetical order: "house" before "key". However, Bincode encodes strings prefixed by
65+
their length, so "key" would be sorted before "house" in binary form:
6666

6767
```
68-
03 6b 65 79 ← 3 bytes: key
69-
05 68 6f 75 73 65 ← 5 bytes: house
68+
03 6b 65 79 ← 3 bytes: key
69+
05 68 6f 75 73 65 ← 5 bytes: house
7070
```
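
To see the problem concretely, here is a minimal sketch that hand-rolls a length-prefixed string encoding imitating the layout shown above. It does not call the Bincode crate: `length_prefixed` is a hypothetical helper, and it assumes strings shorter than 256 bytes so that the length fits in a single byte.

```rust
/// Encodes a string as a one-byte length prefix followed by its UTF-8 bytes.
/// Hypothetical helper that imitates the layout above; assumes len < 256.
fn length_prefixed(s: &str) -> Vec<u8> {
    assert!(s.len() < 256, "string too long for this sketch");
    let mut bytes = vec![s.len() as u8];
    bytes.extend_from_slice(s.as_bytes());
    bytes
}

fn main() {
    // The original strings sort alphabetically.
    assert!("house" < "key");

    // The encoded byte strings are ordered by the length prefix first (03 < 05),
    // so "key" ends up before "house".
    let house = length_prefixed("house");
    let key = length_prefixed("key");
    assert!(key < house);

    println!("{key:02x?} sorts before {house:02x?}");
}
```

The comparison on `Vec<u8>` is byte-wise lexicographic, which is the same order a key/value store would use for range scans over raw keys.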
7171

72-
For similar reasons, we can't just encode numbers in their native binary form, because the
73-
[little-endian](https://en.wikipedia.org/wiki/Endianness) representation will sometimes order very
74-
large numbers before small numbers, and the [sign bit](https://en.wikipedia.org/wiki/Sign_bit)
75-
will order positive numbers before negative numbers.
72+
For similar reasons, we can't just encode numbers in their native binary form: the
73+
[little-endian](https://en.wikipedia.org/wiki/Endianness) representation will order very large
74+
numbers before small numbers, and the [sign bit](https://en.wikipedia.org/wiki/Sign_bit) will order
75+
positive numbers before negative numbers. This would violate the ordering of natural numbers.
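
The following sketch illustrates both problems using only the Rust standard library. It also shows one common fix, assumed here for illustration rather than taken from the text above: encode integers as big-endian and flip the sign bit of signed integers. `encode_i64` is a hypothetical helper name.

```rust
/// Order-preserving encoding of an i64: big-endian bytes with the sign bit
/// flipped, so that negative numbers sort before positive ones. This is a
/// common technique, shown as an assumption rather than toyDB's exact code.
fn encode_i64(n: i64) -> [u8; 8] {
    let mut bytes = n.to_be_bytes();
    bytes[0] ^= 0x80; // flip the sign bit
    bytes
}

fn main() {
    // Little-endian stores the least significant byte first, so 256
    // (00 01 00 ..) sorts before 1 (01 00 00 ..).
    assert!(256u64.to_le_bytes() < 1u64.to_le_bytes());

    // Big-endian unsigned integers sort in numerical order.
    assert!(1u64.to_be_bytes() < 256u64.to_be_bytes());

    // In two's complement the sign bit is set for negative numbers, so -1
    // (ff ff ..) sorts after 1 (00 .. 01) when compared byte-wise.
    assert!((-1i64).to_be_bytes() > 1i64.to_be_bytes());

    // Flipping the sign bit restores the numerical order.
    assert!(encode_i64(i64::MIN) < encode_i64(-1));
    assert!(encode_i64(-1) < encode_i64(0));
    assert!(encode_i64(0) < encode_i64(1));
    assert!(encode_i64(1) < encode_i64(i64::MAX));
}
```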
7676

7777
We also have to be careful with value sequences, which should be ordered element-wise. For example,
7878
the pair ("a", "xyz") should be ordered before ("ab", "cd"), so we can't just encode the strings
79-
one after the other like "axyz" and "abcd" since that would sort "abcd" first.
79+
one after the other like "axyz" and "abcd" since that would sort ("ab", "cd") first.
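
A minimal sketch of the sequence problem, and of one way around it, follows. The terminator scheme used here (escape any `00` byte as `00 ff` and end each string with `00 00`) is an assumption chosen for illustration, and `encode_str` and `encode_pair` are hypothetical helpers.

```rust
/// Appends a string to `out` in a form that can be safely concatenated with
/// other values: any 00 byte is escaped as 00 ff, and the string is
/// terminated by 00 00. Hypothetical helper for illustration.
fn encode_str(s: &str, out: &mut Vec<u8>) {
    for &b in s.as_bytes() {
        out.push(b);
        if b == 0x00 {
            out.push(0xff);
        }
    }
    out.extend_from_slice(&[0x00, 0x00]);
}

/// Encodes a pair of strings as the concatenation of their encodings.
fn encode_pair(a: &str, b: &str) -> Vec<u8> {
    let mut out = Vec::new();
    encode_str(a, &mut out);
    encode_str(b, &mut out);
    out
}

fn main() {
    // Element-wise, ("a", "xyz") comes before ("ab", "cd").
    assert!(("a", "xyz") < ("ab", "cd"));

    // Naive concatenation gets this wrong: "abcd" sorts before "axyz".
    assert!("abcd".as_bytes() < "axyz".as_bytes());

    // With terminators, 61 00 00 .. sorts before 61 62 00 00 ..
    assert!(encode_pair("a", "xyz") < encode_pair("ab", "cd"));
}
```

The terminator is smaller than any escaped content byte, so a string always sorts before any longer string that it is a prefix of, and the next element is only compared once the first elements are equal.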
8080

81-
toyDB provides an encoding called "Keycode" which provides these properties, in the
82-
[`encoding::keycode`](https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/encoding/keycode.rs)
83-
module. It is implemented as a [Serde](https://serde.rs) (de)serializer, which
84-
requires a lot of boilerplate code, but we'll just focus on the actual encoding.
81+
toyDB provides an order-preserving encoding called "Keycode" in the [`encoding::keycode`](https://github.com/erikgrinaker/toydb/blob/213e5c02b09f1a3cac6a8bbd0a81773462f367f5/src/encoding/keycode.rs)
82+
module. Like Bincode, the Keycode encoding is not self-describing: the binary data does not say what
83+
the data type is, the caller must provide a type to decode into. It only supports a handful of
84+
primitive data types, and only needs to order values of the same type.
8585

86-
Keycode only supports a handful of primary data types, and just needs to order values of the same
87-
type:
86+
Keycode is implemented as a [Serde](https://serde.rs) (de)serializer, which requires a lot of
87+
boilerplate code to satisfy the trait, but we'll just focus on the actual encoding. The encoding
88+
scheme is as follows:
8889

8990
* `bool`: `00` for `false` and `01` for `true`.
9091

@@ -113,22 +114,20 @@ type:
113114

114115
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L185-L188
115116

116-
* `Vec<T>`, `[T]`, `(T,)`: just the concatenation of the inner values.
117+
* `Vec<T>`, `[T]`, `(T,)`: the concatenation of the inner values.
117118

118119
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L295-L307
119120

120-
* `enum`: the enum variant's numerical index as a `u8`, then the inner values (if any).
121+
* `enum`: the variant's numerical index as a `u8`, then the inner values (if any).
121122

122123
https://github.com/erikgrinaker/toydb/blob/2027641004989355c2162bbd9eeefcc991d6b29b/src/encoding/keycode.rs#L223-L227
123124

124-
Decoding is just the inverse of the encoding.
125-
126125
Like `encoding::Value`, there is also an `encoding::Key` helper trait:
127126

128127
https://github.com/erikgrinaker/toydb/blob/b57ae6502e93ea06df00d94946a7304b7d60b977/src/encoding/mod.rs#L20-L37
129128

130-
We typically use enums to represent different kinds of keys. For example, if we wanted to store
131-
cars and video games, we could use:
129+
Different kinds of keys are usually represented as enums. For example, if we wanted to store cars
130+
and video games, we could use:
132131

133132
```rust
134133
#[derive(serde::Serialize, serde::Deserialize)]
