| 1 | ++++ |
| 2 | +title = "Rust Hashing Cheat Sheet" |
| 3 | +description = "Several examples of how to use Rust's hashing traits and types" |
| 4 | + |
| 5 | +[taxonomies] |
| 6 | +tags = ["rust"] |
| 7 | ++++ |
| 8 | + |
| 9 | +Hashing is the process of transforming arbitrary data into a fixed-size number. Several useful programming concepts arise out of hash codes: |
| 10 | + |
| 11 | +- Hash sets and maps |
| 12 | +- Data digests |
| 13 | +- Cheap identifiers / inequality checks |
| 14 | +- Storing passwords |
| 15 | + |
| 16 | +Recently I tried writing a hash set from scratch in [Rust](https://rust-lang.org/) for educational purposes, but was awfully confused by the collection of traits and types provided by [`std::hash`](https://doc.rust-lang.org/stable/std/hash/index.html). In this post I hope to share some common patterns related to hashing in Rust, while explaining `std::hash` as I go. |
| 17 | + |
| 18 | +## Hashing a Single Value |
| 19 | + |
| 20 | +Hashing a value is as simple as creating a `Hasher`, calling `value.hash(&mut hasher)`, and then calling `hasher.finish()`. |
| 21 | + |
| 22 | +```rust |
| 23 | +use std::hash::{DefaultHasher, Hash, Hasher}; |
| 24 | + |
| 25 | +let mut hasher = DefaultHasher::new(); |
| 26 | +"Hello, world!".hash(&mut hasher); |
| 27 | +let hash: u64 = hasher.finish(); |
| 28 | + |
| 29 | +println!("Hash: {hash}"); // Hash: 7092736762612737980 |
| 30 | +``` |
| 31 | + |
| 32 | +## Hashing Several Values into One Code |
| 33 | + |
| 34 | +You can call `value.hash(&mut hasher)` several times to create a hash code composed of multiple data sources. This is useful when hashing structs or arrays. |
| 35 | + |
| 36 | +```rust |
| 37 | +use std::hash::{DefaultHasher, Hash, Hasher}; |
| 38 | + |
| 39 | +let mut hasher = DefaultHasher::new(); |
| 40 | + |
| 41 | +"Hello".hash(&mut hasher); |
| 42 | +13u64.hash(&mut hasher); |
| 43 | +false.hash(&mut hasher); |
| 44 | + |
| 45 | +let hash: u64 = hasher.finish(); |
| 46 | + |
| 47 | +println!("Hash: {hash}"); // Hash: 3402450879032501501 |
| 48 | +``` |
| 49 | + |
| 50 | +## `Hash`, `Hasher`, and `DefaultHasher` |
| 51 | + |
| 52 | +- [`Hash`](https://doc.rust-lang.org/stable/std/hash/trait.Hash.html): a type that can be hashed (`str`, `u64`, `bool`, etc.) |
| 53 | +- [`Hasher`](https://doc.rust-lang.org/stable/std/hash/trait.Hasher.html): a hashing algorithm (`DefaultHasher`, 3rd-party implementations) |
| 54 | +- [`DefaultHasher`](https://doc.rust-lang.org/stable/std/hash/struct.DefaultHasher.html): Rust's default hashing algorithm[^siphash] |
| 55 | + |
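| 56 | +A `Hasher` is not reused across unrelated hash codes: `finish()` does not reset its internal state, so anything you write afterwards is mixed into everything hashed before. To compute a new hash code, discard the current `Hasher` and create a new one. |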
| 57 | + |
| 58 | +[^siphash]: In Rust 1.91.0 the default hashing algorithm is [SipHash 1-3](https://en.wikipedia.org/wiki/SipHash), but this is an internal detail that may change in the future. |
| 59 | + |
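The reuse rule is easier to see in code. The sketch below assumes a hypothetical helper, `hash_parts`, which is not part of `std::hash`; it feeds several strings into one fresh `DefaultHasher`. It shows that calling `finish()` leaves the hasher's state in place, so later writes continue from the earlier ones.

```rust
use std::hash::{DefaultHasher, Hash, Hasher};

/// Hypothetical helper (not part of `std`): hashes every part, in order,
/// with a single fresh `DefaultHasher`.
fn hash_parts(parts: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for part in parts {
        part.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let mut reused = DefaultHasher::new();
    "Hello".hash(&mut reused);
    let first = reused.finish();

    // `finish()` only reads the state, so calling it again gives the same code.
    assert_eq!(first, reused.finish());

    // Writing more data continues from the old state instead of starting over.
    "world".hash(&mut reused);
    let second = reused.finish();

    // The reused hasher now covers "Hello" followed by "world"...
    assert_eq!(second, hash_parts(&["Hello", "world"]));
    // ...while the first code covers "Hello" alone.
    assert_eq!(first, hash_parts(&["Hello"]));

    println!("first: {first}, second: {second}");
}
```

If you only want the hash of `"world"`, a fresh hasher is the way to get it.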
| 60 | +## Hashing with a Random Seed |
| 61 | + |
| 62 | +To make a hash resilient to [hash flooding](https://en.wikipedia.org/wiki/Collision_attack#Hash_flooding), you can create a `Hasher` with a random seed using `RandomState`. |
| 63 | + |
| 64 | +```rust |
| 65 | +use std::hash::{BuildHasher, Hash, Hasher, RandomState}; |
| 66 | + |
| 67 | +let state = RandomState::new(); |
| 68 | + |
| 69 | +let mut hasher = state.build_hasher(); |
| 70 | +"Hello, world!".hash(&mut hasher); |
| 71 | +let hash = hasher.finish(); |
| 72 | + |
| 73 | +println!("Hash: {hash}"); // Hash: 1905042730872565693 |
| 74 | +``` |
| 75 | + |
| 76 | +There's also a shorthand for this pattern using [`BuildHasher::hash_one()`](https://doc.rust-lang.org/stable/std/hash/trait.BuildHasher.html#method.hash_one). |
| 77 | + |
| 78 | +```rust |
| 79 | +use std::hash::{BuildHasher, RandomState}; |
| 80 | + |
| 81 | +let state = RandomState::new(); |
| 82 | + |
| 83 | +let hash = state.hash_one("Hello, world!"); |
| 84 | + |
| 85 | +println!("Hash: {hash}"); // Hash: 11506452463443521132 |
| 86 | +``` |
| 87 | + |
| 88 | +Note how the hash codes differ between the two examples, even though both hash `"Hello, world!"`. This is because `RandomState::new()` uses a different seed each time it is called. |
| 89 | + |
| 90 | +## `BuildHasher` and `RandomState` |
| 91 | + |
| 92 | +- [`BuildHasher`](https://doc.rust-lang.org/stable/std/hash/trait.BuildHasher.html): a type that can create a new `Hasher` with a seed |
| 93 | +- [`RandomState`](https://doc.rust-lang.org/stable/std/hash/struct.RandomState.html): generates a random seed when constructed, then builds `Hasher`s using that seed |
| 94 | + |
| 95 | +If you want to hash two separate values and compare their hash codes, you would typically create one `RandomState`, then use it to build two `DefaultHasher`s that share the same seed.[^default-hasher-new] |
| 96 | + |
| 97 | +[^default-hasher-new]: Technically, you could just create the two hashers by calling `DefaultHasher::new()`, which initializes them with a seed of 0. This is vulnerable to [hash flooding](https://en.wikipedia.org/wiki/Collision_attack#Hash_flooding) attacks, however, so I don't recommend it! |
| 98 | + |
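Here is a minimal sketch of that comparison. `same_hash` is a hypothetical helper introduced only for illustration; it relies on `BuildHasher::hash_one()`, which builds a fresh `Hasher` from the shared state for each value.

```rust
use std::hash::{BuildHasher, Hash, RandomState};

/// Hypothetical helper (not part of `std`): returns true when both values
/// produce the same hash code under the same `BuildHasher`.
fn same_hash<S: BuildHasher, T: Hash>(state: &S, left: T, right: T) -> bool {
    state.hash_one(left) == state.hash_one(right)
}

fn main() {
    // One `RandomState` means one seed, shared by every hasher it builds.
    let state = RandomState::new();

    // Equal values always hash to the same code under the same state.
    assert!(same_hash(&state, "Hello", "Hello"));

    // Different values will almost certainly hash differently.
    println!("Hello vs world: {}", same_hash(&state, "Hello", "world"));

    // Codes from two different `RandomState`s are not comparable:
    // the seeds differ, so this is expected to print `false`.
    let other = RandomState::new();
    println!(
        "same value, different state: {}",
        state.hash_one("Hello") == other.hash_one("Hello")
    );
}
```

Matching hash codes only suggest that two values are equal. A hash set or map still confirms the match with `Eq`.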
| 99 | +## Deriving `Hash` |
| 100 | + |
| 101 | +The easiest way to make a custom type hashable is by deriving `Hash`. |
| 102 | + |
| 103 | +```rust |
| 104 | +use std::hash::{DefaultHasher, Hash, Hasher}; |
| 105 | + |
| 106 | +#[derive(Hash)] |
| 107 | +struct Foo { |
| 108 | + a: &'static str, |
| 109 | + b: u64, |
| 110 | + c: bool, |
| 111 | +} |
| 112 | + |
| 113 | +let mut hasher = DefaultHasher::new(); |
| 114 | + |
| 115 | +Foo { a: "Hello", b: 13, c: false }.hash(&mut hasher); |
| 116 | + |
| 117 | +let hash: u64 = hasher.finish(); |
| 118 | + |
| 119 | +println!("Hash: {hash}"); // Hash: 3402450879032501501 |
| 120 | +``` |
| 121 | + |
| 122 | +## Implementing `Hash` Manually |
| 123 | + |
| 124 | +If you look closely, you'll notice that the hash codes from [Hashing Several Values into One Code](#hashing-several-values-into-one-code) and [Deriving `Hash`](#deriving-hash) are equal! This is because they're hashing the same data in the same order with the same seed. To prove this, we can expand the `Hash` derivation: |
| 125 | + |
| 126 | +```rust |
| 127 | +use std::hash::{Hash, Hasher}; |
| 128 | + |
| 129 | +struct Foo { |
| 130 | + a: &'static str, |
| 131 | + b: u64, |
| 132 | + c: bool, |
| 133 | +} |
| 134 | + |
| 135 | +impl Hash for Foo { |
| 136 | + fn hash<H: Hasher>(&self, state: &mut H) { |
| 137 | + self.a.hash(state); |
| 138 | + self.b.hash(state); |
| 139 | + self.c.hash(state); |
| 140 | + } |
| 141 | +} |
| 142 | +``` |
| 143 | + |
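To check the claim rather than take it on faith, the sketch below hashes a derived struct, a manually implemented one, and the bare fields, then compares the results. The names `DerivedFoo`, `ManualFoo`, and `hash_of` are hypothetical and exist only for this example. It assumes `DefaultHasher::new()` so that all three hashers start from the same seed.

```rust
use std::hash::{DefaultHasher, Hash, Hasher};

// Hypothetical type using the derive macro.
#[derive(Hash)]
struct DerivedFoo {
    a: &'static str,
    b: u64,
    c: bool,
}

// Hypothetical type with the same fields and a hand-written `Hash`.
struct ManualFoo {
    a: &'static str,
    b: u64,
    c: bool,
}

impl Hash for ManualFoo {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed each field to the hasher in declaration order.
        self.a.hash(state);
        self.b.hash(state);
        self.c.hash(state);
    }
}

/// Hypothetical helper: hashes one value with a fresh `DefaultHasher`.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let derived = hash_of(&DerivedFoo { a: "Hello", b: 13, c: false });
    let manual = hash_of(&ManualFoo { a: "Hello", b: 13, c: false });

    // Hash the same three values directly, without any struct.
    let mut hasher = DefaultHasher::new();
    "Hello".hash(&mut hasher);
    13u64.hash(&mut hasher);
    false.hash(&mut hasher);
    let sequential = hasher.finish();

    assert_eq!(derived, manual);
    assert_eq!(manual, sequential);
    println!("All three hash codes match: {derived}");
}
```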
| 144 | +## Conclusion |
| 145 | + |
| 146 | +I hope these examples help you wrap your head around Rust's hashing support! While I didn't cover them in this article, you may also be interested in [`Hasher`](https://doc.rust-lang.org/stable/std/hash/trait.Hasher.html)'s methods and how primitives like [`bool`](https://doc.rust-lang.org/stable/std/hash/trait.Hash.html#impl-Hash-for-bool), [`char`](https://doc.rust-lang.org/stable/std/hash/trait.Hash.html#impl-Hash-for-char), and [tuples](https://doc.rust-lang.org/stable/std/hash/trait.Hash.html#impl-Hash-for-(T,)) implement `Hash`. You may also enjoy looking at [`rustc-hash`](https://lib.rs/crates/rustc-hash) (previously `fxhash`), [`fnv`](https://lib.rs/crates/fnv), [`sha2`](https://lib.rs/crates/sha2), and [`blake2`](https://lib.rs/crates/blake2). |
| 147 | + |
| 148 | +Happy hacking! |