| 1 | +# Developer Guide |
| 2 | + |
| 3 | +This guide is meant as an entry point for developers. It gives both the |
| 4 | +philosophy behind the design of this package and some concrete details, such as |
| 5 | +invariants. |
| 6 | + |
| 7 | +## Why does this package exist? |
| 8 | + |
| 9 | +This package exists to offer a different performance/functionality |
| 10 | +trade-off vis-a-vis ordered container packages |
| 11 | +(e.g. [containers](http://hackage.haskell.org/package/containers)). Hashing-based |
| 12 | +data structures tend to be faster than comparison-based ones, at the cost of not |
| 13 | +providing operations that rely on the data being ordered. |
| 14 | + |
| 15 | +This means that this package must be faster than ordered containers, or there |
| 16 | +would be no reason for it to exist, given that its functionality is a strict |
| 17 | +subset of ordered containers. This might seem obvious, but the author has |
| 18 | +rejected several proposals in the past (e.g. to switch to higher quality but |
| 19 | +slower hash functions) that would have made unordered-containers too slow to |
| 20 | +motivate its existence. |
| 21 | + |
| 22 | +## A note on hash functions |
| 23 | + |
| 24 | +While the [hashable](http://hackage.haskell.org/package/hashable) package is a |
| 25 | +separate package, it was co-designed with this package. Its main role is to |
| 26 | +support this package and not to provide good general purpose hash functions |
| 27 | +(e.g. to use when fingerprinting a text file). |
| 28 | + |
| 29 | +The hash functions used (by default) were picked to make data structures |
| 30 | +fast. The actual functions used often surprise developers who have learned |
| 31 | +about hashing during their studies but haven't looked at which functions are |
| 32 | +actually used in practice. |
| 33 | + |
| 34 | +For example, integers are hashed to themselves. This might seem contrary to |
| 35 | +what you might have learned about hashing (e.g. that you need avalanche |
| 36 | +behavior; changing one bit of input changes half of the bits in the output). It |
| 37 | +turns out that this isn't what is typically done in practice (take a little tour |
| 38 | +of various programming languages' standard libraries to see this for |
| 39 | +yourself). Hashing integers to themselves is both faster (i.e. free) and the |
| 40 | +improved locality can be helpful given common input patterns. |
| 41 | + |
| 42 | +Another interesting example of hashing is string hashing, where |
| 43 | +[FNV](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) |
| 44 | +is used. FNV is a decent hash function, but has worse properties than, say, |
| 45 | +[MurmurHash](https://en.wikipedia.org/wiki/MurmurHash). However, it's much |
| 46 | +faster. The fact that it's faster is not obvious given the way hash function |
| 47 | +performance is often quoted, namely by giving the average throughput on large |
| 48 | +inputs. Most inputs (e.g. keys) aren't large, often no more than 10 characters |
| 49 | +long. Hash functions typically have a start-up cost and many functions that have |
| 50 | +high throughput (such as MurmurHash) are more expensive for short strings than |
| 51 | +FNV. |
| 52 | + |
| 53 | +### Security |
| 54 | + |
| 55 | +There's an uncomfortable trade-off with regard to security threats posed by |
| 56 | +e.g. denial of service attacks. Always using a more secure hash function, like |
| 57 | +[SipHash](https://en.wikipedia.org/wiki/SipHash), would provide security by |
| 58 | +default. However, those functions would make the performance of the data |
| 59 | +structures no better than that of ordered containers, which defeats the purpose |
| 60 | +of this package. |
| 61 | + |
| 62 | +The current, somewhat frustrating, state is that you have to know which data |
| 63 | +structures can be tampered with by users and either use SipHash just for those |
| 64 | +or switch to ordered containers that don't have collision problems. This package |
| 65 | +uses fast hash functions by default. |