@@ -7,7 +7,7 @@ invariants.
## Why does this package exist?

This package exists to offer a different performance/functionality
- trade-of vis-a-vis ordered container packages
+ trade-off vis-a-vis ordered container packages
(e.g. [containers](http://hackage.haskell.org/package/containers)). Hashing-based
data structures tend to be faster than comparison-based ones, at the cost of not
providing operations that rely on the data being ordered.
@@ -59,7 +59,50 @@ default. However, those functions would make the performance of the data
structures no better than that of ordered containers, which defeats the purpose
of this package.

+ Previous versions of this package tried to switch to SipHash (and a different
+ hash function for integers). Those changes eventually had to be rolled back
+ because no implementation (using SSE instructions where possible) proved both
+ fast enough and free of crashes on all platforms.
+
The current, somewhat frustrating, state is that you have to know which data
structures can be tampered with by users and either use SipHash just for those
or switch to ordered containers that don't have collision problems. This package
uses fast hash functions by default.
+
+ ## Data structure design
+
+ The data structures are based on the
+ [hash array mapped trie (HAMT)](https://en.wikipedia.org/wiki/Hash_array_mapped_trie)
+ data structure. There are several persistent implementations of the HAMT,
+ including in Clojure and Scala.
+
+ The actual implementation is as follows:
+
+ ```haskell
+ data HashMap k v
+     = Empty
+     | BitmapIndexed !Bitmap !(A.Array (HashMap k v))
+     | Leaf !Hash !(Leaf k v)
+     | Full !(A.Array (HashMap k v))
+     | Collision !Hash !(A.Array (Leaf k v))
+ ```
+
+ Here's a quick overview in order of simplicity:
+
+ * `Empty` -- The empty map.
+ * `Leaf` -- A key-value pair.
+ * `Collision` -- An array of key-value pairs where the keys have identical hash
+   values. Element order doesn't matter.
+ * `Full` -- An array of child nodes. Given a key you can find the child it is
+   part of by taking *B* bits of the hash value for the key and indexing into
+   the array. Which bits to use depends on the tree level.
+ * `BitmapIndexed` -- Similar to `Full` except that the array is implemented as a
+   sparse array (to avoid storing `Empty` values). A bitmask and popcount are
+   used to convert from the index taken from the hash value, just like above, to
+   the actual index in the array. This node gets upgraded to a `Full` node when
+   it contains *2^B* elements.
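
To make the node overview above concrete, here is a minimal lookup sketch over a simplified version of the type. It is an illustration under stated assumptions, not the package's code: plain lists stand in for `A.Array`, `Leaf` and `Collision` store keys and values inline, `Word` is assumed for both `Hash` and `Bitmap`, and the names `logicalIndex`, `sparseIndex`, `lookupWith` and `example` are hypothetical.

```haskell
import Data.Bits (popCount, shiftL, shiftR, (.&.))
import Data.Word (Word)

type Hash = Word
type Bitmap = Word

-- Simplified stand-in for the real node type (assumption: lists, not arrays).
data HashMap k v
  = Empty
  | BitmapIndexed Bitmap [HashMap k v]
  | Leaf Hash k v
  | Full [HashMap k v]
  | Collision Hash [(k, v)]

-- B, the number of hash bits consumed per tree level (4, as in the text).
bitsPerLevel :: Int
bitsPerLevel = 4

-- Index in 0 .. 2^B - 1 taken from the hash; the shift is 0 at the root
-- and grows by B with every level.
logicalIndex :: Hash -> Int -> Int
logicalIndex h s =
  fromIntegral ((h `shiftR` s) .&. ((1 `shiftL` bitsPerLevel) - 1))

-- Position in the sparse array: the number of bitmap bits set below bit i.
sparseIndex :: Bitmap -> Int -> Int
sparseIndex bm i = popCount (bm .&. ((1 `shiftL` i) - 1))

-- Look up a key given its (precomputed) hash.
lookupWith :: Eq k => Hash -> k -> HashMap k v -> Maybe v
lookupWith h k = go 0
  where
    go _ Empty = Nothing
    go _ (Leaf h' k' v)
      | h == h' && k == k' = Just v
      | otherwise = Nothing
    go _ (Collision h' kvs)
      | h == h' = lookup k kvs
      | otherwise = Nothing
    go s (Full children) =
      go (s + bitsPerLevel) (children !! logicalIndex h s)
    go s (BitmapIndexed bm children)
      | bm .&. (1 `shiftL` i) == 0 = Nothing  -- no child stored at this index
      | otherwise = go (s + bitsPerLevel) (children !! sparseIndex bm i)
      where
        i = logicalIndex h s

-- A hand-built node with children at logical indices 2 and 5
-- (bitmap 36 has exactly bits 2 and 5 set).
example :: HashMap String Int
example = BitmapIndexed 36 [Leaf 0x2 "a" 1, Leaf 0x5 "b" 2]
```

With this `example`, `lookupWith 0x5 "b" example` finds logical index 5, counts one set bit below it in the bitmap, and reads the second array element, yielding `Just 2`; a hash whose logical index has no bit set (such as `0x3`) yields `Nothing` without touching the array.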
+
+ The number of bits of the hash value to use at each level of the tree, *B*, is a
+ compile-time constant (i.e. 4). In general a larger *B* improves lookup
+ performance (shallower tree) but hurts modification (large nodes to copy when
+ updating the spine of the tree).
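
The trade-off in the last paragraph can be put into numbers with a small sketch. The helpers `fanOut`, `maxDepth` and `tradeOffTable` are hypothetical, and a 64-bit hash is assumed purely for illustration; the depth shown is an upper bound, reached only when keys share long hash prefixes.

```haskell
-- Hypothetical helpers for illustration; not part of the package.

-- Number of child slots in a Full node: 2^B. This is also the largest
-- array that has to be copied when a node on the spine is updated.
fanOut :: Int -> Int
fanOut b = 2 ^ b

-- Upper bound on the number of levels needed to consume every hash bit.
maxDepth :: Int -> Int -> Int
maxDepth hashBits b = (hashBits + b - 1) `div` b

-- (B, fan-out, maximum depth) for a few values of B, assuming 64-bit hashes.
tradeOffTable :: [(Int, Int, Int)]
tradeOffTable = [(b, fanOut b, maxDepth 64 b) | b <- [3, 4, 5, 6]]
```

Under these assumptions `tradeOffTable` evaluates to `[(3,8,22),(4,16,16),(5,32,13),(6,64,11)]`: going from *B* = 4 to *B* = 6 removes at most five levels from a lookup but quadruples the size of the largest node copied on each update.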