# HashSet (Open-addressing)

+ ## Background
+
Open-addressing is another approach to resolving collisions in hash tables.

- A hash collision is resolved by <b>probing<b>, or searching through alternative locations in
+ A hash collision is resolved by **probing** - searching through alternative locations in
the array (the probe sequence) until either the target element is found, or an unused array slot is found,
which indicates that there is no such key in the table.
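
A minimal Python sketch of the lookup described above. It assumes linear probing with a step of 1, 0-indexed buckets and `None` as the unused-slot marker; `probe_find` is a hypothetical helper for illustration, not code from this repository.

```python
def probe_find(table, key):
    """Return the index of key in table, or -1 if it is absent."""
    m = len(table)
    start = hash(key) % m
    for i in range(m):
        idx = (start + i) % m
        if table[idx] is None:
            # Unused slot: the key cannot be further along the probe sequence.
            return -1
        if table[idx] == key:
            return idx
    return -1  # every bucket was probed without finding key


table = [None] * 8
table[3] = 3    # hash(3) % 8 == 3
table[4] = 11   # hash(11) % 8 == 3 as well, so 11 sits one slot further along
```

Looking up `11` starts at bucket 3, skips the occupied slot and finds the key in bucket 4; looking up `19` (which also maps to bucket 3) stops at the unused bucket 5.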

## Implementation Invariant

Note that the buckets are 1-indexed in the following explanation.

- Invariant: Probe sequence is unbroken. That is to say, given an element that is initially hashed to
+ Invariant: ***Probe sequence is unbroken.***
+
+ That is to say, given an element that is initially hashed to
bucket 1 (arbitrary), the probe sequence {1, 2, ..., m} generated when attempting to `add`/`remove`/`find`
the element will ***never*** contain null.

@@ -33,12 +37,27 @@ encountering `null`.
We could simply look into every bucket in the sequence, but that will result in `remove` and `contains` having an O(n)
runtime complexity, defeating the purpose of hashing.

- TLDR: There is a need to differentiate between deleted elements, and `nulls` to ensure operations on the Set have an O(
- 1)
- time complexity.
+ TLDR: There is a need to differentiate between deleted elements, and `nulls` to ensure operations on the Set have an
+ O(1) time complexity.
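
One common way to make that distinction is a tombstone marker. The sketch below is an assumption-laden illustration rather than this repository's implementation: `ProbingSet`, its fixed capacity and the `_TOMBSTONE` sentinel are hypothetical names, and resizing is left out.

```python
class ProbingSet:
    """Toy open-addressing set: linear probing, fixed capacity, no resizing."""

    _TOMBSTONE = object()  # marks a deleted slot; distinct from None (never used)

    def __init__(self, capacity=8):
        self._slots = [None] * capacity

    def _probe(self, key):
        m = len(self._slots)
        start = hash(key) % m
        for i in range(m):
            yield (start + i) % m

    def contains(self, key):
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is None:  # never used: the probe sequence ends here
                return False
            if slot is not self._TOMBSTONE and slot == key:
                return True
        return False

    def add(self, key):
        if self.contains(key):
            return False
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is None or slot is self._TOMBSTONE:  # deleted slots are reusable
                self._slots[idx] = key
                return True
        raise RuntimeError("table is full")

    def remove(self, key):
        for idx in self._probe(key):
            slot = self._slots[idx]
            if slot is None:
                return False
            if slot is not self._TOMBSTONE and slot == key:
                self._slots[idx] = self._TOMBSTONE  # keeps the probe sequence unbroken
                return True
        return False


s = ProbingSet()
s.add(3)
s.add(11)    # collides with 3 (both map to bucket 3) and lands in bucket 4
s.remove(3)  # bucket 3 becomes a tombstone, not None
```

Had `remove` written `None` into bucket 3, a later `contains(11)` would stop at bucket 3 and wrongly report the key as absent.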

## Probing Strategies

+ For Open-Addressing, the hash function differs from that of Chaining, in that the number of collisions encountered
+ when inserting the key into the Hash Set is taken into account in determining the hash value.
+
+ In the following probe strategies, the hash function typically looks like a variation of:
+ <div style="text-align: center;"><code>h(k, i) = (h'(k) + i) mod m</code></div>
+
+ `h'(k)` would be the equivalent of a typical hash function used in a HashSet that resolves collisions by Chaining,
+ while an additional parameter `i` indicates the number of collisions so far.
+
+ Take Linear Probing with a step size of 1 as an example.
+ Given an element `k` hashed to bucket 1 initially, such that:
+ <div style="text-align: center;"><code>h(k, 0) = 1</code></div>
+
+ if there is already an element in bucket 1, resulting in a collision, the next bucket index is determined by:
+ <div style="text-align: center;"><code>h(k, 1) = 1 + 1 = 2</code></div>
+
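
The worked example can be reproduced with a few lines of Python. This is only a sketch: `linear_probe` is a hypothetical helper, `h0` stands in for `h'(k)`, the table size `m = 8` is arbitrary, and buckets are 0-indexed here, so the sequence wraps around to bucket 0.

```python
def linear_probe(h0, i, m):
    """h(k, i) = (h'(k) + i) mod m, where h0 stands for h'(k)."""
    return (h0 + i) % m


m = 8
# Probe sequence for a key whose ordinary hash h'(k) is 1.
sequence = [linear_probe(1, i, m) for i in range(m)]
```

The first two entries are buckets 1 and 2, matching `h(k, 0)` and `h(k, 1)` above, and the full sequence visits every bucket exactly once.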
### Linear Probing

The probing strategy used in our implementation.
@@ -48,7 +67,8 @@ Simplest form of probing and involves linearly searching the hash table for an e
However, this method of probing can result in a phenomenon called (primary) clustering where a large run of
occupied slots builds up, which can drastically degrade the performance of add, remove and contains operations.

- h(k, i) = (h'(k) + i) mod m where h'(k) is an ordinary hash function
+ `h(k, i) = (h'(k) + i) mod m`
+ where `h'(k)` is an ordinary hash function, and `i` is the number of collisions so far.
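
To see primary clustering build up, the sketch below inserts a handful of items by their `h'(k)` value and counts the probes each insertion needs. `insert_linear`, the table size of 16 and the chosen hash values are all assumptions made for illustration.

```python
def insert_linear(table, h0):
    """Place an item whose h'(k) is h0; return the number of probes used."""
    m = len(table)
    for i in range(m):
        idx = (h0 + i) % m
        if table[idx] is None:
            table[idx] = h0
            return i + 1
    raise RuntimeError("table is full")


table = [None] * 16
# Several keys whose ordinary hashes fall close together.
probes = [insert_linear(table, h0) for h0 in (2, 2, 3, 2, 4, 3)]
```

Each new item that hashes anywhere into the occupied run has to walk to its end, so the run (buckets 2 to 7 here) keeps growing and later insertions need more probes than earlier ones.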

### Quadratic Probing

@@ -58,35 +78,37 @@ polynomial until an open slot is found.
This helps to avoid primary clustering of entries (like in Linear Probing), but might still result in secondary
clustering where keys that hash to the same value probe the same alternative cells when a collision occurs.

- h(k, i) = ( h` (k) + c1 * i + c2 * (i^2) ) mod m where c1 and c2 are arbitrary constants
+ `h(k, i) = (h'(k) + c1 * i + c2 * (i^2)) mod m` where `c1` and `c2` are arbitrary constants
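
A short Python sketch of the formula, with `quadratic_sequence` as a hypothetical helper and the constants and table size picked arbitrarily. It illustrates two points: the sequence depends only on `h'(k)`, so keys with the same `h'(k)` share it (secondary clustering), and not every choice of `c1`, `c2` and `m` reaches every bucket.

```python
def quadratic_sequence(h0, m, c1, c2):
    """First m probes of h(k, i) = (h'(k) + c1*i + c2*i^2) mod m, with h0 = h'(k)."""
    return [(h0 + c1 * i + c2 * i * i) % m for i in range(m)]


m = 8
seq_a = quadratic_sequence(3, m, c1=0, c2=1)
seq_b = quadratic_sequence(3, m, c1=1, c2=1)
```

With these particular constants neither sequence covers all 8 buckets, which is why `c1`, `c2` and `m` have to be chosen together in practice.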

### Double Hashing

This is a method of probing where a secondary hash function is used for probing whenever a collision occurs.

- If h2(k) is relatively prime to m for all k, Uniform Hashing Assumption can hold true, as all permutations of probe
- sequences occur in equal probability.
+ If `h2(k)` is relatively prime to `m` for all `k`, every probe sequence visits all `m` buckets, and the Uniform Hashing
+ Assumption (all permutations of probe sequences are equally likely) holds approximately.

- h(k, i) = (h1(k) + i * h2(k)) mod m where h1(k) and h2(k) are two ordinary hash functions
+ `h(k, i) = (h1(k) + i * h2(k)) mod m` where `h1(k)` and `h2(k)` are two ordinary hash functions
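
The role of the relatively-prime condition can be checked directly. In this sketch the names `double_hash_sequence` and `covers_all_buckets` are hypothetical, and `h1`, `h2` are passed in as plain integers standing for `h1(k)` and `h2(k)`.

```python
from math import gcd


def double_hash_sequence(h1, h2, m):
    """First m probes of h(k, i) = (h1(k) + i * h2(k)) mod m."""
    return [(h1 + i * h2) % m for i in range(m)]


def covers_all_buckets(h2, m):
    """The probe sequence reaches every bucket exactly when gcd(h2, m) == 1."""
    return gcd(h2, m) == 1


m = 8
coprime = double_hash_sequence(h1=1, h2=3, m=m)  # gcd(3, 8) == 1
shared = double_hash_sequence(h1=1, h2=4, m=m)   # gcd(4, 8) == 4
```

With a step of 3 the sequence is a permutation of all 8 buckets; with a step of 4 it only alternates between buckets 1 and 5.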

*Source: https://courses.csail.mit.edu/6.006/fall11/lectures/lecture10.pdf*

## Complexity Analysis

- let α = n / m where α is the load factor of the table
+ let `α = n / m` where `α` is the load factor of the table
+
+ For `n` items, in a table of size `m`, assuming uniform hashing, the expected cost of an operation is:

- For n items, in a table of size m, assuming uniform hashing, the expected cost of an operation is:
+ <div style="text-align: center;"><code>1/(1-α)</code></div>

- <div style="text-align: center;">1/1-α</div>
+ e.g. if `α` = 90%, then `E[#probes]` = 10;
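
For a feel of how quickly the cost grows with the load factor, the formula can be evaluated for a few values. `expected_probes` is a hypothetical helper that simply computes `1 / (1 - α)` under the uniform hashing assumption stated above.

```python
def expected_probes(n, m):
    """Evaluate 1 / (1 - α) for α = n / m, which must be below 1."""
    alpha = n / m
    if alpha >= 1:
        raise ValueError("load factor must be below 1")
    return 1 / (1 - alpha)


half_full = expected_probes(1, 2)       # α = 50%
three_quarters = expected_probes(3, 4)  # α = 75%
nearly_full = expected_probes(9, 10)    # α = 90%
```

Going from a half-full table to a 90% full one raises the expected number of probes from 2 to roughly 10, which is why open-addressing tables are usually resized well before they fill up.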

- e.g. if α = 90%, then E[#probes] = 10;
+ ## Notes

- ## Properties of Good Hash Functions
+ ### Properties of Good Hash Functions

There are two properties to measure the "goodness" of a Hash Function

- 1. h(key, i) enumerates all possible buckets.
-     - For every bucket j, there is some i such that: h(key, i) = j
+ 1. `h(key, i)` enumerates all possible buckets.
+     - For every bucket `j`, there is some `i` such that: `h(key, i) = j`
      - The hash function is a permutation of {1..m}.

Linear probing satisfies the first property, because it will probe all possible buckets in the Set. I.e. if an element