the array (the probe sequence) until either the target element is found, or an unoccupied bucket is reached,
which indicates that there is no such key in the table.

## Implementation Invariant

Note that the buckets are 1-indexed in the following explanation.

Invariant: Probe sequence is unbroken. That is to say, given an element that is initially hashed to
bucket 1 (arbitrary), the probe sequence {1, 2, ..., m} generated when attempting to `add`/`remove`/`find`
the element will ***never*** contain null.

This invariant is used to help us ensure the correctness and efficiency of `add`/`remove`/`contains`.
With the above example of an element generating a probe sequence {1, 2, ...}, `add` will check each bucket
sequentially, attempting to add the element, treating buckets containing `Tombstones` (to be explained later) and
`nulls` as **empty** buckets available for insertion.

As a result, if the element is inserted in bucket `m`, such that the probe sequence {1, 2, ..., m} is
generated, then there must have been elements occupying buckets {1, 2, ..., m - 1}, resulting in collisions.

Simply replacing the element to be removed with `null` will cause `contains` to return false upon reaching that
`null`, even though the element being searched for was present.

`Tombstones` allow us to mark the bucket as deleted, which allows `contains` to know that there is a
possibility that the targeted element can be found later in the probe sequence, while still returning false
immediately upon encountering a genuine `null`.

We could simply look into every bucket in the sequence, but that will result in `remove` and `contains` having an O(n)
runtime complexity, defeating the purpose of hashing.

TLDR: There is a need to differentiate between deleted elements and `nulls` to ensure operations on the Set have an O(1)
time complexity.
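
To make the invariant and the role of `Tombstones` concrete, here is a minimal Java sketch of the same scheme. The class name `OpenAddressingSet`, the generic element type, and the 0-indexed buckets are illustrative assumptions for this example, not this repo's actual implementation:

```java
// A minimal sketch of linear probing with Tombstones (illustrative, not the repo's code).
// Buckets are 0-indexed here, unlike the 1-indexed explanation above.
class OpenAddressingSet<E> {
    private static final Object TOMBSTONE = new Object(); // marks a deleted bucket
    private final Object[] buckets;

    OpenAddressingSet(int capacity) {
        buckets = new Object[capacity];
    }

    // Linear probing: h(key, i) = (h(key) + i) mod m
    private int probe(E key, int i) {
        return Math.floorMod(key.hashCode() + i, buckets.length);
    }

    public boolean add(E key) {
        for (int i = 0; i < buckets.length; i++) {
            int b = probe(key, i);
            // As described above, a Tombstone or null counts as an empty bucket
            // available for insertion.
            if (buckets[b] == null || buckets[b] == TOMBSTONE) {
                buckets[b] = key;
                return true;
            }
            if (key.equals(buckets[b])) {
                return false; // already present
            }
        }
        return false; // table is full
    }

    public boolean contains(E key) {
        for (int i = 0; i < buckets.length; i++) {
            int b = probe(key, i);
            if (buckets[b] == null) {
                return false; // a genuine null ends the search immediately
            }
            // A Tombstone means the element may still appear later in the sequence,
            // so we keep probing instead of giving up.
            if (buckets[b] != TOMBSTONE && key.equals(buckets[b])) {
                return true;
            }
        }
        return false;
    }

    public boolean remove(E key) {
        for (int i = 0; i < buckets.length; i++) {
            int b = probe(key, i);
            if (buckets[b] == null) {
                return false; // not present
            }
            if (buckets[b] != TOMBSTONE && key.equals(buckets[b])) {
                buckets[b] = TOMBSTONE; // mark deleted instead of writing null
                return true;
            }
        }
        return false;
    }
}
```

The key design choice is that `remove` writes `TOMBSTONE` instead of `null`, so the probe sequence of every remaining element stays unbroken: `contains` probes past deleted buckets and only gives up at a genuine `null`.
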
## Probing Strategies

For n items, in a table of size m, assuming uniform hashing, the expected cost of an operation is at most
1/(1 - α), where α = n/m is the load factor;
e.g. if α = 90%, then E[#probes] = 10;
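
The surrounding derivation is not shown in this excerpt, but assuming the standard uniform-hashing bound above, the expected number of probes grows sharply as the load factor approaches 1:

```latex
% Assumed uniform-hashing bound; a few worked load factors:
\mathbb{E}[\#\text{probes}] \le \frac{1}{1-\alpha}
\qquad
\alpha = 0.5 \Rightarrow 2,\quad
\alpha = 0.9 \Rightarrow 10,\quad
\alpha = 0.99 \Rightarrow 100
```
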
## Properties of Good Hash Functions

There are two properties to measure the "goodness" of a Hash Function:

1. h(key, i) enumerates all possible buckets.
    - For every bucket j, there is some i such that h(key, i) = j.
    - Equivalently, for each key, the probe sequence h(key, 1), ..., h(key, m) is a permutation of {1..m}.

Linear probing satisfies the first property, because it will probe all possible buckets in the Set. I.e. if an element
is initially hashed to bucket 1, in a Set with capacity n, linear probing generates a sequence of {1, 2, ..., n - 1, n},
enumerating every single bucket.

- Linear Probing does ***NOT*** fulfil UHA. In linear probing, when a collision occurs, the HashSet handles it by
checking the next bucket, linearly until an empty bucket is found. The next slot is always determined in a fixed
linear manner.
- In practice, achieving UHA is difficult. Double hashing can come close to achieving UHA by using another
hash function to vary the step size (unlike linear probing, where the step size is constant), resulting in a more
uniform distribution of keys and better performance for the hash table (see the sketch below).
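
As an illustrative contrast with linear probing, here is a hypothetical double-hashing sketch; the toy hash functions, the prime capacity `m = 7`, and the 0-indexed buckets are assumptions for the demo, not the actual implementation:

```java
// A hypothetical double-hashing sketch: the step size depends on the key, so different
// keys trace different probe paths. With a prime capacity m and a non-zero step, the
// sequence h(key, 0), h(key, 1), ..., h(key, m - 1) visits every bucket exactly once.
class DoubleHashingDemo {
    static int probe(int key, int i, int m) {
        int h1 = Math.floorMod(key, m);         // primary hash: the starting bucket
        int h2 = 1 + Math.floorMod(key, m - 1); // secondary hash: step size in [1, m - 1]
        return Math.floorMod(h1 + i * h2, m);
    }

    public static void main(String[] args) {
        int m = 7; // prime, so every step size in [1, m - 1] is coprime with m
        StringBuilder sequence = new StringBuilder();
        for (int i = 0; i < m; i++) {
            sequence.append(probe(10, i, m)).append(' ');
        }
        System.out.println(sequence); // a permutation of all buckets: 3 1 6 4 2 0 5
    }
}
```

Linear probing corresponds to fixing the step size at 1 for every key, which is why colliding keys always trace the same path; letting the step size depend on the key spreads the probe sequences out.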