Commit 19a45ef

committed
docs: revise OA HashSet README
1 parent 15ae8e5 commit 19a45ef

1 file changed

+39
-17
lines changed
  • src/main/java/dataStructures/hashSet/openAddressing

src/main/java/dataStructures/hashSet/openAddressing/README.md

Lines changed: 39 additions & 17 deletions
@@ -1,16 +1,20 @@
11
# HashSet (Open-addressing)
22

3+
## Background
4+
35
Open-addressing is another approach to resolving collisions in hash tables.
46

5-
A hash collision is resolved by <b>probing<b>, or searching through alternative locations in
7+
A hash collision is resolved by **probing** - searching through alternative locations in
68
the array (the probe sequence) until either the target element is found, or an unused array slot is found,
79
which indicates that there is no such key in the table.
810

911
## Implementation Invariant
1012

1113
Note that the buckets are 1-indexed in the following explanation.
1214

13-
Invariant: Probe sequence is unbroken. That is to say, given an element that is initially hashed to
15+
Invariant: ***Probe sequence is unbroken.***
16+
17+
That is to say, given an element that is initially hashed to
1418
bucket 1 (arbitrary), the probe sequence {1, 2, ..., m} generated when attempting to `add`/`remove`/`find`
1519
the element will ***never*** contain null.
1620

@@ -33,12 +37,27 @@ encountering `null`.
3337
We could simply look into every bucket in the sequence, but that will result in `remove` and `contains` having an O(n)
3438
runtime complexity, defeating the purpose of hashing.
3539

36-
TLDR: There is a need to differentiate between deleted elements, and `nulls` to ensure operations on the Set have an O(
37-
1)
38-
time complexity.
40+
TLDR: There is a need to differentiate between deleted elements and `null`s to ensure operations on the Set have an
41+
expected O(1) time complexity.
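
To make the distinction concrete, here is a minimal sketch of a linear-probing set that marks removed slots with a sentinel. It is not the implementation in this repository: the class name `TombstoneSetSketch` and the `TOMBSTONE` sentinel are hypothetical, buckets are 0-indexed (unlike the 1-indexed explanation above), keys are assumed non-null, and there is no resizing.

```java
/** Illustrative linear-probing set with tombstones; 0-indexed, fixed capacity, no resizing. */
final class TombstoneSetSketch {
    // Hypothetical sentinel meaning "an element was removed here"; distinct from null ("never used").
    private static final Object TOMBSTONE = new Object();
    private final Object[] buckets;

    TombstoneSetSketch(int capacity) {
        buckets = new Object[capacity];
    }

    // h(k, i) = (h'(k) + i) mod m, with h'(k) derived from hashCode().
    private int slot(Object key, int i) {
        int m = buckets.length;
        return (Math.floorMod(key.hashCode(), m) + i) % m;
    }

    boolean contains(Object key) {
        for (int i = 0; i < buckets.length; i++) {
            Object current = buckets[slot(key, i)];
            if (current == null) {
                return false; // never-used bucket: the key cannot be further along
            }
            if (current != TOMBSTONE && current.equals(key)) {
                return true;
            }
            // Tombstone or a different key: keep probing.
        }
        return false;
    }

    boolean add(Object key) {
        if (contains(key)) {
            return false;
        }
        for (int i = 0; i < buckets.length; i++) {
            int idx = slot(key, i);
            if (buckets[idx] == null || buckets[idx] == TOMBSTONE) {
                buckets[idx] = key; // add may reuse a tombstoned bucket
                return true;
            }
        }
        throw new IllegalStateException("table is full");
    }

    boolean remove(Object key) {
        for (int i = 0; i < buckets.length; i++) {
            int idx = slot(key, i);
            Object current = buckets[idx];
            if (current == null) {
                return false;
            }
            if (current != TOMBSTONE && current.equals(key)) {
                buckets[idx] = TOMBSTONE; // keeps the probe sequence unbroken
                return true;
            }
        }
        return false;
    }
}
```

With a capacity of 8, the keys `1` and `9` share the same initial bucket. After removing `1`, a lookup for `9` still succeeds because the search passes over the tombstone instead of stopping at it.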
3942

4043
## Probing Strategies
4144

45+
For Open-Addressing, the hash function differs from that of Chaining in that the number of collisions encountered
46+
when inserting the key into the Hash Set is taken into account when determining the hash value.
47+
48+
In the following probe strategies, the hash function typically looks like a variation of:
49+
<div style="text-align: center;"><code>h(k, i) = (h'(k) + i) mod m</code></div>
50+
51+
`h'(k)` would be the equivalent of a typical hash function used in a HashSet that resolves collisions by Chaining,
52+
while an additional parameter `i` indicates the number of collisions so far.
53+
54+
Take Linear Probing with a step size of 1 as an example.
55+
Given an element `k` hashed to bucket 1 initially, such that:
56+
<div style="text-align: center;"><code>h(k, 0) = 1</code></div>
57+
58+
then, if there is already an element in bucket 1 (a collision), the next bucket index is determined by:
59+
<div style="text-align: center;"><code>h(k, 1) = 1 + 1 = 2</code></div>
60+
4261
### Linear Probing
4362

4463
The probing strategy used in our implementation.
@@ -48,7 +67,8 @@ Simplest form of probing and involves linearly searching the hash table for an e
4867
However, this method of probing can result in a phenomenon called (primary) clustering where a large run of
4968
occupied slots builds up, which can drastically degrade the performance of add, remove and contains operations.
5069

51-
h(k, i) = (h'(k) + i) mod m where h'(k) is an ordinary hash function
70+
`h(k, i) = (h'(k) + i) mod m`
71+
where `h'(k)` is an ordinary hash function, and `i` is the number of collisions so far.
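
A minimal sketch of the linear probing formula, assuming 0-indexed buckets and an `int` value standing in for `h'(k)`. The class and method names are hypothetical and not taken from this repository.

```java
/** Illustrative helper for h(k, i) = (h'(k) + i) mod m with 0-indexed buckets. */
final class LinearProbeSketch {
    // baseHash plays the role of h'(k); i is the number of collisions so far; m is the table size.
    static int probe(int baseHash, int i, int m) {
        // Widen to long so baseHash + i cannot overflow; floorMod keeps the result in [0, m).
        return (int) Math.floorMod((long) baseHash + i, (long) m);
    }

    // The full probe sequence for one key: m consecutive buckets, wrapping around the array.
    static int[] sequence(int baseHash, int m) {
        int[] result = new int[m];
        for (int i = 0; i < m; i++) {
            result[i] = probe(baseHash, i, m);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(sequence(6, 4)));
    }
}
```

Because consecutive probes land in adjacent buckets, two keys whose initial buckets are close together end up sharing most of their probe sequence, which is how the runs described above build up.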
5272

5373
### Quadratic Probing
5474

@@ -58,35 +78,37 @@ polynomial until an open slot is found.
5878
This helps to avoid primary clustering of entries (like in Linear Probing), but might still result in secondary
5979
clustering where keys that hash to the same value probe the same alternative cells when a collision occurs.
6080

61-
h(k, i) = ( h`(k) + c1 * i + c2 * (i^2) ) mod m where c1 and c2 are arbitrary constants
81+
`h(k, i) = (h'(k) + c1 * i + c2 * (i^2)) mod m` where `c1` and `c2` are arbitrary constants
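
The formula can be sketched as follows, again with 0-indexed buckets and hypothetical names. The second helper counts how many distinct buckets the first `m` probes reach, to illustrate that not every choice of `c1`, `c2` and `m` visits the whole table.

```java
/** Illustrative helper for h(k, i) = (h'(k) + c1 * i + c2 * i^2) mod m with 0-indexed buckets. */
final class QuadraticProbeSketch {
    static int probe(int baseHash, int i, int c1, int c2, int m) {
        // Computed in long to avoid int overflow for larger i.
        long offset = (long) c1 * i + (long) c2 * i * i;
        return (int) Math.floorMod(baseHash + offset, (long) m);
    }

    // Number of distinct buckets reached by the first m probes (i = 0 .. m - 1).
    static int distinctBuckets(int baseHash, int c1, int c2, int m) {
        boolean[] seen = new boolean[m];
        int count = 0;
        for (int i = 0; i < m; i++) {
            int idx = probe(baseHash, i, c1, c2, m);
            if (!seen[idx]) {
                seen[idx] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // c1 = 0, c2 = 1, m = 8: the probes are 0, 1, 4, 1, 0, 1, 4, 1.
        System.out.println(distinctBuckets(0, 0, 1, 8));
    }
}
```

In this example only 3 of the 8 buckets are ever probed, so an insertion could fail even though the table has free buckets. The constants and the table size have to be chosen together.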
6282

6383
### Double Hashing
6484

6585
This is a method of probing where a secondary hash function is used for probing whenever a collision occurs.
6686

67-
If h2(k) is relatively prime to m for all k, Uniform Hashing Assumption can hold true, as all permutations of probe
68-
sequences occur in equal probability.
87+
If `h2(k)` is relatively prime to `m` for all `k`, every probe sequence visits all `m` buckets, and double hashing
88+
closely approximates the Uniform Hashing Assumption, under which all permutations of probe sequences are equally likely.
6989

70-
h(k, i) = (h1(k) + i * h2(k)) mod m where h1(k) and h2(k) are two ordinary hash functions
90+
`h(k, i) = (h1(k) + i * h2(k)) mod m` where `h1(k)` and `h2(k)` are two ordinary hash functions
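
A minimal sketch of the double hashing formula, with `int` values standing in for `h1(k)` and `h2(k)` and 0-indexed buckets. The names are hypothetical. It also counts the distinct buckets reached, to show why `h2(k)` should be relatively prime to `m`.

```java
/** Illustrative helper for h(k, i) = (h1(k) + i * h2(k)) mod m with 0-indexed buckets. */
final class DoubleHashProbeSketch {
    static int probe(int h1, int h2, int i, int m) {
        // Computed in long to avoid int overflow in i * h2.
        return (int) Math.floorMod(h1 + (long) i * h2, (long) m);
    }

    // Euclid's algorithm; gcd(h2, m) == 1 means h2 is relatively prime to m.
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    // Number of distinct buckets reached by the first m probes (i = 0 .. m - 1).
    static int distinctBuckets(int h1, int h2, int m) {
        boolean[] seen = new boolean[m];
        int count = 0;
        for (int i = 0; i < m; i++) {
            int idx = probe(h1, h2, i, m);
            if (!seen[idx]) {
                seen[idx] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(distinctBuckets(1, 3, 8)); // step 3 is relatively prime to 8
        System.out.println(distinctBuckets(1, 4, 8)); // step 4 shares a factor with 8
    }
}
```

With `m = 8`, a step of 3 reaches all 8 buckets, while a step of 4 only alternates between buckets 1 and 5. One common way to guarantee coprimality is to make `m` a power of two and have `h2(k)` always return an odd number.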
7191

7292
*Source: https://courses.csail.mit.edu/6.006/fall11/lectures/lecture10.pdf*
7393

7494
## Complexity Analysis
7595

76-
let α = n / m where α is the load factor of the table
96+
let `α = n / m` where `α` is the load factor of the table
97+
98+
For `n` items, in a table of size `m`, assuming uniform hashing, the expected cost of an operation is:
7799

78-
For n items, in a table of size m, assuming uniform hashing, the expected cost of an operation is:
100+
<div style="text-align: center;"><code>1/(1-α)</code></div>
79101

80-
<div style="text-align: center;">1/1-α</div>
102+
e.g. if `α` = 90%, then `E[#probes]` = 10.
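
The bound `1/(1-α)` with `α = n / m` can be evaluated directly. This is a small sketch with a hypothetical class name, assuming uniform hashing and `n < m`.

```java
/** Illustrative calculation of the expected probe count 1 / (1 - alpha), where alpha = n / m. */
final class LoadFactorSketch {
    static double expectedProbes(int n, int m) {
        if (m <= 0 || n < 0 || n >= m) {
            throw new IllegalArgumentException("requires 0 <= n < m");
        }
        double alpha = (double) n / m; // load factor
        return 1.0 / (1.0 - alpha);
    }

    public static void main(String[] args) {
        // Load factors of 50%, 75% and 90% in a table of 100 buckets.
        for (int n : new int[] {50, 75, 90}) {
            System.out.println(n + " items: " + expectedProbes(n, 100));
        }
    }
}
```

The cost grows slowly at first (about 2 probes at 50% and 4 at 75%) and then sharply as the table fills, which is why open-addressing tables are typically resized well before they become full.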
81103

82-
e.g. if α = 90%, then E[#probes] = 10;
104+
## Notes
83105

84-
## Properties of Good Hash Functions
106+
### Properties of Good Hash Functions
85107

86108
There are two properties to measure the "goodness" of a Hash Function
87109

88-
1. h(key, i) enumerates all possible buckets.
89-
- For every bucket j, there is some i such that: h(key, i) = j
110+
1. `h(key, i)` enumerates all possible buckets.
111+
- For every bucket `j`, there is some `i` such that: `h(key, i) = j`
90112
- The hash function is a permutation of {1..m}.
91113

92114
Linear probing satisfies the first property, because it will probe all possible buckets in the Set. I.e. if an element
