Go Lookup is a custom hash table implementation in Go, and I am currently developing it for recreational purposes. I have recorded myself during the development process, feel free to check it out! 😎:
- Custom hash table implementation.
- Uses FNV-1a (Fowler–Noll–Vo) non-cryptographic hash function for fast lookups.
- Uses Open Addressing as a collision resolution technique
- Uses Double Hashing as a probing technique
- Uses Prime numbers for hash table sizing to reduce collisions.
The implementation uses an open-addressed hash table using a double hashing strategy that incorporates tetrahedral numbers.
In an open-addressed hash table, all key-value pairs are stored directly in a single array. I refer to each location in this array as a slot. When inserting or looking up a key, a hash function is used to determine its home slot.
For each key, we a hash is computed:
hash(key) % tableSize
This gives the initial slot where the key should be placed. If that slot is empty, the key is inserted there. However, if the slot is already occupied ( a collision occurred) then a new slot need to be searched using a probe sequence.
Double hashing is used as a probing stretegy. Rather than stepping through the array linearly or quadratically, a second hash to compute a more varied step size is used to help avoid clustering.
The first and second hash functions are calculated as follows.
hash1 = hash(key) % tableSize
hash2 = 1 + (hash(key) % (tableSize - 1))
This second hash ensures that the offset is always non-zero and coprime with the table size (which is always a prime number), allowing to eventually probe every slot in the table.
I implemented a tetrahedral offset to the double hashing function, but have not yet done rigorous benchmarks to prove its efficiency, to be honest, I just decided to add it as it provides a more "spread" out slots on a collision chain.
The tetrahedral number is defined as:
T(n) = (n³ - n) / 6
This number grows slowly at first, then more rapidly as the number of collisions increases, and is used to further spread out probe indices in a non-linear way.
The final probe location is calculated as:
index = (hash1 + c * hash2 + T(c)) % tableSize
Where c
is the number of collisions encountered so far while probing.
This approach improves dispersion, especially for keys that would otherwise follow similar probe patterns, further reducing the chance of clustering.
Suppose we have a hash table with 17 slots and want to insert a key. Let’s say hash(key) = 12345678
.
We calculate:
hash1 = 12345678 % 17 = 4
hash2 = 1 + (12345678 % 16) = 15
We then try to insert:
- Attempt 0: (4 + 0×15 + T(0)) % 17 = 4
- Attempt 1: (4 + 1×15 + T(1)) % 17 = (4 + 15 + 0) % 17 = 19 % 17 = 2
- Attempt 2: (4 + 2×15 + T(2)) % 17 = (4 + 30 + 1) % 17 = 35 % 17 = 1
- Attempt 3: (4 + 3×15 + T(3)) % 17 = (4 + 45 + 4) % 17 = 53 % 17 = 2 → already tried
- Continue probing...
If a slot is taken, the collision count is incremented and a new index is calculated until an empty slot is found.
Searching and deletion follows the exact same probe sequence used during insertion. If a key is being searched:
- Start at the computed
hash1
. - If the slot doesn't match the key, we go to the next probe location using the same formula.
- Continue until the key is found or hit an empty slot (which means the key is not present).
This hash table keeps track of how full it is. As more elements are inserted, the average number of collisions and probe length increases. To maintain performance, it resizes the backing array when the load factor reaches 60% (this is currently an arbitrary chocie)
When resizing:
- The new size is the next appropriate prime number larger than twice the current size.
- Every key in the old table is reinserted using the new hash parameters.
- If the new size exceeds Go's
uint64
panic is raised.
Run all tests:
go test .
Run all tests and benchmarks:
go test -bench . -benchmem
Benchmark a specific test:
go test -bench=<test> -benchmem
Benchmark multiple tests (regex):
go test -bench='<test>|<test>' -benchmem
Useful flags:
- Set iterations/time:
-benchtime 2s
or-benchtime 100x
- Run specific test only:
go test -run
Format code:
go fmt .
- CPU: Intel® Core™ i7-8750H @ 2.20GHz
- Arch: amd64
- OS: Windows
Keys | Runs | ns/op | ns/op per key | B/op | Allocs/op |
---|---|---|---|---|---|
1,000 | 100 | 11,288 | 11.29 | 0 | 0 |
50,000 | 100 | 639,210 | 12.78 | 0 | 0 |
1,000,000 | 100 | 12,491,896 | 12.49 | 0 | 0 |
Keys | Runs | ns/op | ns/op per key | B/op | Allocs/op |
---|---|---|---|---|---|
1,000 | 100 | 30,330 | 30.33 | 0 | 0 |
50,000 | 100 | 1,452,023 | 29.04 | 0 | 0 |
1,000,000 | 100 | 29,547,213 | 29.55 | 0 | 0 |
Key length: 36 characters
Runs | ns/op | B/op | Allocs/op | ns/insertion |
---|---|---|---|---|
500 | 232,270,992 | 8,398,035 | 1,013,744 | 232.27 |
- Go Language Reference: Go Spec
- Go Testing Package: testing docs, testing flags
- FNV Hash Function: Wikipedia
- Prime Numbers: Wikipedia
- Open Addressing: Wikipedia
- Double Hashing: Wikipedia