
Commit dd869c6

Add an explanation of quickly hashing onto a non-power-of-two range.
In Olaoluwa Osuntokun's recent protocol proposal they were using a mod in an inner loop. I wanted to suggest a normative protocol change to use the trick we use here, but to find an explanation of it I had to dig up the PR on GitHub. After I posted about it, several other developers commented that it was very interesting and that they were unaware of it. Ideally the code should be self-documenting and help educate other contributors about non-obvious techniques that we use, so I've written a description of the technique with citations for future reference.
1 parent 2c2d988

File tree

1 file changed: +31 −0


src/cuckoocache.h

Lines changed: 31 additions & 0 deletions
@@ -206,6 +206,37 @@ class cache
     /** compute_hashes is convenience for not having to write out this
      * expression everywhere we use the hash values of an Element.
      *
+     * We need to map the 32-bit input hash onto a hash bucket in a range [0, size) in a
+     * manner which preserves as much of the hash's uniformity as possible. Ideally
+     * this would be done by bitmasking, but the size is usually not a power of two.
+     *
+     * The naive approach would be to use a mod -- which isn't perfectly uniform, but so
+     * long as the hash is much larger than the size it is not that bad. Unfortunately,
+     * mod/division is fairly slow on ordinary microprocessors (e.g. 90-ish cycles on
+     * Haswell; ARM doesn't even have an instruction for it). When the divisor is a
+     * compile-time constant the compiler will do clever tricks to turn it into a
+     * multiply+add+shift, but size is a run-time value so the compiler can't do that here.
+     *
+     * One option would be to implement the same trick the compiler uses and compute the
+     * constants for exact division based on the size, as described in "N-Bit Unsigned
+     * Division via N-Bit Multiply-Add" by Arch D. Robison in 2005. But that code is
+     * somewhat complicated and the result is still slower than other options.
+     *
+     * Instead we treat the 32-bit random number as a Q32 fixed-point number in the range
+     * [0, 1) and simply multiply it by the size. Then we just shift the result down by
+     * 32 bits to get our bucket number. The result has the same non-uniformity as a
+     * mod, but it is much faster to compute. More about this technique can be found at
+     * http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
+     *
+     * The resulting non-uniformity is also more equally distributed, which would be
+     * advantageous for something like linear probing, though it shouldn't matter
+     * one way or the other for a cuckoo table.
+     *
+     * The primary disadvantage of this approach is that increased intermediate precision
+     * is required, but for a 32-bit random number we only need the high 32 bits of a
+     * 32*32->64-bit multiply, which means the operation is reasonably fast even on a
+     * typical 32-bit processor.
+     *
      * @param e the element whose hashes will be returned
      * @returns std::array<uint32_t, 8> of deterministic hashes derived from e
      */
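For illustration, a minimal standalone sketch of the reduction described in the new comment; the function name map_into_range is hypothetical and not part of cuckoocache.h:

    #include <cstdint>

    // Map a uniformly distributed 32-bit hash onto [0, size) without mod/division:
    // treat the hash as a Q32 fixed-point fraction in [0, 1), multiply by size, and
    // keep the high 32 bits of the 32*32->64-bit product.
    inline uint32_t map_into_range(uint32_t hash, uint32_t size)
    {
        return static_cast<uint32_t>((static_cast<uint64_t>(hash) * size) >> 32);
    }

For example, with size = 100 a hash near 0 lands in bucket 0 and a hash near 2^32 lands in bucket 99, giving the same proportional spread as hash % 100 at the cost of one multiply and one shift.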
