Conversation

@hoxxep (Contributor) commented Aug 4, 2025

Hello,

I've recently released rapidhash v3.0.0. It uses the same integer hashing as foldhash, but comprehensively beats foldhash's byte/string hashing performance on all platforms we've benchmarked (common server chips and M1 Max). It offers the same minimal DoS resistance and portability too.

Aside from marginally increased code size and a higher MSRV (rapidhash 1.77 vs foldhash 1.60), I believe there are no other downsides to changing the default to rapidhash.

If the MSRV is an issue, I can look at reducing it. It's only this high because the hashing algorithm is fully const, and efficiently reading/splitting slices in const contexts is difficult with older compiler versions.

Full benchmarks are available in the README and docs folder. Very happy to answer any questions or hear feedback on rapidhash. Thank you!

@Amanieu (Member) commented Aug 4, 2025

I'm always happy to switch to a better hash function. However I would like to give @orlp (author of foldhash) an opportunity to review this first.

@hoxxep (Contributor, Author) commented Aug 4, 2025

Absolutely, I'm very grateful to @orlp for the work he's done! The rapidhash v3.0 release is effectively foldhash's integer hashing approach, but replaces the Hasher::write with a carefully tuned rapidhash-derived alternative that appears to beat foldhash on almost all string hashing benchmarks.

It took me months to match foldhash's performance. There were many small subtleties: fighting the inliner, reducing the hashmap size, faster global seeding, the integer buffer, avoiding register clobbering, and keeping the avalanching logic in Hasher::finish as small and simple as possible. A lot of credit goes to the novel ideas in foldhash and how well it works with the Rust hashing traits.

@orlp Any feedback on rapidhash itself, or how to roll it out so that people can benefit from it would be great too. I think it's strictly an upgrade on foldhash, but would happily take any feedback or critique. Also happy to chat offline.

@orlp commented Aug 4, 2025

> It took me months to match foldhash's performance. There were many small subtleties: fighting the inliner, reducing the hashmap size, faster global seeding, the integer buffer, avoiding register clobbering, and keeping the avalanching logic in Hasher::finish as small and simple as possible. A lot of credit goes to the novel ideas in foldhash and how well it works with the Rust hashing traits.

I took a quick look at the rapidhash crate and, with all due respect, the Hasher construction is essentially just foldhash. I can indeed see in the codebase that you ran a lot of experiments, but the end result as benchmarked is the same construction. The benchmark results reflect that: on any non-string input rapidhash is identical (from your benchmark reports, Xeon 8488C):

┌────────────────┬────────────┬─────────────┬────────────┐
│          distr ┆      bench ┆ rapidhash-f ┆ foldhash-f │
│            --- ┆        --- ┆         --- ┆        --- │
│            str ┆        str ┆         f64 ┆        f64 │
╞════════════════╪════════════╪═════════════╪════════════╡
│            u32 ┆   hashonly ┆        0.66 ┆       0.66 │ <-- these are just the same
│        u32pair ┆   hashonly ┆        0.79 ┆       0.66 │ |
│            u64 ┆   hashonly ┆        0.79 ┆       0.79 │ |
│      u64lobits ┆   hashonly ┆        0.66 ┆       0.66 │ |
│      u64hibits ┆   hashonly ┆        0.66 ┆       0.66 │ |
│        u64pair ┆   hashonly ┆        0.74 ┆       0.79 │ |
│           ipv4 ┆   hashonly ┆        0.66 ┆       0.66 │ |
│           ipv6 ┆   hashonly ┆        0.75 ┆       0.74 │ |
│           rgba ┆   hashonly ┆        0.66 ┆       0.66 │ |
│      accesslog ┆   hashonly ┆        1.32 ┆       1.31 │ |
╞════════════════╪════════════╪═════════════╪════════════╡ string benchmarks after this point
│ strenglishword ┆   hashonly ┆        1.75 ┆       3.54 │ <-- short string perf is impressive
│        struuid ┆   hashonly ┆        2.92 ┆       5.04 │ |
│         strurl ┆   hashonly ┆        4.76 ┆       7.07 │ |
│        strdate ┆   hashonly ┆        1.68 ┆       3.26 │ |
│       kilobyte ┆   hashonly ┆       28.37 ┆      31.12 │ <-- roughly same for medium-to-long
│    tenkilobyte ┆   hashonly ┆      320.48 ┆     372.53 │ <-- better again at long inputs, due to
└────────────────┴────────────┴─────────────┴────────────┘     more unrolling, questionable if desired for codesize

Thus, the only meaningful differences are found in the string hashing routine.

And even there, the actual implementation differences are minor (albeit the results are impressive). To quantify this, we both essentially try to feed as many bytes into the following expression as quickly as possible:

state = folded_multiply(input_data0 ^ state, input_data1 ^ seed);
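Here `folded_multiply` refers to the widely used primitive of multiplying two 64-bit words into a 128-bit product and XOR-folding the halves back together. A minimal Rust sketch of that primitive (the building block, not either crate's full implementation):

```rust
/// Multiply two 64-bit words into a 128-bit product, then XOR the
/// high and low halves so carries out of the top word feed back
/// into the 64-bit result.
fn folded_multiply(a: u64, b: u64) -> u64 {
    let full = (a as u128) * (b as u128);
    (full as u64) ^ ((full >> 64) as u64)
}
```

The XOR of both halves is what makes the operation non-invertible per input word while still mixing every input bit into the output.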

My <= 16 byte string handling is also essentially the same as rapidhash as far as I can see, so the main differences are in how much we unroll and structure the loops to feed this expression in the medium-sized and larger loops.

I thus think that with a relatively small PR to foldhash I can match these results for string hashing. I would love to work together on further experiments to improve (especially) the short-string performance of foldhash if you're interested, but I can't in good faith recommend just replacing my crate entirely, especially when the rapidhash crate is so closely derived from foldhash (despite your best intentions and hard work).

While my above post is undoubtedly not what you'd want to hear, I'd like to reiterate that the improvement for short strings is impressive, and you did good work.

@hoxxep (Contributor, Author) commented Aug 5, 2025

> Thus, the only meaningful differences are found in the string hashing routine.

I agree this is what it's become, and appreciate that you understand it wasn't the intention! Rapidhash is derived from wyhash, which as you've said revolves around the same folded multiply that foldhash is based on. The rusty parts of rapidhash have morphed into foldhash's approach, and the byte-hashing part is a derivation of the C++ rapidhash, obsessively tweaked to play nicely with inlining and small string sizes. That said, putting it all together took months of work and tweaking to get right.

> My <= 16 byte string handling is also essentially the same as rapidhash as far as I can see

I was surprised at how similar foldhash and rapidhash are for small strings, given they were both released around the same time. I couldn't tell whether you and Nicolas had come up with it independently, or had taken inspiration from each other?

> more unrolling, questionable if desired for codesize

Long-string hashing is marked as cold and inline-never, so hopefully the increase in the final binary is minimal. I haven't compared it, but the 17..288 input-length range is where I was more concerned about code size growing too much.

> so the main differences are in how much we unroll and structure the loops

Some other deviations from foldhash that might contribute to the performance difference:

  • The RapidHasher struct is smaller (48 bytes vs 64): we don't copy the secrets, and instead pass a &'static [u64; 7] into the RapidHasher. By comparison, instantiating FoldHasher copies its secrets, and I haven't checked how often that's optimised out.
  • The medium string 17..288 hashing has been carefully tweaked to minimise the setup/teardown.
  • Both the medium and long byte hashing functions are marked #[cold], with long inputs also marked #[inline(never)].
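As a hedged illustration of that last point, the inlining layout described above looks roughly like the sketch below; the function names and bodies are invented stand-ins, not the actual rapidhash internals:

```rust
// Illustrative sketch only: keep the hot short-input path inline at every
// call site, and push longer inputs out of line so their bulky loops
// don't bloat the common case.
#[inline(always)]
fn hash_bytes(b: &[u8], seed: u64) -> u64 {
    if b.len() <= 16 {
        // Hot path: small inputs stay fully inline (dummy mixing here).
        seed ^ b.len() as u64
    } else {
        hash_long(b, seed)
    }
}

#[cold]          // hint to the compiler that this branch is rarely taken
#[inline(never)] // keep the loop body out of every caller
fn hash_long(b: &[u8], seed: u64) -> u64 {
    // Placeholder loop standing in for the real long-input routine.
    b.iter().fold(seed, |acc, &x| acc.rotate_left(5) ^ u64::from(x))
}
```

The effect is that the short-input branch compiles down to a handful of inline instructions, while the out-of-line long-input function is shared across all call sites.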

The rapidhash crate also offers portable hashing with full compatibility with the various C++ versions, equivalent streaming versions consuming std::io::Read, and a CLI. I was also considering adding support for portable-hash if that experiment proves useful. All this to say: I don't think it makes sense to drop the rapidhash crate in its entirety given its broader goals, but I had already been meaning to add you to the acknowledgements for the novel ideas to optimise around Rust's Hash and Hasher traits that are borrowed from foldhash.

> I would love to work together to do further experiments and improve (especially) the short-string performance of foldhash if you're interested

Absolutely, and I'd be keen to learn from you too. I'll look at submitting a foldhash PR this week to match the small-string hashing performance. If that works, we can consider whether it's worth improving the long-string hashing, as it would need more secrets to use the rapidhash implementation. I was also considering experimenting with some tweaks to rustc-hash afterwards, but I think your optimisations there are already close to optimal for the small-strings workload!

@orlp commented Aug 5, 2025

> I was surprised at how similar foldhash and rapidhash are for small strings given they were both released around the same time. I couldn't tell if yourself and Nicolas had come up with it independently, or had already taken inspiration from each other?

The b[0], b[len - 1], b[len / 2] trick to handle 1-3 bytes has been around the block in hash functions for a while now. I don't know who used it first, and it has probably been discovered multiple times independently. I certainly didn't come up with it first, but neither did Nicolas. It was already in wyhash for example.

Similarly, doing overlapping reads of [0, w) and [len-w, len) for some width w is a fairly standard hashing trick at this point to handle sizes w through 2*w.

So with the 1-3 bytes trick and the above with w = 4 and w = 8 that's the input reading handled for sizes up to 16, and from there it's just a matter of feeding it into the folded multiply. I didn't look at rapidhash while building foldhash and I wouldn't be surprised if they didn't look at foldhash either and it's just convergent evolution.
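Putting the two tricks together, a hedged sketch of the standard ≤16-byte input-reading pattern (a simplification for illustration, not foldhash's or rapidhash's exact code):

```rust
// Returns two 64-bit words covering the whole input, which would then
// be fed into the folded multiply. Overlapping reads mean no per-byte
// loop is needed for any length up to 16.
fn read_small(b: &[u8]) -> (u64, u64) {
    let len = b.len();
    if len >= 8 {
        // Overlapping 8-byte reads [0, 8) and [len-8, len) cover 8..=16.
        let lo = u64::from_le_bytes(b[..8].try_into().unwrap());
        let hi = u64::from_le_bytes(b[len - 8..].try_into().unwrap());
        (lo, hi)
    } else if len >= 4 {
        // Overlapping 4-byte reads [0, 4) and [len-4, len) cover 4..=7.
        let lo = u64::from(u32::from_le_bytes(b[..4].try_into().unwrap()));
        let hi = u64::from(u32::from_le_bytes(b[len - 4..].try_into().unwrap()));
        (lo, hi)
    } else if len > 0 {
        // The b[0], b[len/2], b[len-1] trick covers 1..=3 bytes.
        let lo = u64::from(b[0]) | (u64::from(b[len / 2]) << 8);
        (lo, u64::from(b[len - 1]))
    } else {
        (0, 0)
    }
}
```

Note how every branch reads a fixed number of bytes regardless of the exact length within its bucket, which is what keeps the generated code branch-light and fast.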

@orlp commented Aug 9, 2025

A quick status update: with some changes to inlining, the largest gaps between rapidhash and foldhash are already closed. I'm still tweaking the string hash a bit further, but expect a release probably sometime next weekend.

@orlp commented Aug 23, 2025

I've released foldhash 0.2.0 which should have closed the gap.

@hoxxep (Contributor, Author) commented Aug 26, 2025

Agreed, foldhash v0.2.0 mostly closes the gap. Compared against rapidhash v4.0.0, rapidhash retains a marginal edge on inputs of length [16, 48], but it's only around 15% faster there.

[image: benchmark comparison chart]

On the foldhash benchmark suite, the main differences are uuid and tenkilobyte inputs. Everything else is fairly equal.

realworld/StrUuid/hashonly-struuid-rapidhash-f                                                                             
                        time:   [2.4554 ns 2.4633 ns 2.4727 ns]
realworld/StrUuid/hashonly-struuid-foldhash-fast                                                                             
                        time:   [3.0297 ns 3.0361 ns 3.0439 ns]
realworld/StrUuid/lookupmiss-struuid-rapidhash-f                                                                             
                        time:   [5.5069 ns 5.5526 ns 5.5984 ns]
realworld/StrUuid/lookupmiss-struuid-foldhash-fast                                                                             
                        time:   [5.9742 ns 6.0190 ns 6.0639 ns]
realworld/StrUuid/lookuphit-struuid-rapidhash-f                                                                             
                        time:   [8.5505 ns 8.5899 ns 8.6297 ns]
realworld/StrUuid/lookuphit-struuid-foldhash-fast                                                                             
                        time:   [9.3978 ns 9.4323 ns 9.4674 ns]
realworld/StrUuid/setbuild-struuid-rapidhash-f                                                                            
                        time:   [126.76 µs 127.10 µs 127.48 µs]
realworld/StrUuid/setbuild-struuid-foldhash-fast                                                                            
                        time:   [135.98 µs 136.28 µs 136.60 µs]

@hoxxep (Contributor, Author) commented Aug 26, 2025

I've opened PR #641 to bump foldhash to v0.2.0, since the performance is so similar.

The thumbv6m test fails here because RandomState::default() is not implemented on targets where it wouldn't give a random result. If required, I can use compile-time randomness instead and add it back in via v4.0.1.

As for MSRV: rapidhash reduced its MSRV to 1.71.0 with v4.0.0, while hashbrown's is 1.65.0.
