-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
add safer, more versatile version of rapidhash for any array and for any iterable #59185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
end | ||
|
||
@assume_effects :terminates_globally function hash_bytes( | ||
arr::AbstractArray{UInt8}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be crazy to loosen this signature?
I can imagine there would potentially be types with eltype == UInt8
and index step size 1 but not <: AbstractArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends. This is a specialization of the function for array-like containers. The iterator fallback is actually usually slightly faster than this function in general (since iterate is always much cheaper to compute than index), except when the compiler can be especially clever and realize that this accesses dense memory, and can entirely skip computing indices
I tried this and in local benchmarks it seems to be about 1.5-3x faster depending on the length
vs
|
I don't see anything like that speedup, and that's also unsound, but it does seem worth unrolling this from that result so I did that too (I realized I hadn't tried before since it did so badly when I had unrolled the array version that I assumed the iterator version would do the same). |
All the code user(s) examples now rebased on top of this PR (for all Integers, AbstractStrings, etc): https://github.com/JuliaLang/julia/pull/new/jn/rapidhash-all-the-things |
I ran into similar bootstrap problems with maybe it is worth splitting |
b438b86
to
16a70a1
Compare
16a70a1
to
3728f0b
Compare
32 bit failures look real @vtjnash:
|
…any iterable Tested that it gives the same answer on random data, and that performance is equivalent (with LLVM 20 on AArch64). ``` julia> using BenchmarkTools, Random; for n = 0:67:1000 a = (i % UInt8 for i in 1:n) b = collect(a) ## Same results for [] and String # b = codeunits(randstring(n)) # a = Base.Generator(identity, b) @Btime Base.hash_bytes($a, Base.HASH_SEED, Base.HASH_SECRET) @Btime Base.hash_bytes($b, Base.HASH_SEED, Base.HASH_SECRET) @Btime Base.hash_bytes(pointer($b), length($b), Base.HASH_SEED, Base.HASH_SECRET) println() end 5.666 ns (0 allocations: 0 bytes) # iterator 3.625 ns (0 allocations: 0 bytes) # array 3.625 ns (0 allocations: 0 bytes) # pointer 20.269 ns (0 allocations: 0 bytes) 5.375 ns (0 allocations: 0 bytes) 5.083 ns (0 allocations: 0 bytes) 35.624 ns (0 allocations: 0 bytes) 6.250 ns (0 allocations: 0 bytes) 6.208 ns (0 allocations: 0 bytes) 50.954 ns (0 allocations: 0 bytes) 7.625 ns (0 allocations: 0 bytes) 7.334 ns (0 allocations: 0 bytes) 66.496 ns (0 allocations: 0 bytes) 9.083 ns (0 allocations: 0 bytes) 8.875 ns (0 allocations: 0 bytes) 82.299 ns (0 allocations: 0 bytes) 10.719 ns (0 allocations: 0 bytes) 10.511 ns (0 allocations: 0 bytes) 98.346 ns (0 allocations: 0 bytes) 12.888 ns (0 allocations: 0 bytes) 12.513 ns (0 allocations: 0 bytes) ```
3728f0b
to
a23ff47
Compare
So we now have this ultra-specific |
I would definitely accept a PR removing the method. |
maybe relevant to that: #31305 |
Tested that array gives the same answer on random data, and that performance is equivalent (with LLVM 20 on AArch64). Neither of these methods is currently used for anything, but the intent is to change some users (e.g. String) to the array interface and other users (e.g. large Numbers and AbstractString) to the iterator interface (I already have both written–or rather Claude does–but in a different repo right now). The iterator interface does not exist anywhere else, since it is about 8x slower (essentially operating byte by byte instead of the intended 64-bit chunks).
PR implemented mostly by Claude, though I had to fix its math in several places since it got quite confused by the logical nesting of if branches and the implications of extracting trailing bytes.
Test "proof" of correctness: