hashing interface #31
Open
This PR adds a more robust interface for the `ghash` function, allowing the user to manually select a hashing algorithm. This is necessary because the Julia `Base.hash` is not resistant enough to collisions -- I recently ran into a collision while working with a set of about 80000 graphs (see also the discussion in #27 and the link therein).

Main changes:
- [...] `SHA` library. Available in both 64 and 128 bits. This was already used for large graphs, but can now be chosen through the interface. It's pretty slow, but the most secure.
- `Base.hash` is still available, but not recommended.
- [...] `ghash(g; alg=XXHash64Alg())`. Algorithm choice is explained in the docstring of `ghash`.
- Since the hash values depend on the chosen algorithm, graph hashes are now cached together with the algorithm that was used to compute them. For this there is a simple `HashCache` type that holds 64-bit and 128-bit hashes together with their algorithms. For now this is only internal, but something like it could also be exported as a convenience/safety layer that checks whether the hash algorithms match before comparing hashes.
- [...] `UInt64` in 99.9% of cases anyway, I think this is not a problem. It may become a problem in the future though, if we want to compare hashes between a `DenseNautyGraph` and a `SparseNautyGraph`.

Notes:
- [...] `CBinding.jl` which takes quite long to precompile on my machine.) Since I only need a tiny subset of the functionality of xxHash, I am depending on `xxHash_jll` directly and calling the C interface myself.
- [...] `ghash` I am not using multiple dispatch to select the hash algorithm, but instead I use type checks on the algorithm structs, which should compile away. @Krastanov: I guess this should be equivalent, but do you think it is better/safer to use multiple dispatch?
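To make the dispatch question in the last note concrete, here is a minimal sketch of the two styles side by side. It is not the code in this PR: `AbstractHashAlg`, `BaseHashAlg`, `SHA64Alg`, `sha64`, `hash_typecheck`, `hash_dispatch`, and `toy_ghash` are hypothetical names, and the input is a plain byte vector rather than a graph. The 64-bit SHA value here is simply a truncated SHA-256 digest, which is an assumption about how such a hash could be derived.

```julia
using SHA  # Julia standard library

# Hypothetical algorithm tags for illustration; the PR itself only names XXHash64Alg.
abstract type AbstractHashAlg end
struct BaseHashAlg <: AbstractHashAlg end
struct SHA64Alg <: AbstractHashAlg end

# Assumed truncation scheme: first 8 bytes of the SHA-256 digest,
# reinterpreted as a UInt64 in host byte order.
sha64(data::Vector{UInt8}) = reinterpret(UInt64, sha256(data)[1:8])[1]

# Variant 1: a single method that branches on the algorithm type.
function hash_typecheck(data::Vector{UInt8}, alg::AbstractHashAlg)
    if alg isa BaseHashAlg
        return UInt64(hash(data))
    elseif alg isa SHA64Alg
        return sha64(data)
    else
        # Unsupported algorithms must be rejected explicitly.
        throw(ArgumentError("unsupported hash algorithm: $(typeof(alg))"))
    end
end

# Variant 2: one method per algorithm, selected by multiple dispatch.
hash_dispatch(data::Vector{UInt8}, ::BaseHashAlg) = UInt64(hash(data))
hash_dispatch(data::Vector{UInt8}, ::SHA64Alg) = sha64(data)

# Keyword front end in the style of `ghash(g; alg=...)`.
toy_ghash(data::Vector{UInt8}; alg::AbstractHashAlg=SHA64Alg()) = hash_dispatch(data, alg)

data = Vector{UInt8}("example graph bytes")
h_check = hash_typecheck(data, SHA64Alg())
h_disp = toy_ghash(data; alg=SHA64Alg())
```

Both variants return the same values for the same algorithm. Because Julia specializes methods on the concrete type of `alg`, the `isa` branches in variant 1 should be resolved at compile time, so the difference is mostly about extensibility: with variant 2, an unsupported algorithm raises a `MethodError` without extra code, and a new algorithm can be added as a new method instead of a new branch.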