
Add Jenkins's lookup3 as a 32-bit checksum for HDF5 #445

@mkitti

Description


Minimal, reproducible code sample, a copy-pastable example if possible

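# Bob Jenkins's lookup3 can be used to verify HDF5 metadata structures, such as the
# superblock read below (48 bytes for a version 2/3 superblock with 8-byte offsets)
import jenkins_cffi

with open("original_hdf5_zarr_shard_demo.h5", "rb") as f:
    b = f.read(48)

# The last 4 bytes hold the little-endian lookup3 checksum of the preceding 44 bytes
hash_bytes = jenkins_cffi.hashlittle(b[:-4]).to_bytes(4, "little")
print(b[-4:] == hash_bytes)  # True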
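To make it clear which bytes the checksum covers, here is a minimal sketch that unpacks the 48 bytes read above with `struct`. It assumes a version 2 or 3 superblock with 8-byte offsets (the only case where the superblock is exactly 48 bytes); the helper `parse_superblock_v2` and the `SuperblockV2` tuple are hypothetical names, not part of any library.

```python
import struct
from typing import NamedTuple

# Assumed layout: HDF5 superblock version 2/3 with 8-byte offsets (48 bytes total).
# signature(8) version(1) sizeof_offsets(1) sizeof_lengths(1) flags(1)
# base(8) extension(8) eof(8) root_object_header(8) checksum(4)
_SUPERBLOCK_V2 = struct.Struct("<8sBBBBQQQQI")
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


class SuperblockV2(NamedTuple):
    version: int
    size_of_offsets: int
    size_of_lengths: int
    consistency_flags: int
    base_address: int
    extension_address: int
    end_of_file_address: int
    root_object_header_address: int
    checksum: int  # lookup3 over the preceding 44 bytes


def parse_superblock_v2(buf: bytes) -> SuperblockV2:
    """Unpack a 48-byte version 2/3 superblock (hypothetical helper)."""
    if len(buf) < _SUPERBLOCK_V2.size:
        raise ValueError("need at least 48 bytes")
    sig, *fields = _SUPERBLOCK_V2.unpack_from(buf)
    if sig != HDF5_SIGNATURE:
        raise ValueError("not an HDF5 file signature")
    sb = SuperblockV2(*fields)
    # Refuse to guess if the file does not match the assumed layout.
    if sb.version not in (2, 3) or sb.size_of_offsets != 8:
        raise ValueError("layout assumption (v2/v3, 8-byte offsets) does not hold")
    return sb
```

With `b` from the sample, `parse_superblock_v2(b).checksum` should equal `jenkins_cffi.hashlittle(b[:-4])`.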

Problem description

Bob Jenkins's lookup3 hash is an integral part of the HDF5 file format specification: it is the checksum used for HDF5's internal metadata structures, such as the version 2/3 superblock.

This becomes relevant if we would like to reuse HDF5 data structures. For example, the HDF5 Fixed Array Data Block can be made byte-compatible with the proposed Zarr shard specification, except for the four-byte checksum. Currently, the only permitted checksum is CRC32.
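A minimal sketch of why only the checksum stands in the way: if a block is stored as its payload followed by a 4-byte little-endian checksum, swapping the checksum function changes only those last four bytes. The helper names `seal` and `verify` and this trailer layout are assumptions for illustration; `zlib.crc32` stands in for CRC32, and a lookup3 function such as `jenkins_cffi.hashlittle` from the sample above could be passed in its place.

```python
import zlib
from typing import Callable

# Any function mapping bytes to an unsigned 32-bit integer.
Checksum32 = Callable[[bytes], int]


def seal(payload: bytes, checksum: Checksum32) -> bytes:
    """Append a 4-byte little-endian checksum of `payload` (assumed trailer layout)."""
    return payload + (checksum(payload) & 0xFFFFFFFF).to_bytes(4, "little")


def verify(block: bytes, checksum: Checksum32) -> bool:
    """Check the trailing 4-byte checksum against the bytes before it."""
    if len(block) < 4:
        return False
    payload, stored = block[:-4], int.from_bytes(block[-4:], "little")
    return (checksum(payload) & 0xFFFFFFFF) == stored


if __name__ == "__main__":
    # Hypothetical payload: a small table of little-endian uint64 (offset, nbytes) pairs.
    index = (0).to_bytes(8, "little") + (100).to_bytes(8, "little")
    crc_block = seal(index, zlib.crc32)
    print(verify(crc_block, zlib.crc32), crc_block[:-4] == index)
```

Sealing the same payload with `zlib.crc32` and with `zlib.adler32` yields outputs that agree on every byte except the trailing four, which is the situation described above for the Fixed Array Data Block.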

zarr-developers/zarr-specs#152 (comment)

Implementations of Bob Jenkins's lookup3 are widely available in many languages.
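As an illustration of how small the algorithm is, here is a pure-Python sketch of `hashlittle`, transcribed from the byte-oriented path of Bob Jenkins's public-domain lookup3.c (input is read as little-endian 32-bit words). It is much slower than a C binding and is meant for testing, not production. It assumes the HDF5 checksum is `hashlittle` with an initial value of 0, consistent with the `jenkins_cffi.hashlittle` call in the sample above.

```python
MASK32 = 0xFFFFFFFF


def _rot(x: int, k: int) -> int:
    """Rotate the 32-bit value x left by k bits."""
    return ((x << k) | (x >> (32 - k))) & MASK32


def _mix(a: int, b: int, c: int) -> tuple[int, int, int]:
    """lookup3 `mix`: applied after each full 12-byte block."""
    a = (a - c) & MASK32
    a ^= _rot(c, 4)
    c = (c + b) & MASK32
    b = (b - a) & MASK32
    b ^= _rot(a, 6)
    a = (a + c) & MASK32
    c = (c - b) & MASK32
    c ^= _rot(b, 8)
    b = (b + a) & MASK32
    a = (a - c) & MASK32
    a ^= _rot(c, 16)
    c = (c + b) & MASK32
    b = (b - a) & MASK32
    b ^= _rot(a, 19)
    a = (a + c) & MASK32
    c = (c - b) & MASK32
    c ^= _rot(b, 4)
    b = (b + a) & MASK32
    return a, b, c


def _final(a: int, b: int, c: int) -> int:
    """lookup3 `final`: avalanche step; only c is needed for a 32-bit hash."""
    c ^= b
    c = (c - _rot(b, 14)) & MASK32
    a ^= c
    a = (a - _rot(c, 11)) & MASK32
    b ^= a
    b = (b - _rot(a, 25)) & MASK32
    c ^= b
    c = (c - _rot(b, 16)) & MASK32
    a ^= c
    a = (a - _rot(c, 4)) & MASK32
    b ^= a
    b = (b - _rot(a, 14)) & MASK32
    c ^= b
    c = (c - _rot(b, 24)) & MASK32
    return c


def hashlittle(data: bytes, initval: int = 0) -> int:
    """32-bit lookup3 hash of `data` (little-endian word order)."""
    data = bytes(data)
    remaining = len(data)
    a = b = c = (0xDEADBEEF + remaining + initval) & MASK32
    offset = 0
    # Consume 12-byte blocks while more than 12 bytes remain.
    while remaining > 12:
        a = (a + int.from_bytes(data[offset:offset + 4], "little")) & MASK32
        b = (b + int.from_bytes(data[offset + 4:offset + 8], "little")) & MASK32
        c = (c + int.from_bytes(data[offset + 8:offset + 12], "little")) & MASK32
        a, b, c = _mix(a, b, c)
        offset += 12
        remaining -= 12
    if remaining == 0:
        # Only reached for empty input; lookup3 returns c without `final`.
        return c
    # Zero-pad the last 1..12 bytes; the padding adds nothing, matching the C switch.
    tail = data[offset:] + bytes(12 - remaining)
    a = (a + int.from_bytes(tail[0:4], "little")) & MASK32
    b = (b + int.from_bytes(tail[4:8], "little")) & MASK32
    c = (c + int.from_bytes(tail[8:12], "little")) & MASK32
    return _final(a, b, c)
```

With this, the check in the sample can be written as `hashlittle(b[:-4]).to_bytes(4, "little") == b[-4:]` without the `jenkins_cffi` dependency.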

Version and installation information

Please provide the following:

jenkins-cffi              1.0.2.1                  pypi_0    pypi
python                    3.11.0          he550d4f_1_cpython    conda-forge
