
Define a standard hashing procedure #436

@gouttegd

Description


To allow implementations to determine, unambiguously and consistently, whether two mapping records are identical, the spec could (or should) define a standard hashing function and specify exactly how the hash is to be computed.

Considerations:

A. Choice of the hashing function. This is not really important; we just need to pick one. The easiest option would probably be SHA-256, which is available in almost every programming language. We do not really need its cryptographic properties (collision resistance and preimage resistance), but they don't hurt.
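With SHA-256, the hashing step itself is trivial once a record has been serialised to bytes. A minimal Python sketch using the standard library's hashlib; the function name record_digest is hypothetical, and encoding the digest as lowercase hexadecimal is an assumption, since the issue does not say how the hash should be represented:

```python
import hashlib


def record_digest(serialised: bytes) -> str:
    """Return the SHA-256 digest of an already-serialised record as lowercase hex.

    Hex encoding is an assumption; the spec would have to fix the representation.
    """
    return hashlib.sha256(serialised).hexdigest()
```

All the difficulty lies in producing the bytes, which is what points B and C are about.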

B. What to hash? Simply put: everything. That is, all the slots that make up a mapping record, probably including any non-standard slots.

C. How to hash? This is the real question. We need to define a serialisation format such that any given mapping record can have one, and only one, possible serialised form. The “canonical SSSOM/TSV” format as currently defined in the spec is not suitable, as it still leaves some room for variations across implementations.

One option would be to serialise the record into a canonical S-expression, e.g.

(7:mapping(10:subject_id44:http://purl.obolibrary.org/obo/FBbt_00001234)(12:predicate_id46:http://www.w3.org/2004/02/skos/core#exactMatch)(9:object_id45:http://purl.obolibrary.org/obo/UBERON_0005678)(21:mapping_justification51:https://w3id.org/semapv/vocab/ManualMappingCuration)(10:creator_id(37:https://orcid.org/0000-0000-1234-567837:https://orcid.org/0000-0000-5678-1234)))
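A serialiser for this kind of canonical S-expression fits in a few lines. The Python sketch below writes each atom as its UTF-8 byte length, a colon, and the bytes, with no whitespace anywhere, and wraps the values of a multi-valued slot in a nested list, as in the creator_id example. For the output to be unique, the order of slots must also be fixed, which the issue does not yet specify: SLOT_ORDER is a hypothetical, partial ordering (the one used in the example), with any non-standard slot appended afterwards in name order. The function names are likewise hypothetical, and the input is assumed to be already normalised (CURIEs expanded, multi-valued slots sorted).

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]

# Hypothetical, partial slot order; the spec would have to fix the full list.
SLOT_ORDER = ("subject_id", "predicate_id", "object_id",
              "mapping_justification", "creator_id")


def _atom(text: str) -> bytes:
    """Encode a string as a canonical S-expression atom: <byte length>:<bytes>."""
    data = text.encode("utf-8")
    return str(len(data)).encode("ascii") + b":" + data


def _slot_key(slot: str):
    """Known slots first, in SLOT_ORDER; any other (non-standard) slot after, by name."""
    if slot in SLOT_ORDER:
        return (0, SLOT_ORDER.index(slot), "")
    return (1, 0, slot)


def to_canonical_sexpr(record: Dict[str, Value]) -> bytes:
    """Serialise an already-normalised mapping record as a canonical S-expression.

    A single-valued slot becomes a (name value) pair; a multi-valued slot
    wraps its values in a nested list. No whitespace is emitted.
    """
    out = [b"(", _atom("mapping")]
    for slot in sorted(record, key=_slot_key):
        value = record[slot]
        out.append(b"(" + _atom(slot))
        if isinstance(value, list):
            out.append(b"(" + b"".join(_atom(v) for v in value) + b")")
        else:
            out.append(_atom(value))
        out.append(b")")
    out.append(b")")
    return b"".join(out)
```

Because slots are ordered by the serialiser rather than by the caller, two implementations that build the same record in a different order still produce identical bytes, and therefore identical hashes.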

Regardless of the exact serialisation format, prior to serialisation and hashing: (1) all CURIEs must be expanded to their full-length form; (2) all propagatable slots must be propagated, i.e. values declared at the mapping set level must be copied into each individual mapping record; (3) all multi-valued slots must be sorted.
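The three pre-processing steps could look roughly like this in Python. PREFIX_MAP, ENTITY_REFERENCE_SLOTS (the slots whose values are treated as CURIEs) and PROPAGATABLE_SLOTS are illustrative assumptions; a real implementation would take them from the mapping set's curie_map and from the SSSOM schema. The rule that a value already present on the record wins over the set-level one is also an assumption. CURIEs are expanded before sorting, so that the resulting order does not depend on which prefixes a given file happens to use.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]

# Hypothetical prefix map; in practice this comes from the mapping set's curie_map.
PREFIX_MAP = {
    "FBbt": "http://purl.obolibrary.org/obo/FBbt_",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "semapv": "https://w3id.org/semapv/vocab/",
    "orcid": "https://orcid.org/",
}

# Hypothetical subsets of slots; the SSSOM schema defines the real lists.
ENTITY_REFERENCE_SLOTS = ("subject_id", "predicate_id", "object_id",
                          "mapping_justification", "creator_id")
PROPAGATABLE_SLOTS = ("mapping_tool", "subject_source", "object_source")


def expand_curie(value: str) -> str:
    """Expand a CURIE with a known prefix; return anything else unchanged."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in PREFIX_MAP:
        return PREFIX_MAP[prefix] + local
    return value


def normalise(record: Dict[str, Value], set_metadata: Dict[str, Value]) -> Dict[str, Value]:
    """Apply (1) CURIE expansion, (2) propagation and (3) sorting to one record."""
    # (2) Copy propagatable set-level values; record-level values take precedence.
    merged: Dict[str, Value] = {s: set_metadata[s] for s in PROPAGATABLE_SLOTS if s in set_metadata}
    merged.update(record)

    out: Dict[str, Value] = {}
    for slot, value in merged.items():
        is_ref = slot in ENTITY_REFERENCE_SLOTS
        if isinstance(value, list):
            # (1) then (3): expand first, so sorting compares full IRIs.
            out[slot] = sorted(expand_curie(v) if is_ref else v for v in value)
        else:
            out[slot] = expand_curie(value) if is_ref else value
    return out
```

The normalised record can then be handed to whichever canonical serialiser the spec settles on.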

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
