Define a standard hashing procedure

To allow implementations to _unambiguously_ and _consistently_ assess whether two mapping records are identical, the spec should define a standard hash function and the procedure for computing the hash of a record.

Considerations:

**A. Choice of the hashing function.** Not really important; we just need to pick one. The easiest option would probably be SHA-256, which is available in the standard library, or in a common package, of almost every programming language. We do not really need its cryptographic properties (collision resistance and preimage resistance), but they don’t hurt.
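To illustrate how little the choice costs on the implementation side, here is a minimal sketch using Python's standard `hashlib` module. The function name `record_digest` is hypothetical, and the input is assumed to be a record that has already been serialised to bytes (how to do that is the subject of point C).

```python
import hashlib

def record_digest(serialised: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of an already serialised record."""
    return hashlib.sha256(serialised).hexdigest()

# Placeholder input; a real call would pass the canonical serialisation.
print(record_digest(b"(7:mapping)"))
```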

**B. What to hash?** Simply put: _everything_. That is, _all_ the slots that make up a mapping record. This should probably also include any non-standard slots.

**C. How to hash?** _This_ is the real question. We need to define a serialisation format such that any given mapping record can have one, **and only one**, possible serialised form. The “canonical SSSOM/TSV” format as currently defined in the spec is not suitable, as it still leaves some room for variations across implementations.

One option would be to serialise the record into a [canonical S-expression](https://en.wikipedia.org/wiki/Canonical_S-expressions), e.g.

```
(7:mapping((10:subject_id44:http://purl.obolibrary.org/obo/FBbt_00001234)(12:predicate_id46:http://www.w3.org/2004/02/skos/core#exactMatch)(9:object_id45:http://purl.obolibrary.org/obo/UBERON_0005678)(21:mapping_justification51:https://w3id.org/semapv/vocab/ManualMappingCuration)(10:creator_id(37:https://orcid.org/0000-0000-1234-567837:https://orcid.org/0000-0000-5678-1234))))
```
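A serialiser for this layout can be quite small. The sketch below is only an illustration under several assumptions: a record is a plain `dict` whose values are strings or lists of strings, every atom is UTF-8 encoded and prefixed with its byte length, and slots are emitted in whatever order the caller supplies (the slot ordering itself would still need to be fixed by the spec). The names `atom` and `serialise` are hypothetical.

```python
import hashlib

def atom(value: str) -> bytes:
    """Encode one string as a canonical S-expression atom: <byte length>:<bytes>."""
    data = value.encode("utf-8")
    return str(len(data)).encode("ascii") + b":" + data

def serialise(record: dict) -> bytes:
    """Serialise a record whose values are strings or lists of strings."""
    slots = []
    for name, value in record.items():  # order is taken from the caller (assumption)
        if isinstance(value, list):
            # Multi-valued slot: a nested list of atoms.
            body = b"(" + b"".join(atom(v) for v in value) + b")"
        else:
            body = atom(value)
        slots.append(b"(" + atom(name) + body + b")")
    return b"(" + atom("mapping") + b"(" + b"".join(slots) + b"))"

record = {
    "subject_id": "http://purl.obolibrary.org/obo/FBbt_00001234",
    "predicate_id": "http://www.w3.org/2004/02/skos/core#exactMatch",
    "object_id": "http://purl.obolibrary.org/obo/UBERON_0005678",
    "mapping_justification": "https://w3id.org/semapv/vocab/ManualMappingCuration",
    "creator_id": [
        "https://orcid.org/0000-0000-1234-5678",
        "https://orcid.org/0000-0000-5678-1234",
    ],
}

serialised = serialise(record)
print(serialised.decode("utf-8"))
print(hashlib.sha256(serialised).hexdigest())
```

Because every atom carries an explicit length and there is no optional whitespace, two implementations that agree on slot order and value order should produce byte-identical output, and therefore the same hash.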

Regardless of the exact serialisation format, prior to serialisation and hashing: (1) all CURIEs must be expanded to their full-length form; (2) all propagatable slots must be propagated; (3) all multi-valued slots must be sorted.
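The three normalisation steps could be sketched as follows. Everything here is an assumption made for illustration: the function names, the `prefix_map`, `propagatable` and `entity_slots` parameters, the simplified propagation rule (copy a set-level value only when the record lacks that slot, which ignores the finer conditions of the actual SSSOM propagation rules), and the sort order (Python's default string ordering by code point).

```python
def expand_curie(value: str, prefix_map: dict) -> str:
    """Expand 'prefix:local' using prefix_map; leave anything else unchanged."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefix_map:
        return prefix_map[prefix] + local
    return value

def normalise(record: dict, set_metadata: dict, prefix_map: dict,
              propagatable: set, entity_slots: set) -> dict:
    """Return a copy of record that is ready for serialisation and hashing."""
    out = dict(record)
    # (2) Propagate set-level values down to the record (simplified rule).
    for slot in propagatable:
        if slot in set_metadata and slot not in out:
            out[slot] = set_metadata[slot]
    # (1) Expand CURIEs, but only in slots known to hold entity references.
    for slot in entity_slots:
        if slot in out:
            value = out[slot]
            if isinstance(value, list):
                out[slot] = [expand_curie(v, prefix_map) for v in value]
            else:
                out[slot] = expand_curie(value, prefix_map)
    # (3) Sort multi-valued slots, after expansion so the order is stable.
    for slot, value in list(out.items()):
        if isinstance(value, list):
            out[slot] = sorted(value)
    return out
```

Sorting is done last on purpose: expanding CURIEs can change the relative order of values, so sorting the compact forms would not guarantee a unique result.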

Define a standard hashing procedure #436