
FAIR data decisions: Lossy or lossless #27

@hrzepa

Description

One of the issues often confronted by depositors of aspiring FAIR data is how much data loss to tolerate. I give just one example: crystallographic data in chemistry (often described as the gold standard in chemical data). There is the following hierarchy, with increasing data loss:

  1. The raw instrument data
  2. The processed instrument data, including "hkl" information
  3. The processed instrument data, including rich structure information but excluding "hkl" data
  4. The processed minimum dataset, which suffices for perhaps 90% of most users' needs
  5. A graphical representation of the minimum dataset, as a JPEG or PDF, which itself can be lossy

So most consumers of, say, category 4 would find it adequately FAIR for their needs, but some specialist users would find it too lossy and might need to go all the way up to category 1. The trouble is that this type of data might be as much as 10,000 times larger than the minimal set.

Unfortunately, there is no easy way of specifying the degree of data loss in an aspiring FAIR dataset as metadata. This, remember, is for data considered the "gold" standard; one finds similar situations with other types of chemical data.
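One way the degree of loss could be made machine-readable is to declare a "loss tier" in the deposition metadata, together with a link back to a less-lossy parent deposit. The following is only a minimal sketch of that idea; the field names (`lossTier`, `derivedFrom`, etc.) are invented for illustration and do not correspond to any existing metadata standard:

```python
import json
from typing import Optional

# The five tiers from the hierarchy above, most complete to most lossy.
LOSS_TIERS = {
    1: "raw instrument data",
    2: "processed data including hkl information",
    3: "processed data, rich structure, excluding hkl",
    4: "processed minimum dataset",
    5: "graphical representation (JPEG/PDF)",
}

def describe_deposition(tier: int, parent_doi: Optional[str] = None) -> str:
    """Return a JSON metadata fragment declaring the dataset's loss tier."""
    if tier not in LOSS_TIERS:
        raise ValueError(f"unknown loss tier: {tier}")
    record = {
        "lossTier": tier,
        "lossTierLabel": LOSS_TIERS[tier],
        # Optional pointer to a less-lossy parent deposition, if one exists.
        "derivedFrom": parent_doi,
    }
    return json.dumps(record, indent=2)

print(describe_deposition(4, parent_doi="10.0000/example-raw-deposit"))
```

A consumer could then decide from the metadata alone whether a category 4 deposit is FAIR *enough* for their purpose, or whether to follow `derivedFrom` back toward the raw data.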
