Description
One of the issues often confronted by depositors of aspiring FAIR data is how much data loss to tolerate. I give just one example: crystallographic data in chemistry (often described as the gold standard in chemical data). There is the following hierarchy, with increasing data loss:
1. The raw instrument data
2. The processed instrument data, including "hkl" information
3. The processed instrument data, including rich structure information but excluding "hkl" data
4. The processed minimum dataset, which suffices for perhaps 90% of users' needs
5. A graphical representation of the minimum dataset, as a JPEG or PDF, which itself can be lossy
So most consumers of, say, category 4 would find it adequately FAIR for their needs, but some specialist users would find it too lossy and might need to go as high as category 1. The trouble is that category 1 data might be as much as 10,000 times larger than the minimal set.
Unfortunately, there is no easy way of specifying the degree of data loss in an aspiring FAIR dataset as metadata. And this, remember, is considered the "gold" standard. One finds similar situations in other types of chemical data.
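
To make the idea concrete, here is a minimal Python sketch of what a machine-readable "degree of data loss" field might look like. Everything in it is hypothetical: the names `LossLevel`, `DatasetRecord` and `is_adequate`, and the identifier string, are invented for illustration and do not correspond to any existing metadata schema. It only assumes that the categories listed above can be ordered from least to most lossy.

```python
from dataclasses import dataclass
from enum import IntEnum


class LossLevel(IntEnum):
    """Hypothetical loss levels mirroring the hierarchy above (1 = least loss)."""
    RAW_INSTRUMENT = 1
    PROCESSED_WITH_HKL = 2
    PROCESSED_WITHOUT_HKL = 3
    MINIMUM_DATASET = 4
    GRAPHICAL = 5


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record carrying an explicit loss level."""
    identifier: str
    loss_level: LossLevel


def is_adequate(record: DatasetRecord, most_loss_tolerated: LossLevel) -> bool:
    """Return True if the record is no lossier than the consumer can tolerate."""
    return record.loss_level <= most_loss_tolerated


# A deposit at category 4 satisfies a typical consumer but not a specialist
# who needs the raw instrument data (category 1).
deposit = DatasetRecord("example-dataset", LossLevel.MINIMUM_DATASET)
typical_ok = is_adequate(deposit, LossLevel.MINIMUM_DATASET)
specialist_ok = is_adequate(deposit, LossLevel.RAW_INSTRUMENT)
```

With a field like this, a consumer could filter for datasets at or below the loss level they can tolerate, rather than discovering the loss only after downloading.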