Add a gold standard example using data from chemotion repository #139
base: master
Changes from all commits
507a3de
787d4dd
84aa947
7b714aa
2d4898d
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,9 @@ | ||
| ## PASTA ELN | ||
| Its home is at: https://github.com/PASTA-ELN | ||
|
|
||
| This folder contains two files: | ||
| - PASTA.eln an export of the standard example of an installation with samples, measurements, devices, ... | ||
| - A gold‑standard sibling triplet consists of an ELN file, a JSON‑LD file, and a Turtle file. The example shows | ||
| that the ELN file fully supersedes the JSON‑LD/Turtle files in terms of content. [more](goldStandard.md) | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| # Gold‑standard example | ||
|
|
||
| This gold‑standard sibling triplet consists of an ELN | ||
| file, a JSON‑LD file, and a Turtle file. The example was created from the data | ||
| for CRR‑25894 on chemotion‑repository.net. The triplet shows that the full semantic information can be included in the ELN file. | ||
|
|
||
| - **The JSON‑LD and Turtle files are equivalent:** the Turtle file was generated | ||
| from the JSON‑LD representation using rdflib. | ||
| - **All data contained in the JSON‑LD/Turtle files are also present in the ELN file:** minor format conversions are summarized below. | ||
| - **Binary and raw data contained in the ELN file are NOT included in the JSON‑LD or Turtle files.** | ||
|
|
||
| **Therefore, the ELN file fully supersedes the JSON‑LD/Turtle files in terms of content.** This is also evident from the file sizes, even though the JSON‑LD file is packed less densely. | ||
|
|
||
| ## Format conversion: JSON‑LD → ELN file | ||
|
|
||
| 1. Create an empty folder and add the data files to it. | ||
| 2. Ensure that every node has an `@id`. If a node lacks an identifier, create | ||
| one using the pattern `@type_@name`. | ||
| 3. Flatten hierarchical dictionaries into a list of nodes, using `@id` values | ||
| to represent links. | ||
| 4. Add the manifest nodes and include them in the node list. | ||
| 5. Remove duplicate nodes. | ||
| 6. Adjust dataset `@id` values to reflect the file locations of the | ||
| corresponding data sets: | ||
| - Add an additional file node to the node list for each data file. Each file | ||
| node must include `@id`, `@type`, `@name`, and `sha256`. | ||
| - Use the instrument information from `dataset-description.txt` as | ||
| supplemental metadata for the dataset. | ||
| - Link the additional data files via `hasPart` both in the dataset node and in | ||
| the root `./` node. | ||
| 7. Restructure the root node: | ||
| - Set `@id` to `./` and `@type` to `Dataset`. | ||
| - Add all datasets and files under the root node using `hasPart`. | ||
| 8. Rename protected keys: | ||
| - `author` → `authors` | ||
| - `keywords` → `keywordLists` | ||
| - Remove the protected `about` key if it only links back to the root node. | ||
| - Rename `@type` `Study` → `CreativeWork`. | ||
| - Rename `@type` `QuantitativeValue` → `PropertyValue`. | ||
| - For `@type` `Person`, create `name` from `givenName` and `familyName`. | ||
| - `affiliation` → `worksFor` | ||
| - For `@type` `PropertyValue`, add `propertyID`. | ||
|
|
||
| ## Original data | ||
|
|
||
| Source: CRR‑25894 from [chemotion‑repository.net](https://www.chemotion-repository.net/home/publications/collections/4916). | ||
|
|
||
| The JSON‑LD files were downloaded manually and merged by copying the Analysis | ||
| branches into the main Reaction tree to produce a single master JSON‑LD file. | ||
| The files used include: | ||
|
|
||
| - `JSON-LD_Reaction_7354-20251031....json` | ||
| - `JSON-LD_Analysis_674242-20251031....json` | ||
| - `JSON-LD_Analysis_674244-20251031....json` | ||
| - `JSON-LD_Analysis_674246-20251031....json` | ||
| - `JSON-LD_Analysis_674248-20251031....json` | ||
|
|
||
| The master JSON‑LD file was converted to a Turtle file. A heuristic | ||
| verification schema (`goldStandardShapes.ttl`) was created, and all files were | ||
| validated against it. | ||
|
|
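As a rough illustration of the JSON‑LD → Turtle step described in goldStandard.md, the following sketch converts a JSON‑LD string with rdflib and checks that both serializations carry the same triples. It assumes rdflib 6 or newer (where JSON‑LD support is built in); the miniature document, the `example.org` identifier, and the function names are hypothetical and are not taken from the PR.

```python
# Hypothetical miniature document; the inline @context avoids a network fetch.
MASTER_JSONLD = """
{
  "@context": {"@vocab": "http://schema.org/"},
  "@id": "http://example.org/study/1",
  "@type": "Study",
  "name": "Example study"
}
"""


def jsonld_to_turtle(jsonld_text: str) -> str:
    """Parse a JSON-LD string and return the same graph serialized as Turtle."""
    from rdflib import Graph  # imported here so the module loads without rdflib

    graph = Graph()
    graph.parse(data=jsonld_text, format="json-ld")
    return graph.serialize(format="turtle")


def same_triples(jsonld_text: str, turtle_text: str) -> bool:
    """Return True if both serializations describe isomorphic graphs."""
    from rdflib import Graph
    from rdflib.compare import isomorphic

    left = Graph()
    left.parse(data=jsonld_text, format="json-ld")
    right = Graph()
    right.parse(data=turtle_text, format="turtle")
    return isomorphic(left, right)
```

Calling `same_triples(MASTER_JSONLD, jsonld_to_turtle(MASTER_JSONLD))` is one way to back the "JSON‑LD and Turtle files are equivalent" claim with a check rather than by inspection.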
|
Contributor
What is the source of these SHACL shapes?
Collaborator
Author
The idea was to show a proof of concept: JSON-LD can be validated. Hence, I needed a shapes file that catches errors, and this one does that job: I changed the data a few times and the shapes file caught it. How it was generated: it was created by some LLM.
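For readers who want to reproduce that kind of check, here is a minimal sketch using pySHACL. It assumes rdflib and pyshacl are installed; the shapes are a two-shape excerpt in the style of `goldStandardShapes.ttl`, while the data snippets, the `example.org` identifiers, and the `check` helper are hypothetical. It is not the validation script used in this PR.

```python
# Reduced shapes excerpt: a Person's schema:name must be an xsd:string.
SHAPES_TTL = """
@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://schema.org/PersonShape> a sh:NodeShape ;
    sh:property ex:namePropShape ;
    sh:targetClass <http://schema.org/Person> .

ex:namePropShape sh:datatype xsd:string ;
    sh:path <http://schema.org/name> .
"""

# Hypothetical data: the name is a string, so it should conform.
GOOD_TTL = """
<http://example.org/p1> a <http://schema.org/Person> ;
    <http://schema.org/name> "Alice Example" .
"""

# Hypothetical data with a deliberate error: the name is an integer.
BAD_TTL = """
<http://example.org/p1> a <http://schema.org/Person> ;
    <http://schema.org/name> 42 .
"""


def check(data_ttl: str, shapes_ttl: str = SHAPES_TTL):
    """Validate Turtle data against SHACL shapes; return (conforms, report text)."""
    from rdflib import Graph  # imported here so the module loads without them
    from pyshacl import validate

    data = Graph()
    data.parse(data=data_ttl, format="turtle")
    shapes = Graph()
    shapes.parse(data=shapes_ttl, format="turtle")
    conforms, _report_graph, report_text = validate(data, shacl_graph=shapes)
    return conforms, report_text
```

Running `check(GOOD_TTL)` and `check(BAD_TTL)` mirrors the "change the data and see whether the shapes catch it" experiment on a small scale; the report text names the violated constraint.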
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,232 @@ | ||
| @prefix dcterms: <http://purl.org/dc/terms/> . | ||
| @prefix ex: <http://example.org/> . | ||
| @prefix sh: <http://www.w3.org/ns/shacl#> . | ||
| @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . | ||
|
|
||
| <http://schema.org/ChemicalSubstanceShape> a sh:NodeShape ; | ||
| sh:property ex:alternateNamePropShape, | ||
| ex:descriptionPropShape, | ||
| ex:hasBioChemEntityPartPropShape, | ||
| ex:identifierPropShape, | ||
| ex:imagePropShape, | ||
| ex:namePropShape, | ||
| ex:subjectOfPropShape, | ||
| ex:urlPropShape ; | ||
| sh:targetClass <http://schema.org/ChemicalSubstance> . | ||
|
|
||
| <http://schema.org/CreativeWorkShape> a sh:NodeShape ; | ||
| sh:property ex:authorPropShape, | ||
| ex:namePropShape, | ||
| ex:urlPropShape ; | ||
| sh:targetClass <http://schema.org/CreativeWork> . | ||
|
|
||
| <http://schema.org/DataCatalogShape> a sh:NodeShape ; | ||
| sh:property ex:contributorPropShape, | ||
| ex:descriptionPropShape, | ||
| ex:isAccessibleForFreePropShape, | ||
| ex:keywordsPropShape, | ||
| ex:licensePropShape, | ||
| ex:namePropShape, | ||
| ex:providerPropShape, | ||
| ex:urlPropShape ; | ||
| sh:targetClass <http://schema.org/DataCatalog> . | ||
|
|
||
| <http://schema.org/DatasetShape> a sh:NodeShape ; | ||
| sh:property ex:authorPropShape, | ||
| ex:conformsToPropShape, | ||
| ex:creatorPropShape, | ||
| ex:descriptionPropShape, | ||
| ex:identifierPropShape, | ||
| ex:includedInDataCatalogPropShape, | ||
| ex:isPartOfPropShape, | ||
| ex:licensePropShape, | ||
| ex:measurementTechniquePropShape, | ||
| ex:namePropShape, | ||
| ex:publisherPropShape, | ||
| ex:urlPropShape ; | ||
| sh:targetClass <http://schema.org/Dataset> . | ||
|
|
||
| <http://schema.org/DefinedTermSetShape> a sh:NodeShape ; | ||
| sh:property ex:namePropShape ; | ||
| sh:targetClass <http://schema.org/DefinedTermSet> . | ||
|
|
||
| <http://schema.org/DefinedTermShape> a sh:NodeShape ; | ||
| sh:property ex:alternateNamePropShape, | ||
| ex:inDefinedTermSetPropShape, | ||
| ex:namePropShape, | ||
| ex:termCodePropShape, | ||
| ex:urlPropShape ; | ||
| sh:targetClass <http://schema.org/DefinedTerm> . | ||
|
|
||
| <http://schema.org/MolecularEntityShape> a sh:NodeShape ; | ||
| sh:property ex:conformsToPropShape, | ||
| ex:inChIKeyPropShape, | ||
| ex:inChIPropShape, | ||
| ex:iupacNamePropShape, | ||
| ex:molecularFormulaPropShape, | ||
| ex:molecularWeightPropShape, | ||
| ex:namePropShape, | ||
| ex:smilesPropShape ; | ||
| sh:targetClass <http://schema.org/MolecularEntity> . | ||
|
|
||
| <http://schema.org/OrganizationShape> a sh:NodeShape ; | ||
| sh:property ex:logoPropShape, | ||
| ex:namePropShape, | ||
| ex:urlPropShape ; | ||
| sh:targetClass <http://schema.org/Organization> . | ||
|
|
||
| <http://schema.org/PersonShape> a sh:NodeShape ; | ||
| sh:property ex:affiliationPropShape, | ||
| ex:familyNamePropShape, | ||
| ex:givenNamePropShape, | ||
| ex:identifierPropShape, | ||
| ex:namePropShape ; | ||
| sh:targetClass <http://schema.org/Person> . | ||
|
|
||
| <http://schema.org/QuantitativeValueShape> a sh:NodeShape ; | ||
| sh:property ex:unitCodePropShape, | ||
| ex:valuePropShape ; | ||
| sh:targetClass <http://schema.org/QuantitativeValue> . | ||
|
|
||
| <http://schema.org/StudyShape> a sh:NodeShape ; | ||
| sh:property ex:aboutPropShape, | ||
| ex:additionalTypePropShape, | ||
| ex:authorPropShape, | ||
| ex:citationPropShape, | ||
| ex:conformsToPropShape, | ||
| ex:contributorPropShape, | ||
| ex:creatorPropShape, | ||
| ex:dateCreatedPropShape, | ||
| ex:datePublishedPropShape, | ||
| ex:descriptionPropShape, | ||
| ex:identifierPropShape, | ||
| ex:keywordsPropShape, | ||
| ex:licensePropShape, | ||
| ex:namePropShape, | ||
| ex:providerPropShape, | ||
| ex:publisherPropShape, | ||
| ex:subjectOfPropShape, | ||
| ex:urlPropShape ; | ||
| sh:targetClass <http://schema.org/Study> . | ||
|
|
||
| ex:aboutPropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/about> . | ||
|
|
||
| ex:additionalTypePropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/additionalType> . | ||
|
|
||
| ex:affiliationPropShape sh:nodeKind sh:BlankNode ; | ||
| sh:path <http://schema.org/affiliation> . | ||
|
|
||
| ex:citationPropShape sh:nodeKind sh:BlankNode ; | ||
| sh:path <http://schema.org/citation> . | ||
|
|
||
| ex:dateCreatedPropShape sh:datatype <http://schema.org/Date> ; | ||
| sh:path <http://schema.org/dateCreated> . | ||
|
|
||
| ex:datePublishedPropShape sh:datatype <http://schema.org/Date> ; | ||
| sh:path <http://schema.org/datePublished> . | ||
|
|
||
| ex:familyNamePropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/familyName> . | ||
|
|
||
| ex:givenNamePropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/givenName> . | ||
|
|
||
| ex:hasBioChemEntityPartPropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/hasBioChemEntityPart> . | ||
|
|
||
| ex:imagePropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/image> . | ||
|
|
||
| ex:inChIKeyPropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/inChIKey> . | ||
|
|
||
| ex:inChIPropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/inChI> . | ||
|
|
||
| ex:inDefinedTermSetPropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/inDefinedTermSet> . | ||
|
|
||
| ex:includedInDataCatalogPropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/includedInDataCatalog> . | ||
|
|
||
| ex:isAccessibleForFreePropShape sh:datatype xsd:boolean ; | ||
| sh:path <http://schema.org/isAccessibleForFree> . | ||
|
|
||
| ex:isPartOfPropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/isPartOf> . | ||
|
|
||
| ex:iupacNamePropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/iupacName> . | ||
|
|
||
| ex:logoPropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/logo> . | ||
|
|
||
| ex:measurementTechniquePropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/measurementTechnique> . | ||
|
|
||
| ex:molecularFormulaPropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/molecularFormula> . | ||
|
|
||
| ex:molecularWeightPropShape sh:nodeKind sh:BlankNode ; | ||
| sh:path <http://schema.org/molecularWeight> . | ||
|
|
||
| ex:smilesPropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/smiles> . | ||
|
|
||
| ex:termCodePropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/termCode> . | ||
|
|
||
| ex:unitCodePropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/unitCode> . | ||
|
|
||
| ex:valuePropShape sh:datatype xsd:double ; | ||
| sh:path <http://schema.org/value> . | ||
|
|
||
| ex:alternateNamePropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/alternateName> . | ||
|
|
||
| ex:contributorPropShape sh:nodeKind sh:BlankNodeOrIRI ; | ||
| sh:path <http://schema.org/contributor> . | ||
|
|
||
| ex:creatorPropShape sh:nodeKind sh:BlankNode ; | ||
| sh:path <http://schema.org/creator> . | ||
|
|
||
| ex:keywordsPropShape | ||
| sh:or ( [ sh:datatype xsd:string ] | ||
| [ sh:nodeKind sh:IRI ] ) ; | ||
| sh:path <http://schema.org/keywords> . | ||
|
|
||
| ex:providerPropShape sh:nodeKind sh:BlankNode ; | ||
| sh:path <http://schema.org/provider> . | ||
|
|
||
| ex:publisherPropShape sh:nodeKind sh:BlankNode ; | ||
| sh:path <http://schema.org/publisher> . | ||
|
|
||
| ex:subjectOfPropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/subjectOf> . | ||
|
|
||
| ex:authorPropShape | ||
| sh:or ( [ sh:datatype xsd:string ] | ||
| [ sh:nodeKind sh:BlankNodeOrIRI ; sh:class <http://schema.org/Person> ] ) ; | ||
| sh:path <http://schema.org/author> . | ||
|
|
||
| ex:conformsToPropShape sh:nodeKind sh:IRI ; | ||
| sh:path dcterms:conformsTo . | ||
|
|
||
| ex:licensePropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/license> . | ||
|
|
||
| ex:descriptionPropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/description> . | ||
|
|
||
| ex:identifierPropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/identifier> . | ||
|
|
||
| ex:urlPropShape sh:nodeKind sh:IRI ; | ||
| sh:path <http://schema.org/url> . | ||
|
|
||
| ex:namePropShape sh:datatype xsd:string ; | ||
| sh:path <http://schema.org/name> . | ||
|
|
Is there a major difference to the RO-Crate JSON-LD other than being framed / nested (by JSON-LD's framing)?
Yeah, some keys are renamed and some @types have to be changed. There is a readme file that lists all those changes.
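To make the difference concrete, here is a small sketch of the un-nesting and renaming described in the readme (steps 2, 3, 5 and part of step 8). It is an illustration under assumptions, not the conversion script from this PR: the function names and the example document are hypothetical, the `@type_@name` pattern is read here as "type plus the node's `name` value", and `name` for a Person is only filled in when missing. Manifest nodes, file nodes with `sha256`, the `about` removal, and `propertyID` are left out.

```python
KEY_RENAMES = {"author": "authors", "keywords": "keywordLists", "affiliation": "worksFor"}
TYPE_RENAMES = {"Study": "CreativeWork", "QuantitativeValue": "PropertyValue"}


def flatten(value, nodes):
    """Move nested dictionaries into `nodes` (keyed by @id) and return links."""
    if isinstance(value, list):
        return [flatten(item, nodes) for item in value]
    if not isinstance(value, dict) or set(value) == {"@id"}:
        return value  # plain value, or already a pure link
    flat = {key: flatten(child, nodes) for key, child in value.items()}
    if "@id" not in flat:
        # Assumed reading of the `@type_@name` pattern from the readme.
        flat["@id"] = f"{flat.get('@type', 'Thing')}_{flat.get('name', 'unnamed')}"
    if flat["@id"] not in nodes:  # keep the first copy, drop duplicates
        nodes[flat["@id"]] = flat
    return {"@id": flat["@id"]}


def rename(node):
    """Apply the key and @type renames; build a Person's name if it is missing."""
    out = {KEY_RENAMES.get(key, key): value for key, value in node.items()}
    node_type = out.get("@type")
    if isinstance(node_type, str) and node_type in TYPE_RENAMES:
        out["@type"] = TYPE_RENAMES[node_type]
    if out.get("@type") == "Person" and "name" not in out:
        out["name"] = f"{out.get('givenName', '')} {out.get('familyName', '')}".strip()
    return out


def convert(document):
    """Return a flat list of renamed nodes for a nested JSON-LD document."""
    body = {key: value for key, value in document.items() if key != "@context"}
    nodes = {}
    flatten(body, nodes)
    return [rename(node) for node in nodes.values()]


# Hypothetical nested input, only for illustration.
EXAMPLE = {
    "@context": "https://schema.org/",
    "@id": "https://example.org/reaction/1",
    "@type": "Study",
    "name": "Example reaction",
    "keywords": ["example"],
    "author": [{
        "@id": "https://example.org/person/1",
        "@type": "Person",
        "givenName": "Ada",
        "familyName": "Example",
        "affiliation": {"@type": "Organization", "name": "Example Institute"},
    }],
}
```

`convert(EXAMPLE)` yields three flat nodes (organization, person, study), with the study retyped as `CreativeWork` and its `author` list replaced by an `authors` list of `@id` links.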