1,066 changes: 1,066 additions & 0 deletions examples/PASTA/README.md

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions examples/PASTA/README_template.md
@@ -1,4 +1,9 @@
## PASTA ELN
Its home is at: https://github.com/PASTA-ELN

This folder contains:
- PASTA.eln: an export of the standard example of an installation with samples, measurements, devices, ...
- goldStandard.eln, goldStandard.json, goldStandard.ttl: a gold‑standard sibling triplet consisting of an ELN file, a JSON‑LD file, and a Turtle file. The example shows
that the ELN file fully supersedes the JSON‑LD/Turtle files in terms of content. [more](goldStandard.md)


Binary file added examples/PASTA/goldStandard.eln
Binary file not shown.
1,681 changes: 1,681 additions & 0 deletions examples/PASTA/goldStandard.json
Contributor:

Is there a major difference from the RO-Crate JSON-LD, other than being framed/nested (by JSON-LD framing)?

Collaborator Author:

Yeah, some keys are renamed and some `@type` values have to be changed. There is a readme file that lists all those changes.

Large diffs are not rendered by default.

61 changes: 61 additions & 0 deletions examples/PASTA/goldStandard.md
@@ -0,0 +1,61 @@
# Gold‑standard example

This gold‑standard sibling triplet consists of an ELN
file, a JSON‑LD file, and a Turtle file. The example was created from the data
for CRR‑25894 on chemotion‑repository.net. The triplet shows that FULL semantic information can be included in the ELN file.

- **The JSON‑LD and Turtle files are equivalent:** the Turtle file was generated
from the JSON‑LD representation using rdflib.
- **All data contained in the JSON‑LD/Turtle files are also present in the ELN file:** minor format conversions are summarized below.
- **Binary and raw data contained in the ELN file are NOT included in the JSON‑LD or Turtle files.**

**Therefore, the ELN file fully supersedes the JSON‑LD/Turtle files in terms of content.** This is also evident from the file sizes, even though the JSON‑LD file is packed less densely.

## Format conversion: JSON‑LD → ELN file

1. Create an empty folder and add the data files to it.
2. Ensure that every node has an `@id`. If a node lacks an identifier, create
   one using the pattern `@type_@name`.
3. Flatten hierarchical dictionaries into a list of nodes, using `@id` values
   to represent links.
4. Add the manifest nodes and include them in the node list.
5. Remove duplicate nodes.
6. Adjust dataset `@id` values to reflect the file locations of the
   corresponding datasets:
   - Add an additional file node to the node list for each data file. Each file
     node must include `@id`, `@type`, `@name`, and `sha256`.
   - Use the instrument information from `dataset-description.txt` as
     supplemental metadata for the dataset.
   - Link the additional data files via `hasPart` both in the dataset node and in
     the root `./` node.
7. Restructure the root node:
   - Set `@id` to `./` and `@type` to `Dataset`.
   - Add all datasets and files under the root node using `hasPart`.
8. Rename protected keys:
   - `author` → `authors`
   - `keywords` → `keywordLists`
   - Remove the protected `about` key if it only links back to the root node.
   - Rename `@type`: `Study` → `CreativeWork`
   - Rename `@type`: `QuantitativeValue` → `PropertyValue`
   - For `@type` `Person`: create `name` from `givenName` and `familyName`
   - `affiliation` → `worksFor`
   - For `@type` `PropertyValue`: add `propertyID`
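
Steps 2, 3, 5, 6 (file nodes only) and 8 of the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script that produced `goldStandard.eln`: the function names `flatten`, `rename`, and `file_node` are hypothetical, `@type` is assumed to be a single string, the `@name` of the text is read from and written to the `name` key, and the `./` prefix on file identifiers is an assumption.

```python
import hashlib
import json

# Renames taken from step 8 above.
KEY_RENAMES = {"author": "authors", "keywords": "keywordLists", "affiliation": "worksFor"}
TYPE_RENAMES = {"Study": "CreativeWork", "QuantitativeValue": "PropertyValue"}


def _link(value, nodes):
    """Replace nested nodes by {"@id": ...} references; keep literals as they are."""
    if isinstance(value, list):
        return [_link(item, nodes) for item in value]
    if isinstance(value, dict) and "@value" not in value and set(value) != {"@id"}:
        return {"@id": flatten(value, nodes)}
    return value


def flatten(node, nodes):
    """Store `node` and all nested nodes in `nodes` (keyed by @id); return its @id."""
    # Step 2: build a fallback identifier from type and name (assumed pattern).
    node_id = node.get("@id") or f"{node.get('@type')}_{node.get('name')}"
    flat = {"@id": node_id}
    for key, value in node.items():
        if key != "@id":
            flat[key] = _link(value, nodes)  # step 3: links instead of nesting
    # Step 5: one entry per @id; repeated nodes are merged into the same entry.
    nodes.setdefault(node_id, {}).update(flat)
    return node_id


def rename(node):
    """Apply the key and @type renames of step 8 to one flat node."""
    out = {KEY_RENAMES.get(key, key): value for key, value in node.items()}
    node_type = out.get("@type")
    if isinstance(node_type, str):
        out["@type"] = TYPE_RENAMES.get(node_type, node_type)
    if out.get("@type") == "Person" and "name" not in out:
        out["name"] = f"{out.get('givenName', '')} {out.get('familyName', '')}".strip()
    return out


def file_node(path, data):
    """Step 6: describe one data file (given as bytes) as a node with a checksum."""
    return {"@id": f"./{path}", "@type": "File", "name": path,
            "sha256": hashlib.sha256(data).hexdigest()}


# Hypothetical input: a nested document whose root lacks an @id.
nested = {
    "@type": "Study",
    "name": "demo",
    "author": {"@id": "#ada", "@type": "Person", "givenName": "Ada", "familyName": "Lovelace"},
}
nodes = {}
root_id = flatten(nested, nodes)
graph = [rename(node) for node in nodes.values()]
print(json.dumps(graph, indent=2))
```

Keying the node store by `@id` makes duplicate removal a side effect of flattening, which is one simple way to combine steps 3 and 5. The manifest nodes, the root-node restructuring, and the `propertyID` addition are left out of this sketch.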

## Original data

Source: CRR‑25894 from [chemotion‑repository.net](https://www.chemotion-repository.net/home/publications/collections/4916).

The JSON‑LD files were downloaded manually and merged by copying the Analysis
branches into the main Reaction tree to produce a single master JSON‑LD file.
The files used include:

- `JSON-LD_Reaction_7354-20251031....json`
- `JSON-LD_Analysis_674242-20251031....json`
- `JSON-LD_Analysis_674244-20251031....json`
- `JSON-LD_Analysis_674246-20251031....json`
- `JSON-LD_Analysis_674248-20251031....json`

The master JSON‑LD file was converted to a Turtle file. A heuristic
verification schema (`goldStandardShapes.ttl`) was created, and all files were
validated against it.

836 changes: 836 additions & 0 deletions examples/PASTA/goldStandard.ttl

Large diffs are not rendered by default.

232 changes: 232 additions & 0 deletions examples/PASTA/goldStandardShapes.ttl
Contributor:

What is the source of these SHACL shapes?

Collaborator Author:

The idea was to show a proof of concept: JSON-LD can be validated. Hence, I needed a shapes file that catches errors, and this one does that job: I changed the data a few times and the shapes file caught it.

How it was generated: some LLM created it.
Generate SHACL shapes from a JSON-LD file by analyzing its RDF structure.

  • This implementation first collects the kinds of values observed for each property (Literal/IRI/BlankNode) across all instances of a class and only emits a sh:datatype or sh:nodeKind when the predicate has a single consistent kind. If kinds are mixed, no kind/datatype constraint is added.
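
The heuristic described in this comment can be sketched without any RDF library. The snippet below is an illustration only, not the generator that produced `goldStandardShapes.ttl`: the function name `infer_constraints` and the input format (tuples of class, property, and observed kind) are assumptions made for the example.

```python
from collections import defaultdict

NODE_KINDS = {"IRI", "BlankNode"}


def infer_constraints(observations):
    """Map (class, property) to one SHACL constraint when all observed kinds agree.

    `observations` holds (class, property, kind) tuples, where kind is "IRI",
    "BlankNode", or a literal datatype such as "xsd:string".
    """
    kinds = defaultdict(set)
    for cls, prop, kind in observations:
        kinds[(cls, prop)].add(kind)

    constraints = {}
    for key, seen in kinds.items():
        if len(seen) != 1:
            continue  # mixed kinds: emit neither sh:nodeKind nor sh:datatype
        (kind,) = seen
        if kind in NODE_KINDS:
            constraints[key] = ("sh:nodeKind", f"sh:{kind}")
        else:
            constraints[key] = ("sh:datatype", kind)
    return constraints


# Hypothetical observations: `name` is always a string, `author` is mixed.
observations = [
    ("Person", "name", "xsd:string"),
    ("Dataset", "author", "IRI"),
    ("Dataset", "author", "xsd:string"),
]
result = infer_constraints(observations)
print(result)
```

Skipping mixed-kind properties keeps the generated shapes from rejecting data that the source file itself contains, at the cost of leaving those properties unconstrained.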

@@ -0,0 +1,232 @@
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://schema.org/ChemicalSubstanceShape> a sh:NodeShape ;
sh:property ex:alternateNamePropShape,
ex:descriptionPropShape,
ex:hasBioChemEntityPartPropShape,
ex:identifierPropShape,
ex:imagePropShape,
ex:namePropShape,
ex:subjectOfPropShape,
ex:urlPropShape ;
sh:targetClass <http://schema.org/ChemicalSubstance> .

<http://schema.org/CreativeWorkShape> a sh:NodeShape ;
sh:property ex:authorPropShape,
ex:namePropShape,
ex:urlPropShape ;
sh:targetClass <http://schema.org/CreativeWork> .

<http://schema.org/DataCatalogShape> a sh:NodeShape ;
sh:property ex:contributorPropShape,
ex:descriptionPropShape,
ex:isAccessibleForFreePropShape,
ex:keywordsPropShape,
ex:licensePropShape,
ex:namePropShape,
ex:providerPropShape,
ex:urlPropShape ;
sh:targetClass <http://schema.org/DataCatalog> .

<http://schema.org/DatasetShape> a sh:NodeShape ;
sh:property ex:authorPropShape,
ex:conformsToPropShape,
ex:creatorPropShape,
ex:descriptionPropShape,
ex:identifierPropShape,
ex:includedInDataCatalogPropShape,
ex:isPartOfPropShape,
ex:licensePropShape,
ex:measurementTechniquePropShape,
ex:namePropShape,
ex:publisherPropShape,
ex:urlPropShape ;
sh:targetClass <http://schema.org/Dataset> .

<http://schema.org/DefinedTermSetShape> a sh:NodeShape ;
sh:property ex:namePropShape ;
sh:targetClass <http://schema.org/DefinedTermSet> .

<http://schema.org/DefinedTermShape> a sh:NodeShape ;
sh:property ex:alternateNamePropShape,
ex:inDefinedTermSetPropShape,
ex:namePropShape,
ex:termCodePropShape,
ex:urlPropShape ;
sh:targetClass <http://schema.org/DefinedTerm> .

<http://schema.org/MolecularEntityShape> a sh:NodeShape ;
sh:property ex:conformsToPropShape,
ex:inChIKeyPropShape,
ex:inChIPropShape,
ex:iupacNamePropShape,
ex:molecularFormulaPropShape,
ex:molecularWeightPropShape,
ex:namePropShape,
ex:smilesPropShape ;
sh:targetClass <http://schema.org/MolecularEntity> .

<http://schema.org/OrganizationShape> a sh:NodeShape ;
sh:property ex:logoPropShape,
ex:namePropShape,
ex:urlPropShape ;
sh:targetClass <http://schema.org/Organization> .

<http://schema.org/PersonShape> a sh:NodeShape ;
sh:property ex:affiliationPropShape,
ex:familyNamePropShape,
ex:givenNamePropShape,
ex:identifierPropShape,
ex:namePropShape ;
sh:targetClass <http://schema.org/Person> .

<http://schema.org/QuantitativeValueShape> a sh:NodeShape ;
sh:property ex:unitCodePropShape,
ex:valuePropShape ;
sh:targetClass <http://schema.org/QuantitativeValue> .

<http://schema.org/StudyShape> a sh:NodeShape ;
sh:property ex:aboutPropShape,
ex:additionalTypePropShape,
ex:authorPropShape,
ex:citationPropShape,
ex:conformsToPropShape,
ex:contributorPropShape,
ex:creatorPropShape,
ex:dateCreatedPropShape,
ex:datePublishedPropShape,
ex:descriptionPropShape,
ex:identifierPropShape,
ex:keywordsPropShape,
ex:licensePropShape,
ex:namePropShape,
ex:providerPropShape,
ex:publisherPropShape,
ex:subjectOfPropShape,
ex:urlPropShape ;
sh:targetClass <http://schema.org/Study> .

ex:aboutPropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/about> .

ex:additionalTypePropShape sh:datatype xsd:string ;
sh:path <http://schema.org/additionalType> .

ex:affiliationPropShape sh:nodeKind sh:BlankNode ;
sh:path <http://schema.org/affiliation> .

ex:citationPropShape sh:nodeKind sh:BlankNode ;
sh:path <http://schema.org/citation> .

ex:dateCreatedPropShape sh:datatype <http://schema.org/Date> ;
sh:path <http://schema.org/dateCreated> .

ex:datePublishedPropShape sh:datatype <http://schema.org/Date> ;
sh:path <http://schema.org/datePublished> .

ex:familyNamePropShape sh:datatype xsd:string ;
sh:path <http://schema.org/familyName> .

ex:givenNamePropShape sh:datatype xsd:string ;
sh:path <http://schema.org/givenName> .

ex:hasBioChemEntityPartPropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/hasBioChemEntityPart> .

ex:imagePropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/image> .

ex:inChIKeyPropShape sh:datatype xsd:string ;
sh:path <http://schema.org/inChIKey> .

ex:inChIPropShape sh:datatype xsd:string ;
sh:path <http://schema.org/inChI> .

ex:inDefinedTermSetPropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/inDefinedTermSet> .

ex:includedInDataCatalogPropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/includedInDataCatalog> .

ex:isAccessibleForFreePropShape sh:datatype xsd:boolean ;
sh:path <http://schema.org/isAccessibleForFree> .

ex:isPartOfPropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/isPartOf> .

ex:iupacNamePropShape sh:datatype xsd:string ;
sh:path <http://schema.org/iupacName> .

ex:logoPropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/logo> .

ex:measurementTechniquePropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/measurementTechnique> .

ex:molecularFormulaPropShape sh:datatype xsd:string ;
sh:path <http://schema.org/molecularFormula> .

ex:molecularWeightPropShape sh:nodeKind sh:BlankNode ;
sh:path <http://schema.org/molecularWeight> .

ex:smilesPropShape sh:datatype xsd:string ;
sh:path <http://schema.org/smiles> .

ex:termCodePropShape sh:datatype xsd:string ;
sh:path <http://schema.org/termCode> .

ex:unitCodePropShape sh:datatype xsd:string ;
sh:path <http://schema.org/unitCode> .

ex:valuePropShape sh:datatype xsd:double ;
sh:path <http://schema.org/value> .

ex:alternateNamePropShape sh:datatype xsd:string ;
sh:path <http://schema.org/alternateName> .

ex:contributorPropShape sh:nodeKind sh:BlankNodeOrIRI ;
sh:path <http://schema.org/contributor> .

ex:creatorPropShape sh:nodeKind sh:BlankNode ;
sh:path <http://schema.org/creator> .

ex:keywordsPropShape
sh:or ( [ sh:datatype xsd:string ]
[ sh:nodeKind sh:IRI ] ) ;
sh:path <http://schema.org/keywords> .

ex:providerPropShape sh:nodeKind sh:BlankNode ;
sh:path <http://schema.org/provider> .

ex:publisherPropShape sh:nodeKind sh:BlankNode ;
sh:path <http://schema.org/publisher> .

ex:subjectOfPropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/subjectOf> .

ex:authorPropShape
sh:or ( [ sh:datatype xsd:string ]
[ sh:nodeKind sh:BlankNodeOrIRI ; sh:class <http://schema.org/Person> ] ) ;
sh:path <http://schema.org/author> .

ex:conformsToPropShape sh:nodeKind sh:IRI ;
sh:path dcterms:conformsTo .

ex:licensePropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/license> .

ex:descriptionPropShape sh:datatype xsd:string ;
sh:path <http://schema.org/description> .

ex:identifierPropShape sh:datatype xsd:string ;
sh:path <http://schema.org/identifier> .

ex:urlPropShape sh:nodeKind sh:IRI ;
sh:path <http://schema.org/url> .

ex:namePropShape sh:datatype xsd:string ;
sh:path <http://schema.org/name> .

2 changes: 1 addition & 1 deletion tests/checks.py
@@ -165,7 +165,7 @@ def checkSchema(fileName):
 metadataJsonFile = [i for i in elnFile.namelist() if i.endswith(METADATA_FILE)][0]
 metadataContent = json.loads(elnFile.read(metadataJsonFile))
 for error in sorted(validator.iter_errors(metadataContent), key=str):
-    log += f'- {error.message}'
+    log += f'- {error.message}\n'
     success = False
 return success, log
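
The effect of the added `\n` can be checked in isolation. The sketch below uses a hypothetical stand-alone helper (`format_errors` is not part of `tests/checks.py`) that takes plain strings instead of validator error objects:

```python
def format_errors(messages):
    """Build a log with one bullet line per error message."""
    log = ''
    for message in sorted(messages):
        # Without the trailing newline, all bullets run together on one line.
        log += f'- {message}\n'
    return log


log = format_errors(["'name' is a required property", "'@id' is a required property"])
print(log, end='')
```

With the newline in place, each schema error appears on its own line of the log, so the output can be read as a list.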
