Defines the canonical metadata model for iSamples - a domain-agnostic standard for describing material samples across scientific disciplines (geology, archaeology, biology, environmental science).
For data users (exploring samples):
- Browse tutorials at isamplesorg.github.io
- Use Jupyter examples from isamples-python (examples repo)
For implementers (integrating with iSamples):
- Schema: src/schemas/isamples_core.yaml (LinkML source)
- JSON Schema: src/schemas/iSamplesSchemaCore1.0.json
- Documentation: https://isamplesorg.github.io/metadata/
For understanding the data model:
- Start with src/docs/UNDERSTANDING_THE_GRAPH.md, which explains the 8 entity types and 14 predicates
- See src/docs/EDGE_TYPES_VISUAL.md for visual relationship diagrams
8 Entity Types:
- MaterialSampleRecord - The physical sample (core entity)
- SamplingEvent - When/how collection occurred
- SamplingSite - Named location (e.g., "Çatalhöyük")
- GeospatialCoordLocation - WGS84 coordinates
- Agent - Person/organization
- IdentifiedConcept - Vocabulary terms
- MaterialSampleCuration - Repository info
- SampleRelation - Links between samples
14 Predicates connect these entities (see src/docs/PREDICATES_REFERENCE.md).
Note: The schema also defines is_part_of (SamplingSite → SamplingSite) for nested site hierarchies. This is excluded from the "14 predicates" count because it is used for site containment rather than sample description.
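To make the entity/predicate model concrete, here is a minimal in-memory sketch of typed nodes joined by predicate edges, including a walk up an is_part_of site hierarchy. It is an illustration only, not the iSamples implementation: the `Graph` class, the node identifiers, and the `produced_by` and `sampling_site` predicate names are hypothetical placeholders (the real predicate names are listed in src/docs/PREDICATES_REFERENCE.md); only the entity type names and is_part_of come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny property graph: typed nodes plus (subject, predicate, object) edges."""
    node_types: dict = field(default_factory=dict)  # node id -> entity type name
    edges: list = field(default_factory=list)       # (subject, predicate, object)

    def add_node(self, node_id, entity_type):
        self.node_types[node_id] = entity_type

    def add_edge(self, subject, predicate, obj):
        self.edges.append((subject, predicate, obj))

    def objects(self, subject, predicate):
        """Return every object reachable from `subject` via `predicate`."""
        return [o for s, p, o in self.edges if s == subject and p == predicate]


def site_ancestors(graph, site_id):
    """Follow is_part_of edges upward from a site, guarding against cycles."""
    chain, seen, current = [], {site_id}, site_id
    while True:
        parents = graph.objects(current, "is_part_of")
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)


# Hypothetical identifiers, for illustration only.
g = Graph()
g.add_node("sample:1", "MaterialSampleRecord")
g.add_node("event:1", "SamplingEvent")
g.add_node("site:trench", "SamplingSite")
g.add_node("site:mound", "SamplingSite")
g.add_edge("sample:1", "produced_by", "event:1")       # placeholder predicate name
g.add_edge("event:1", "sampling_site", "site:trench")  # placeholder predicate name
g.add_edge("site:trench", "is_part_of", "site:mound")  # nested site hierarchy

print(site_ancestors(g, "site:trench"))
```

Keeping is_part_of as an ordinary edge means site containment can be traversed with the same lookup used for the sample-describing predicates, even though it is counted separately.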
| Repo | Purpose | Start Here |
|---|---|---|
| isamples-python (examples) | Jupyter examples (DuckDB + Lonboard) | examples/basic/isamples_explorer.ipynb |
| isamplesorg.github.io | Browser tutorials (DuckDB-WASM + Cesium) | tutorials/isamples_explorer.qmd |
| vocabularies | SKOS vocabulary terms | Material types, context categories |
All 6.7M samples are available as GeoParquet on Cloudflare R2:
# Wide format (recommended) - 280 MB, 20M rows
WIDE_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet"
LinkML and associated tools require a Python environment, version 3.9 or newer, and use Poetry for dependency management. Poetry can be installed with pip install poetry.
To work on project contents and run artifact generators, first grab the source and switch to the develop branch:
git clone https://github.com/isamplesorg/metadata.git
cd metadata
git checkout develop
git pull
Set up a virtual environment (e.g. using poetry or mkvirtualenv):
poetry shell
poetry install
(To exit the Poetry shell, use exit.)
Artifacts in the generated/ folder are produced by running make or make all.
Documentation is rendered with Quarto rather than the default mkdocs or Sphinx, because Quarto offers additional features for including computed examples, which are planned. To generate the documentation, install Quarto 1.2 or newer, then run make, make all, or make gen-docs.
This will generate markdown intermediate files in the build/docs folder then invoke quarto render to generate the HTML docs in the docs/ folder.
Note that this project uses a version of the LinkML docgen tool and templates modified to render markdown for Quarto. The modified docgen and templates are located in the tools/ folder.
Collation of metadata examples and notes for the project
- background: contains diagrams and information about some existing models that include metadata for samples; files are organized broadly by domain.
- examples: example metadata documents from different systems. Subfolders are:
  - raw: metadata from the originating system
  - test: corresponding records generated manually using the iSamples basic template
  - transform: corresponding records generated by an automated ETL process from raw records
- vocabulary: vocabularies related to sample metadata from various systems
This branch shows how to use LinkML to generate various outputs and run operations for iSamples.
Use the following command to convert the iSamples YAML schema to JSON Schema.
gen-json-schema -t PhysicalSampleRecord --not-closed iSamplesSchemaBasic0.3.yaml > iSamplesSchemaBasic0.3.schema.json
In this command, -t PhysicalSampleRecord makes the PhysicalSampleRecord class the top-level class, so the properties of that class become the top-level properties in the JSON Schema. The converted JSON Schema file is iSamplesSchemaBasic0.3.schema.json.
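The effect of choosing a top-level class can be pictured with a small sketch. This is not how LinkML generates the schema; `promote_top_class` and the `TOY_DEFS` dictionary are hypothetical, and the two properties are borrowed from the instance example in this document. It only shows the general shape: every class stays under `$defs`, while the chosen class's properties are also placed at the root, and an open schema (as requested by --not-closed) permits additional properties.

```python
import json

# Toy class definitions, assumed for illustration only.
TOY_DEFS = {
    "PhysicalSampleRecord": {
        "type": "object",
        "properties": {
            "label": {"type": "string"},
            "sampleidentifier": {"type": "string"},
        },
        "required": ["sampleidentifier"],
    },
    "SamplingEvent": {
        "type": "object",
        "properties": {"resultTime": {"type": "string"}},
    },
}


def promote_top_class(defs, top_class):
    """Return a JSON-Schema-like dict whose root mirrors `top_class`."""
    top = defs[top_class]
    return {
        "$defs": defs,                                   # all classes stay available
        "type": "object",
        "properties": dict(top.get("properties", {})),  # copied up to the root
        "required": list(top.get("required", [])),
        "additionalProperties": True,                    # open schema, not closed
    }


schema = promote_top_class(TOY_DEFS, "PhysicalSampleRecord")
print(json.dumps(schema["properties"], indent=2))
```

With the root mirroring PhysicalSampleRecord, an instance document can be validated directly as a sample record without wrapping it in another object.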
gen-jsonld-context iSamplesSchemaBasic0.3.yaml > iSampleSchemaBasic0.3.jsonld
The command saves the result in the .jsonld file. Once the JSON-LD context has been generated, its enumeration entries must be edited manually.
Modified JSON-LD context example
"@context": {
"dct": "http://purl.org/dc/terms/",
"isam": "http://resource.isamples.org/schema/",
"mat": "http://resource.isamples.org/vocabulary/material/",
"pur": "http://resource.isamples.org/vocabulary/samplepurpose/",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"sf": "http://resource.isamples.org/vocabulary/sampledFeature/",
"skos": "http://www.w3.org/2004/02/skos/core#",
"spt": "http://resource.isamples.org/vocabulary/sampleobjecttype/",
"w3cpos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"@vocab": "http://resource.isamples.org/schema/",
"curation": {
"@type": "@id"
},
"hasContextCategory": {
"@type":"contextcategory"
},
"hasMaterialCategory": {
"@type":"materialtype"
},
"has_sample_object_type": {
"@type":"specimencategory"
},
"id": "@id",
"latitude": {
"@type": "xsd:decimal"
},
"location": {
"@type": "@id"
},
"longitude": {
"@type": "xsd:decimal"
},
"producedBy": {
"@type": "@id"
},
"relatedResource": {
"@type": "@id"
},
"resultTime": {
"@type": "xsd:date"
},
"samplingSite": {
"@type": "@id"
}
}
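As a rough guide to what this context does, the sketch below expands a few compact terms into full IRIs using the prefixes and `@vocab` shown above. It is a deliberately simplified approximation of JSON-LD term expansion, not a conforming processor; the `expand_term` helper is hypothetical and the `CONTEXT` dictionary is only a subset of the context above. A real JSON-LD library should be used for actual processing.

```python
# Subset of the context above, copied for illustration.
CONTEXT = {
    "mat": "http://resource.isamples.org/vocabulary/material/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://resource.isamples.org/schema/",
    "id": "@id",
}


def expand_term(term, context):
    """Approximate how a compact term maps to a full IRI or keyword."""
    if term.startswith("@"):
        return term                      # JSON-LD keyword, left untouched
    mapped = context.get(term)
    if isinstance(mapped, str) and mapped.startswith("@"):
        return mapped                    # keyword alias, e.g. "id" -> "@id"
    if ":" in term:
        prefix, local = term.split(":", 1)
        if isinstance(context.get(prefix), str):
            return context[prefix] + local   # prefixed name, e.g. "mat:rock"
        return term                      # unknown prefix: assume already an IRI
    return context.get("@vocab", "") + term  # bare term falls back to @vocab


for term in ("id", "label", "mat:rock", "xsd:decimal"):
    print(term, "->", expand_term(term, CONTEXT))
```

The `@vocab` entry is why plain property names such as label need no individual mapping: any term without a prefix resolves against http://resource.isamples.org/schema/.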
Before validating instance files, add the modified JSON-LD context to the top of each instance, ahead of its other properties.
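Adding the context by hand to many instance files is error-prone, so a small helper along the following lines may be useful. This is a sketch, not part of the repository: the `with_context` function is hypothetical, and the sample record and shortened context are trimmed from the examples in this document.

```python
import json


def with_context(instance, context):
    """Return a copy of `instance` with "@context" as its first key."""
    merged = {"@context": context}
    # Copy the remaining properties, dropping any stale context already present.
    merged.update({k: v for k, v in instance.items() if k != "@context"})
    return merged


# Shortened context and record, for illustration only.
context = {
    "@vocab": "http://resource.isamples.org/schema/",
    "id": "@id",
}
record = {
    "label": "PIRE_0334",
    "sampleidentifier": "ark:/21547/Car2PIRE_0334",
}

document = with_context(record, context)
print(json.dumps(document, indent=2))
```

Python dictionaries preserve insertion order, so the serialized JSON starts with the `@context` block, matching the layout of the full instance example.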
Full instance example
{
"@context": {
"dct": "http://purl.org/dc/terms/",
"isam": "http://resource.isamples.org/schema/",
"mat": "http://resource.isamples.org/vocabulary/material/",
"pur": "http://resource.isamples.org/vocabulary/samplepurpose/",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"sf": "http://resource.isamples.org/vocabulary/sampledFeature/",
"skos": "http://www.w3.org/2004/02/skos/core#",
"spt": "http://resource.isamples.org/vocabulary/sampleobjecttype/",
"w3cpos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"@vocab": "http://resource.isamples.org/schema/",
"curation": {
"@type": "@id"
},
"hasContextCategory": {
"@type":"contextcategory"
},
"hasMaterialCategory": {
"@type":"materialtype"
},
"has_sample_object_type": {
"@type":"specimencategory"
},
"id": "@id",
"latitude": {
"@type": "xsd:decimal"
},
"location": {
"@type": "@id"
},
"longitude": {
"@type": "xsd:decimal"
},
"producedBy": {
"@type": "@id"
},
"relatedResource": {
"@type": "@id"
},
"resultTime": {
"@type": "xsd:date"
},
"samplingSite": {
"@type": "@id"
}
},
"@schema": "../../iSamplesSchemaBasic0.2.json",
"@id": "metadata/21547/Car2PIRE_0334",
"label": "PIRE_0334",
"sampleidentifier": "ark:/21547/Car2PIRE_0334",
"description": "",
"hasContextCategory": ["Marine Biome"],
"hasMaterialCategory": ["Organic Material"],
"has_sample_object_type": ["Whole Organism"],
"informalClassification": ["Gastropoda"],
"keywords": ["Aceh", "Sumatra","Indonesia","Asia", "Mollusca"],
"producedBy": {
"@id":"ark:/21547/Cas2INDO_2016_SEU_1B",
"label": "INDO_2016_SEU_1B",
"description": "expeditionCode: INDO_PIRE | samplingProtocol: ARMS | taxonomy team: MINV | projectId: 80",
"hasFeatureOfInterest": "coral reef",
"responsibility": ["Aji Wahyu Anggoro","Andrianus Sembiring"],
"resultTime": "2016-08-09",
"samplingSite": {
"description": "Shallow, coastal reef. Apparent exposure to current, Porites dominated. Less impacted bleaching site, high recruitment, 12 m.",
"label": "",
"location": {
"elevation": "maximumDepthInMeters: 12",
"latitude": 5.89430,
"longitude": 95.25293
},
"placeName": ["Pulau Seulako"]
}
},
"registrant": "Chris Meyer",
"samplingPurpose": "genomic analysis",
"curation": {
"accessConstraints": "",
"curationLocation": "",
"responsibility": ""
},
"relatedResource": {
"label":"subsample tissue",
"description":"",
"target":"ark:/21547/Cat2INDO106431.1",
"relationship":"subsample"
}
}
Use the following commands to validate instance files against the schema.
linkml-validate -s iSamplesSchemaBasic0.3.yaml instance.json
jsonschema -i instance.json iSamplesSchemaBasic0.3.schema.json
The first command validates the instance file against the YAML schema; the second validates it against the JSON Schema.
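When there are many instance files, the two commands above can be wrapped in a short script. The following is a sketch under assumptions: it expects linkml-validate and jsonschema to be on the PATH and the schema files to be in the working directory, and the helper names `validation_commands` and `validate_all` are hypothetical. The command arguments themselves are copied from the commands above.

```python
import subprocess
from pathlib import Path

YAML_SCHEMA = "iSamplesSchemaBasic0.3.yaml"
JSON_SCHEMA = "iSamplesSchemaBasic0.3.schema.json"


def validation_commands(instance_path):
    """Build the two validation command lines for one instance file."""
    instance = str(instance_path)
    return [
        ["linkml-validate", "-s", YAML_SCHEMA, instance],  # against the YAML schema
        ["jsonschema", "-i", instance, JSON_SCHEMA],       # against the JSON Schema
    ]


def validate_all(instance_dir):
    """Run both validators on every *.json file; map file name to return codes."""
    results = {}
    for path in sorted(Path(instance_dir).glob("*.json")):
        results[path.name] = [
            subprocess.run(cmd, check=False).returncode
            for cmd in validation_commands(path)
        ]
    return results


# Show the commands for one file without running them.
for cmd in validation_commands("instance.json"):
    print(" ".join(cmd))
```

Calling `validate_all` on a directory of instance files returns a pair of exit codes per file, where a non-zero code indicates that the corresponding validator reported a problem or could not run.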
The iSamples Metadata Docker container is based on the Docker container from the LinkML project [https://hub.docker.com/r/monarchinitiative/linkml/tags]
First you'll build the image:
docker build -t isamples_linkml .
Then, running it opens a bash shell in /work, the Docker container volume that maps to the iSamples metadata repository:
docker run -a stdin -a stdout -i -t -v `pwd`:/work isamples_linkml
Then use the following commands to generate LinkML:
- Command 1
- Command 2
- Command 3
- We are still working on implementing the iSamples schema within LinkML's requirements.
- LinkML has some bugs and unimplemented features.
- Results and errors can differ between platforms, so we prefer to run LinkML in Docker. Please follow the LinkML tutorial.
