Commit 3a92f42

Merge pull request #442 from softwarepub/refactor/363-record-provenance-documentation
Document standardized provenance recording
2 parents c7c498f + 0253a3d commit 3a92f42

5 files changed

+2210
-0
lines changed
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
<!--
SPDX-FileCopyrightText: 2025 German Aerospace Center (DLR), Forschungszentrum Jülich, Helmholtz-Zentrum Dresden-Rossendorf

SPDX-License-Identifier: CC-BY-SA-4.0
-->
# Standardized provenance recording

* Status: proposed
* Deciders: sdruskat, skernchen, notactuallyfinn
* Date: 2025-10-17

Technical story:
* https://github.com/softwarepub/hermes/pull/442
* https://github.com/softwarepub/hermes/issues/363

## Context and Problem Statement

To make metadata traceable, and to support resolution based on metadata sources (e.g., when duplicate values occur), we need to record the provenance of metadata values in a __standardized__ way.
To achieve this, we use the [PROV-O ontology](https://www.w3.org/TR/prov-o/) serialized as [JSON-LD](https://www.w3.org/TR/json-ld/). Additionally, HERMES should make it possible to record as much of the provenance as possible *centrally*, i.e., as part of the core codebase. This keeps plugin developers from having to supply their own provenance solutions.

To do this, we need to specify which provenance information is recorded and how the recording can be implemented in HERMES so that it is easy to use.
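
To make the combination of PROV-O and JSON-LD mentioned above more concrete, the following is a minimal sketch of what a single provenance record could look like when built in Python. It is an illustration only, not the format decided in this ADR: the `ex:` namespace, the identifiers, the use of `schema:name`, and the function `build_load_record` are all assumptions made for this example. Only the `prov:` terms come from the PROV-O vocabulary.

```python
import json
from datetime import datetime, timezone

PROV_NS = "http://www.w3.org/ns/prov#"


def build_load_record(path: str, plugin_name: str) -> dict:
    """Describe 'a plugin loaded a metadata value from a file' with PROV-O terms.

    All `ex:` identifiers are hypothetical placeholders for this sketch.
    """
    started = datetime.now(timezone.utc).isoformat()
    return {
        "@context": {
            "prov": PROV_NS,
            "schema": "https://schema.org/",
            "ex": "https://example.org/hermes-prov/",  # assumed namespace
        },
        "@graph": [
            # The file that was read.
            {"@id": "ex:source-file", "@type": "prov:Entity", "schema:name": path},
            # The plugin that performed the read.
            {"@id": "ex:plugin", "@type": "prov:SoftwareAgent", "schema:name": plugin_name},
            # The act of loading, linked to the file and the plugin.
            {
                "@id": "ex:load-activity",
                "@type": "prov:Activity",
                "prov:startedAtTime": started,
                "prov:used": {"@id": "ex:source-file"},
                "prov:wasAssociatedWith": {"@id": "ex:plugin"},
            },
            # The metadata value that resulted from loading.
            {
                "@id": "ex:metadata-value",
                "@type": "prov:Entity",
                "prov:wasGeneratedBy": {"@id": "ex:load-activity"},
                "prov:wasAttributedTo": {"@id": "ex:plugin"},
            },
        ],
    }


record = build_load_record("codemeta.json", "example-harvester")
print(json.dumps(record, indent=2))
```

The sketch keeps entities, activities, and agents as separate nodes in one `@graph`, so that a later step could look up where a value came from by following `prov:wasGeneratedBy` and `prov:used`.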

## Considered Options

* Provide HERMES API-methods that also document themselves

## Decision Outcome

Chosen option:

## Pros and Cons of the Options

### Provide HERMES API-methods that also document themselves

Provide API methods for loading, writing, making web requests, etc. that document themselves, i.e., record their own provenance.<br>
Each of these methods also takes the function that performs the actual task, and only defines the framework in which the provenance recording is implemented.<br>
Like so:
```python
class HermesPlugin():
    def load(self, func, path: str, *args, **kwargs):
        # `prov` is the central provenance recorder (not defined in this sketch).
        # TODO: handle and record byte formats properly
        with open(path) as fi:
            data = func(fi, *args, **kwargs)
        prov.record("load", path, func.__name__, data) # also module of func
        return data

    def write(self, func, path: str, data, *args, **kwargs):
        # TODO: handle and record byte formats properly
        with open(path, "w") as fo:  # open for writing, not reading
            func(fo, data, *args, **kwargs)
        prov.record("write", path, func.__name__, data) # also module of func
```

* Good, because it allows recording provenance information for the plugins
* Good, because it doesn't make plugin development harder
* Bad, because the API methods may not cover all I/O functionality Python provides
* Bad, because it doesn't cover merging, mapping, etc.
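
The `load` method above can be tried out with a minimal stand-in for the `prov` object. The sketch below is an assumption-laden illustration, not the HERMES implementation: `ProvRecorder`, its in-memory `records` list, and the fields it stores are hypothetical, and `load` is written as an instance method with `self` so that it can be called on a plugin object.

```python
import json
import tempfile
from pathlib import Path


class ProvRecorder:
    """Hypothetical in-memory stand-in for the `prov` object used above."""

    def __init__(self):
        self.records = []

    def record(self, activity, path, func_name, data):
        # Store only a summary of the data; the real record format is not decided here.
        self.records.append(
            {
                "activity": activity,
                "path": str(path),
                "function": func_name,
                "data_type": type(data).__name__,
            }
        )


prov = ProvRecorder()


class HermesPlugin:
    def load(self, func, path, *args, **kwargs):
        # The plugin passes in the actual loader; the wrapper adds the recording.
        with open(path) as fi:
            data = func(fi, *args, **kwargs)
        # Include the module so that loaders with the same name can be told apart.
        prov.record("load", path, f"{func.__module__}.{func.__name__}", data)
        return data


# Usage: load a small JSON file through the wrapper.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "codemeta.json"
    source.write_text(json.dumps({"name": "hermes"}))
    data = HermesPlugin().load(json.load, source)

print(data)
print(prov.records)
```

The plugin code only chooses the loader (`json.load` here) and the path; everything related to provenance stays in the wrapper, which is the point of recording centrally. Operations that do not pass through such a wrapper, such as merging or mapping, would not be captured by this approach.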

All provenance information should be recorded in the following format, where additional properties of agents, activities and entities are values from suitable vocabularies (Schema.org, CodeMeta and potentially other schemas):

![](./hermes-prov-diagram/hermes-prov.svg)<br>
source: [hermes-prov.drawio](./hermes-prov-diagram/hermes-prov.drawio)