In the E-Laute project, we focus on the analysis of medieval lute tablatures by applying various models to extract information from these musical pieces. At this early stage of the project, our primary aim is to develop a robust pipeline for integrating data from different sources, facilitating the execution of reproducible experiments.
This document addresses the reproducibility aspect of our work. After a thorough review of existing provenance tracking libraries, we chose MLProvLab [1], a Jupyter Lab extension that tracks the processes within notebooks. Once a process has completed, the extension can export the provenance data in JSON format. Our custom script, mlprovlab_rdf_conversion.py, converts this JSON into a Turtle file that uses the PROV-O vocabulary. The resulting file can be loaded into graph databases such as GraphDB. Below are some sample SPARQL queries that can be executed on the provenance data:
Retrieve each activity together with the value of the entity it generated (the code):
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity ?code
WHERE {
  ?activity a prov:Activity ;
            prov:generated ?codeEntity .
  ?codeEntity prov:value ?code .
}
Retrieve each activity and the agent associated with it:
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity ?agent
WHERE {
  ?activity a prov:Activity ;
            prov:wasAssociatedWith ?agent .
}
Retrieve the entities used by each activity:
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity ?usedEntity
WHERE {
  ?activity a prov:Activity ;
            prov:used ?usedEntity .
}
Retrieve the entities used by each activity, ordered by the numeric execution ID taken from the activity IRI:
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?activity ?usedEntity
WHERE {
  ?activity a prov:Activity ;
            prov:used ?usedEntity .
  BIND(xsd:integer(REPLACE(STR(?activity), "http://example.org/execution/", "")) AS ?activityID)
}
ORDER BY ?activityID
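The last query casts the numeric suffix of each activity IRI to an integer before sorting. The following minimal Python sketch illustrates why, assuming activity IRIs of the form http://example.org/execution/<n> as in the query. The sample IRIs are made up for illustration. Sorted as plain strings, execution 10 would come before execution 2.

```python
PREFIX = "http://example.org/execution/"

# Hypothetical activity IRIs, for illustration only.
activities = [PREFIX + str(n) for n in (1, 2, 10)]


def execution_id(iri: str) -> int:
    """Mirror xsd:integer(REPLACE(STR(?activity), PREFIX, "")) from the query."""
    return int(iri.replace(PREFIX, ""))


# Plain string ordering compares character by character.
lexicographic = sorted(activities)
# Ordering by the extracted integer follows the execution order.
numeric = sorted(activities, key=execution_id)

print([execution_id(a) for a in lexicographic])
print([execution_id(a) for a in numeric])
```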
Each Jupyter notebook cell corresponds to an individual prov:Activity, and each dependency used within a cell is represented as a prov:Entity. Activities are linked to the entities they depend on through the prov:used property. The prov:wasAssociatedWith property captures the agent responsible for executing the notebook, typically the computational engine or the user.
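The mapping described above can be sketched in a few lines of Python. This is not the code of mlprovlab_rdf_conversion.py, and it does not reflect the actual MLProvLab export schema. The record shape (id, code, used, agent), the function name cell_to_turtle, and every IRI other than the http://example.org/execution/ prefix seen in the sample queries are assumptions made for illustration.

```python
import json
from urllib.parse import quote

# Only the "execution/" path appears in the sample queries; the other paths are assumed.
BASE = "http://example.org/"
HEADER = "@prefix prov: <http://www.w3.org/ns/prov#> .\n\n"


def cell_to_turtle(cell: dict) -> str:
    """Render one hypothetical cell record as PROV-O triples in Turtle."""
    activity = f"<{BASE}execution/{cell['id']}>"
    code_entity = f"<{BASE}code/{cell['id']}>"
    agent = f"<{BASE}agent/{quote(cell['agent'], safe='')}>"
    used = [f"<{BASE}entity/{quote(name, safe='')}>" for name in cell["used"]]

    lines = [f"{activity} a prov:Activity ;",
             f"    prov:wasAssociatedWith {agent} ;"]
    lines += [f"    prov:used {entity} ;" for entity in used]
    lines.append(f"    prov:generated {code_entity} .")
    # json.dumps gives a double-quoted, backslash-escaped string,
    # which is used here as the Turtle literal for the cell source.
    lines.append(f"{code_entity} a prov:Entity ;")
    lines.append(f"    prov:value {json.dumps(cell['code'])} .")
    lines += [f"{entity} a prov:Entity ." for entity in used]
    lines.append(f"{agent} a prov:Agent .")
    return "\n".join(lines) + "\n"


# Hypothetical record; the real export format may differ.
example = {"id": 3, "code": 'print("hi")', "used": ["pandas"], "agent": "jupyter user"}
print(HEADER + cell_to_turtle(example))
```

Each record becomes one prov:Activity whose prov:used and prov:wasAssociatedWith links match the patterns in the sample queries.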
For comprehensive provenance documentation, make sure MLProvLab is active while the experiment runs, so that the entire process is documented in a single session. The generated provenance data can then be transformed using the provided conversion script.
- Complete your experiments within Jupyter Lab.
- Install MLProvLab:
    pip install mlprovlab
- Launch Jupyter Lab:
    jupyter lab
- Execute the cells in your notebook.
- Click "Export" in the MLProvLab tab and download the JSON file containing the provenance data.
- Convert the JSON to Turtle format using the provided script:
    python3 mlprovlab_rdf_conversion.py path/to/input.json path/to/output.ttl
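Once the Turtle file has been imported into a graph database such as GraphDB, the sample queries can also be sent from Python. The sketch below uses only the standard library and follows the SPARQL 1.1 Protocol (a form-encoded POST). The endpoint URL is an assumption: it combines GraphDB's default local port with a hypothetical repository name, provenance, and should be replaced with your own.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: default local GraphDB port, hypothetical repository "provenance".
ENDPOINT = "http://localhost:7200/repositories/provenance"

QUERY = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity ?agent
WHERE {
  ?activity a prov:Activity ;
            prov:wasAssociatedWith ?agent .
}
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request with a form-encoded query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_select(endpoint: str, query: str) -> list:
    """Send the query and return the result bindings (needs a running endpoint)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return json.load(response)["results"]["bindings"]


# Usage, with the database running and the Turtle file imported:
#   for row in run_select(ENDPOINT, QUERY):
#       print(row["activity"]["value"], row["agent"]["value"])
```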
This documentation will help ensure that all experimental processes are tracked and reproducible, aiding in the validation and verification of results.
[1]: Kerzel, Dominik, König-Ries, Birgitta, and Samuel, Sheeba. (2023). "MLProvLab: Provenance Management for Data Science Notebooks." In BTW 2023, Gesellschaft für Informatik e.V., Bonn, ISBN 978-3-88579-725-8, pp. 965-980. DOI: 10.18420/BTW2023-66.