The five safes framework is a conceptual framework for data access and sharing that emphasizes five key principles: Safe Projects, Safe People, Safe Settings, Safe Data, and Safe Outputs. It is designed to ensure that data is used responsibly and ethically while maximizing its utility for research and analysis. By ensuring each safe is appropriately managed, pragmatic decisions can be made about mitigating risks associated with data sharing and access. The goal is not to maximise the controls in each safe, but to ensure that the controls are appropriate for the risks associated with the data and the intended use. For example, very strict controls in one safe may allow lighter controls in another.
RO-Crates are a way to package and share research data and metadata in a standardized format. They provide a structured way to organize data, code, and documentation, making it easier to share and reuse research outputs. RO-Crates are designed to be machine-readable and human-readable, ensuring that the data can be easily understood and used by others.
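As a rough illustration of what "machine-readable" means here, the sketch below builds the skeleton of an `ro-crate-metadata.json` document using only the Python standard library. It follows the general shape of RO-Crate 1.1 (a metadata descriptor plus a root dataset), but it is deliberately incomplete: the specification asks for further properties on the root dataset, such as a publication date and licence, which are omitted. The function name `minimal_ro_crate` and the example name and description are assumptions made for illustration.

```python
import json


def minimal_ro_crate(name: str, description: str) -> dict:
    """Build a skeleton RO-Crate 1.1 metadata document as a plain dict."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: identifies this file and points at the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: describes the crate as a whole.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                # Files and other entities packaged in the crate would be listed here.
                "hasPart": [],
            },
        ],
    }


# Illustrative values only.
crate = minimal_ro_crate(
    "Example DataSHIELD analysis",
    "Hypothetical crate describing one DataSHIELD analysis.",
)
metadata_json = json.dumps(crate, indent=2)
```

The resulting `metadata_json` string is what would be written to `ro-crate-metadata.json` at the top level of the crate directory.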
DataSHIELD has multiple components that contribute controls relevant to the five safes framework, but these controls are neither co-ordinated nor described in the language of the framework.
There is a five safes RO-Crates profile () which was partly developed as part of TRE-FX. It brings together both five safes and workflows into one entity.
To assess the fit of 5S RO-Crates with DataSHIELD, and to decide whether we need to modify the profile or develop a new DataSHIELD profile which inherits from it (or the inverse), we need to specify our use cases. These are some examples of what RO-Crates could be used for in DataSHIELD:
A TRE acting in isolation, or as part of a federated network, may want to audit or report on its own use of DataSHIELD. This could include information about the projects, people, settings, data, and outputs derived from DataSHIELD within the TRE. This could be presented in an easy-to-understand dashboard that summarises the TRE's DataSHIELD activities.
Where a TRE is in a federated network, there is an agreed degree of trust in each TRE to ensure that data sent between them is as expected. In the context of DataSHIELD, the assumption is that correct statistical disclosure control (SDC) has been applied before data is sent to another TRE. This is difficult to verify: how would TRE 1 know that TRE 2 has applied the correct SDC? We could package information about the SDC applied, such as the disclosure thresholds, so that each TRE has a record of what has happened to the data before it receives it, allowing post hoc audit.
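A minimal sketch of such a record is given below, assuming it is enough to capture which TRE sent the data, which function produced it, and the thresholds in force at the time. The class `SdcRecord`, the helper `append_audit_entry`, the TRE name, and the threshold values are all hypothetical; the option names are modelled on DataSHIELD's `nfilter` settings but should be checked against the server configuration actually in use.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class SdcRecord:
    """Hypothetical record of the disclosure controls applied by a sending TRE."""

    source_tre: str
    function_name: str
    thresholds: dict
    # Timestamp in UTC so that records from different TREs can be compared.
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_audit_entry(log: list, record: SdcRecord) -> str:
    """Serialise the record to JSON, append it to the log, and return the entry."""
    entry = json.dumps(asdict(record), sort_keys=True)
    log.append(entry)
    return entry


# Illustrative values only.
audit_log = []
record = SdcRecord(
    source_tre="TRE 2",
    function_name="ds.mean",
    thresholds={"nfilter.tab": 3, "nfilter.subset": 3},
)
append_audit_entry(audit_log, record)
```

Storing each entry as a JSON string keeps the log append-only and easy to ship alongside the data; an auditor in the receiving TRE can parse the entries later without needing any DataSHIELD tooling.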
This is the same scenario as above, except that instead of post hoc auditing, the information about the five safes is used to make real-time decisions about whether to accept data from another TRE. This could be used to ensure that the data meets the required standards for disclosure control before it is accepted into the TRE.
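One way to picture the real-time decision is a simple policy check run by the receiving TRE before it accepts anything. The sketch below assumes the sending TRE declares its thresholds as a mapping of names to numbers, and that for every threshold checked a higher value is stricter (true of minimum cell counts, but not of every DataSHIELD setting). The function `meets_local_policy` and the threshold values are hypothetical.

```python
from typing import Dict, List, Tuple


def meets_local_policy(
    declared: Dict[str, float], required_minimums: Dict[str, float]
) -> Tuple[bool, List[str]]:
    """Compare declared thresholds with local minimums.

    Assumes a higher value means a stricter control for every threshold checked.
    Returns (accepted, reasons); reasons is empty when the data is accepted.
    """
    problems = []
    for name, minimum in required_minimums.items():
        if name not in declared:
            # A missing declaration is treated as a failure, not as a pass.
            problems.append(f"{name}: not declared")
        elif declared[name] < minimum:
            problems.append(
                f"{name}: declared {declared[name]} is below required {minimum}"
            )
    return (not problems, problems)


# Illustrative values only: the sender declares one threshold, the receiver requires two.
accepted, reasons = meets_local_policy(
    declared={"nfilter.tab": 3},
    required_minimums={"nfilter.tab": 5, "nfilter.subset": 3},
)
```

In this example the data would be rejected, with one reason for the threshold that is too low and one for the threshold that was never declared. Returning the reasons alongside the decision means the rejection itself can be logged for later audit.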
DataSHIELD may be set up in an environment where the results of analyses by the client software must have manual SDC carried out on them before they can leave the network. We could package information about the methods used in the analysis and the relevant SDC thresholds along with the result that has been requested to leave the network. This would give the manual SDC process an audit trail of what was done, and would act as a decision support tool to help the output checker understand the risks associated with the output.
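The sketch below shows one possible shape for such a package, assuming the output checker needs three things: a fingerprint tying the paperwork to the exact output file, the list of functions that produced it, and the thresholds in force. The function `build_review_bundle`, the field names, and the example values are hypothetical.

```python
import hashlib


def build_review_bundle(output_bytes: bytes, calls: list, thresholds: dict) -> dict:
    """Bundle an output's fingerprint with the methods and thresholds behind it."""
    return {
        # SHA-256 digest links this bundle to one specific output file.
        "output_sha256": hashlib.sha256(output_bytes).hexdigest(),
        "calls": list(calls),
        "thresholds": dict(thresholds),
        # Left empty here; to be filled in by the human output checker.
        "decision": None,
    }


# Illustrative values only.
bundle = build_review_bundle(
    output_bytes=b"mean_age,52.3\n",
    calls=["ds.mean"],
    thresholds={"nfilter.tab": 3},
)
```

Hashing the output rather than embedding it keeps the bundle small and means the audit trail can be retained even where the output itself is not.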
To enable an analysis to be reproduced at a later date, it is important to have a record of the data, code, and methods used in the analysis. We could package this up in an RO-Crate.
In most of the use cases above there is a requirement to have information mapped to the five safes framework. This is where we should start. Assuming we use Opal, we need to understand how we can populate the five safes information from existing library and API calls. five_safes_mapping.R is a first attempt at this. Using opalr, DSI, and the Opal API we can begin this mapping. We won't worry about formatting it as an RO-Crate for now.
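To make the target of the mapping concrete, the sketch below lays out the five safes as keys and lists, for each, the kinds of Opal information that might feed it. It is written in Python purely as a language-neutral illustration and is not a port of five_safes_mapping.R. The assignment of sources to safes is an assumption to be challenged, and the names `CANDIDATE_SOURCES` and `unmapped_safes` are hypothetical.

```python
FIVE_SAFES = (
    "Safe Projects",
    "Safe People",
    "Safe Settings",
    "Safe Data",
    "Safe Outputs",
)

# Assumed first-pass assignment of Opal information to each safe.
CANDIDATE_SOURCES = {
    "Safe Projects": ["projects"],
    "Safe People": ["users", "permissions"],
    "Safe Settings": ["DataSHIELD profiles"],
    "Safe Data": ["tables", "resources", "data dictionaries"],
    # No source identified yet; disclosure settings would be an obvious candidate.
    "Safe Outputs": [],
}


def unmapped_safes(mapping: dict) -> list:
    """Return the safes that have no information source assigned yet."""
    return [safe for safe in FIVE_SAFES if not mapping.get(safe)]


gaps = unmapped_safes(CANDIDATE_SOURCES)
```

Running the gap check as the mapping grows gives a quick view of which safes still have no source of information at all.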
ACTION ALL: update five_safes_mapping.R to include more information relevant to the five safes framework.
There is likely other information which we would REQUIRE to be included.
ACTION ALL: Think about other information which we would REQUIRE to be included in the five safes mapping.
Cre8or outputs an RO-Crate with lots of upstream information which would likely be useful in our five safes mapping. We should work through an example to see how it maps.
ACTION RVD: Get an example output from cre8tor from Mike.
Something is going to have to collate the information for the RO-Crate. Where this sits and how it is invoked needs to be decided. It might be that it sits next to DSI. It might be invoked once at the end of an analysis or it might be invoked on every iteration of an analysis.
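The two invocation options can be compared with a small sketch. It assumes a collector object that receives a record for each DataSHIELD call and hands batches of records to a "sink" (whatever eventually writes the RO-Crate). The class `CrateCollector`, its methods, and the `per_call` switch are hypothetical and do not correspond to anything in DSI today; the `ds.mean` arguments are illustrative.

```python
from typing import Callable, Dict, List


class CrateCollector:
    """Hypothetical collector that buffers call records and emits them to a sink."""

    def __init__(self, sink: Callable[[List[Dict]], None], per_call: bool = False):
        self._sink = sink
        self._per_call = per_call
        self._buffer: List[Dict] = []

    def record(self, function_name: str, arguments: Dict) -> None:
        """Store one call; emit immediately when running in per-call mode."""
        self._buffer.append({"function": function_name, "arguments": dict(arguments)})
        if self._per_call:
            self._flush()

    def finalize(self) -> None:
        """Emit anything still buffered; intended to be called at the end of an analysis."""
        self._flush()

    def _flush(self) -> None:
        # Skip empty batches so the sink is only called when there is something to write.
        if self._buffer:
            self._sink(list(self._buffer))
            self._buffer.clear()


# Invoked once at the end: both calls arrive at the sink as a single batch.
end_batches = []
at_end = CrateCollector(sink=end_batches.append, per_call=False)
at_end.record("ds.mean", {"x": "D$age"})
at_end.record("ds.mean", {"x": "D$bmi"})
at_end.finalize()

# Invoked on every iteration: each call arrives as its own batch.
call_batches = []
each_call = CrateCollector(sink=call_batches.append, per_call=True)
each_call.record("ds.mean", {"x": "D$age"})
each_call.record("ds.mean", {"x": "D$bmi"})
each_call.finalize()
```

The trade-off is roughly this: emitting once at the end gives a single coherent crate but loses everything if the session ends abnormally, while emitting per call is more robust but leaves the job of stitching the pieces together to whatever consumes them.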
ACTION ALL: Think about where the RO-Crate engine should sit and how it is invoked.
Let's start with a simple DataSHIELD function, ds.mean, and work outwards from there.
This package contains functions to pack DataSHIELD analyses into RO-Crates.
This package contains functions to generate RO-Crates, including the 5s-crates profile.
Opal R Client for the Opal data warehouse. Most of the web services of Opal can be reached by an opalr function: import/export, data dictionaries, projects, tables, resources, permissions, users, DataSHIELD profiles etc.
Armadillo implementation of DSI to be DataSHIELD ready, part of the MOLGENIS suite.