Five safes and RO-Crates in DataSHIELD

Background

Five Safes Framework

The five safes framework is a conceptual framework for data access and sharing built around five key principles: Safe Projects, Safe People, Safe Settings, Safe Data, and Safe Outputs. It is designed to ensure that data is used responsibly and ethically while maximising its utility for research and analysis. When each safe is appropriately managed, pragmatic decisions can be made about mitigating the risks associated with data sharing and access. The goal isn't to maximise the controls in each safe, but to ensure that the controls are appropriate for the risks associated with the data and its intended use. This may mean, for example, that very strict controls in one safe allow less strict controls in another.

RO-Crates

RO-Crates are a way to package and share research data and metadata in a standardized format. They provide a structured way to organize data, code, and documentation, making it easier to share and reuse research outputs. RO-Crates are designed to be machine-readable and human-readable, ensuring that the data can be easily understood and used by others.
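Concretely, an RO-Crate is a directory whose metadata is stored as JSON-LD in a file called ro-crate-metadata.json. The Python sketch below (the project itself works in R; Python is used here only for illustration) builds a minimal RO-Crate 1.1 metadata document with the standard library. The name, description and date are placeholders, and a real crate would normally also describe a licence and the files it contains.

```python
import json


def minimal_crate(name: str, description: str, date_published: str) -> dict:
    """Build a minimal RO-Crate 1.1 metadata document as a Python dict."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: describes ro-crate-metadata.json itself
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root data entity: the crate as a whole
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
            },
        ],
    }


# Placeholder values, for illustration only
crate = minimal_crate("ds.mean example", "Illustrative crate for a single DataSHIELD call", "2024-01-01")
print(json.dumps(crate, indent=2))
```

Saving this dict as ro-crate-metadata.json at the top of a directory is what marks that directory as a crate.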

DataSHIELD

DataSHIELD has multiple components that contribute controls relevant to the five safes framework, but these controls are neither co-ordinated nor described in five safes terms.

5 Safes RO-Crates

There is a five safes RO-Crate profile () which was partly developed as part of TRE-FX. It brings five safes information and workflows together into a single crate.

Use cases

To assess how well five safes RO-Crates fit DataSHIELD, and whether we need to modify the profile or develop a new DataSHIELD profile that inherits from it (or vice versa), we need to specify our use cases. The following are examples of how RO-Crates could be used in DataSHIELD:

Intra-TRE audit/reporting

A TRE acting in isolation, or as part of a federated network, may want to audit or report on its own use of DataSHIELD. This could include information about the projects, people, settings, data, and outputs involved in DataSHIELD analyses within the TRE. It could be presented as an easy-to-understand dashboard summarising the TRE's DataSHIELD activities.
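A minimal sketch of the aggregation behind such a dashboard, assuming a hypothetical log with one record per DataSHIELD call. The record fields `project`, `user`, `function` and `blocked` are invented for illustration and do not come from Opal or DSI.

```python
from collections import Counter, defaultdict

# Hypothetical audit records: one per DataSHIELD call made within the TRE
records = [
    {"project": "P01", "user": "alice", "function": "ds.mean", "blocked": False},
    {"project": "P01", "user": "bob", "function": "ds.table", "blocked": True},
    {"project": "P02", "user": "alice", "function": "ds.mean", "blocked": False},
]


def summarise(records):
    """Aggregate call records per project for a simple dashboard view."""
    summary = defaultdict(
        lambda: {"calls": 0, "blocked": 0, "users": set(), "functions": Counter()}
    )
    for r in records:
        s = summary[r["project"]]
        s["calls"] += 1
        s["blocked"] += int(r["blocked"])  # calls refused by disclosure checks
        s["users"].add(r["user"])
        s["functions"][r["function"]] += 1
    return dict(summary)


for project, s in sorted(summarise(records).items()):
    print(project, s["calls"], "calls,", s["blocked"], "blocked,", len(s["users"]), "users")
```

Grouping by project touches Safe Projects and Safe People; the `blocked` count is only a stand-in for whatever Safe Outputs measure the TRE chooses to report.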

Inter-TRE audit

Where a TRE is part of a federated network, each TRE is trusted, to an agreed degree, to ensure that data sent between TREs is as expected. In the context of DataSHIELD, the assumption is that the correct statistical disclosure control (SDC) has been applied before data is sent to another TRE. This is difficult to verify: how would TRE 1 know that TRE 2 has applied the correct SDC? We could package information about the SDC applied (e.g. the disclosure thresholds) so that each TRE has a record of what has happened to the data before it receives it, allowing post hoc audit.
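One possible shape for the packaged SDC information is sketched below in Python. The schema is hypothetical; `nfilter.tab` and `nfilter.subset` are DataSHIELD disclosure settings, but the values here are illustrative, and the result dict only mimics the kind of summary `ds.mean` returns. Hashing the released result lets the receiving TRE check that a record refers to the data it actually received.

```python
import hashlib
import json
from datetime import datetime, timezone


def sdc_record(tre_id: str, function: str, result: dict, settings: dict) -> dict:
    """Describe the SDC applied to an aggregate result before it leaves a TRE (hypothetical schema)."""
    # Canonical JSON (sorted keys) so sender and receiver hash identical bytes
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return {
        "sourceTRE": tre_id,
        "function": function,
        "disclosureSettings": dict(settings),
        # Hash of the released result so the record can be matched to it later
        "resultSha256": hashlib.sha256(payload).hexdigest(),
        "recordedAt": datetime.now(timezone.utc).isoformat(),
    }


# Illustrative values only
result = {"EstimatedMean": 42.1, "Nvalid": 120}
record = sdc_record("TRE-2", "ds.mean", result, {"nfilter.tab": 3, "nfilter.subset": 3})
print(json.dumps(record, indent=2))
```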

Inter-TRE actionable decision making

This is the same scenario as above, except that instead of post hoc auditing, the five safes information is used to make real-time decisions about whether to accept data from another TRE. This could ensure that the data meets the required standards for disclosure control before it is accepted into the TRE.
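A real-time check could be as simple as comparing the settings reported by the sending TRE against local minimums, as in the hypothetical Python sketch below. It assumes, for the two settings shown, that a higher value is stricter (both act as minimum counts), and it treats a missing setting as a reason to reject. The local minimum values are placeholders.

```python
# Hypothetical local policy; assumes higher values are stricter for these settings
LOCAL_MINIMUMS = {"nfilter.tab": 5, "nfilter.subset": 5}


def accept(incoming_settings: dict, minimums: dict = LOCAL_MINIMUMS) -> tuple[bool, list[str]]:
    """Decide whether to accept a result, based on the SDC settings reported by the sender."""
    problems = []
    for name, minimum in minimums.items():
        value = incoming_settings.get(name)
        if value is None:
            problems.append(f"{name} not reported")
        elif value < minimum:
            problems.append(f"{name}={value} is below the local minimum of {minimum}")
    # Accept only if no setting was missing or too lax
    return (not problems, problems)


ok, reasons = accept({"nfilter.tab": 3, "nfilter.subset": 5})
print(ok, reasons)
```

Returning the reasons as well as the decision keeps the rejection auditable, which overlaps with the post hoc audit use case.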

SDC output documentation

DataSHIELD may be set up in an environment where the results of analyses run by the client software must undergo manual SDC before they can leave the network. We could package information about the methods used in the analysis and the relevant SDC thresholds along with the result requested out of the network. This would give the manual SDC process an audit trail of what was done and act as a decision-support tool for understanding the risks associated with the output.

Reproducibility

To enable an analysis to be reproduced at a later date, it is important to have a record of the data, code, and methods used in the analysis. We could package this up in an RO-Crate.
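As a sketch of the reproducibility record, the Python function below describes one file as an RO-Crate `File` entity with its size and SHA-256 checksum, so that a later rerun can confirm it used the same code or data. The checksum property name `sha256` is an assumption and should be checked against the profile in use; the file content is a placeholder written to a temporary directory.

```python
import hashlib
import os
import tempfile


def file_entity(path: str) -> dict:
    """Describe one file for an RO-Crate, including a checksum so reruns can be verified."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files need not fit in memory
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return {
        "@id": os.path.basename(path),
        "@type": "File",
        "contentSize": str(os.path.getsize(path)),
        # Assumed property name for the checksum; check the profile in use
        "sha256": digest.hexdigest(),
    }


content = b"# placeholder analysis script\n"
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "analysis.R")
    with open(script, "wb") as fh:
        fh.write(content)
    entity = file_entity(script)
print(entity)
```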

Development plan

Five safes mapping

In most of the use cases above, information needs to be mapped to the five safes framework, so this is where we should start. Assuming we use Opal, we need to understand how the five safes information can be populated from existing library and API calls. five_safes_mapping.R is a first attempt at this: using opalr, DSI, and the Opal API we can begin this mapping. We won't worry about formatting it as an RO-Crate for now.
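The mapping itself can be kept independent of how the values are retrieved. The Python sketch below groups already-collected metadata into the five safes; the `FiveSafesRecord` class, the keys read from `meta` and the example values are all hypothetical placeholders for whatever five_safes_mapping.R ends up gathering through opalr, DSI and the Opal API.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class FiveSafesRecord:
    """Hypothetical container for five safes information about one analysis."""
    safe_projects: dict = field(default_factory=dict)
    safe_people: dict = field(default_factory=dict)
    safe_settings: dict = field(default_factory=dict)
    safe_data: dict = field(default_factory=dict)
    safe_outputs: dict = field(default_factory=dict)


def map_to_five_safes(meta: dict) -> FiveSafesRecord:
    """Sort a flat dict of already-collected metadata into the five safes.

    The keys read from `meta` are hypothetical, not opalr or DSI field names.
    """
    return FiveSafesRecord(
        safe_projects={"project": meta.get("project")},
        safe_people={"user": meta.get("user")},
        safe_settings={"server": meta.get("server_url"), "server_type": meta.get("server_type")},
        safe_data={"tables": meta.get("tables", [])},
        safe_outputs={
            "function": meta.get("function"),
            "disclosure_settings": meta.get("disclosure_settings", {}),
        },
    )


# Placeholder metadata, for illustration only
rec = map_to_five_safes({
    "project": "P01",
    "user": "alice",
    "server_url": "https://opal.example.org",
    "server_type": "opal",
    "tables": ["CNSIM.CNSIM1"],
    "function": "ds.mean",
    "disclosure_settings": {"nfilter.tab": 3},
})
print(asdict(rec))
```

Keeping retrieval and mapping separate means the same grouping could later be reused for another server type, or serialised as an RO-Crate, without changing how the values are collected.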

ACTION ALL: update five_safes_mapping.R to include more information relevant to the five safes framework.

There is likely to be other information that we are REQUIRED to include.

ACTION ALL: Think about what other information we would be REQUIRED to include in the five safes mapping.

Cre8or outputs an RO-Crate with lots of upstream information which would likely be useful in our five safes mapping. We should work through an example to see how it maps.

ACTION RVD: Get an example output from cre8tor from Mike.

RO-Crate engine location

Something will need to collate the information for the RO-Crate. Where this component sits and how it is invoked still need to be decided. It might sit alongside DSI, and it might be invoked once at the end of an analysis or on every iteration of an analysis.
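To make the two invocation options concrete, the hypothetical Python collector below gathers one record per call and passes them to a writer either after every call (`per_call=True`) or only when `flush()` is called at the end of the analysis. The class and its methods are invented for illustration; in practice the equivalent would probably wrap DSI calls in R.

```python
from typing import Callable


class CrateCollector:
    """Hypothetical collector that gathers call records and hands them to a writer."""

    def __init__(self, writer: Callable[[list], None], per_call: bool = False):
        self.writer = writer
        self.per_call = per_call
        self.pending: list = []

    def record(self, call: dict) -> None:
        """Store one call record; write immediately if running in per-call mode."""
        self.pending.append(call)
        if self.per_call:
            self.flush()

    def flush(self) -> None:
        """Pass any pending records to the writer as one batch, then clear them."""
        if self.pending:
            self.writer(list(self.pending))
            self.pending.clear()


written = []
collector = CrateCollector(written.append, per_call=False)
collector.record({"function": "ds.mean", "iteration": 1})
collector.record({"function": "ds.mean", "iteration": 2})
collector.flush()  # end of analysis: one batch containing both calls
print(len(written), "batch(es) written")
```

Per-call writing leaves a record even if an analysis is interrupted, at the cost of many small outputs; end-of-analysis writing produces a single output but depends on `flush()` being reached.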

ACTION ALL: Think about where the RO-Crate engine should sit and how it is invoked.

Scope of work

Let's start with a simple DataSHIELD function, ds.mean, and work outwards from there.

Relevant R Packages

dsROCrate

rocrateR

opalr

DSMolgenisArmadillo

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Five safes and RO-Crates in DataSHIELD

Background

Five Safes Framework

RO-Crates

DataSHIELD

5 Safes RO-Crates

Use cases

Intra-TRE audit/reporting

Inter-TRE audit

Inter-TRE actionable decision making

SDC ouput documentation

Reproducibility

Development plan

Five safes mapping

RO-Crate engine locaton

Scope of work

Relevant R Packages

dsROCrate

rocrateR

opalr

DSMolgenisArmadillo

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages