pages/inputs/input_data.qmd
19 additions & 19 deletions
@@ -19,7 +19,7 @@ title: Input data management

:::

-## 🧾 Input data
+## Input data

When managing input data in your RAP, there are three key files:

@@ -29,7 +29,7 @@ When managing input data in your RAP, there are three key files:


-## 📦 What is included in a RAP?
+## What is included in a RAP?

Your reproducible analytical pipeline (RAP) should begin with the **earliest data you access**. This could be:

@@ -42,7 +42,7 @@ Keep in mind that, especially in sensitive areas like healthcare, you may not be

> **Why is this important?** By starting at the source, you make your work transparent and easy to repeat. For instance, if new raw data becomes available, you will need your input modelling code to check that your distributions are still appropriate, re-estimate your model parameters, and re-run your analysis.

-## 🗃️ Raw data
+## Raw data

This is data which reflects the system you will be simulating. It is used to estimate parameters and fit distributions for your simulation model. For example:

@@ -57,11 +57,11 @@ This is data which reflects system you will be simulating. It is used to estimat

:::

-### 📋 Checklist: Managing your raw data
+### Checklist: Managing your raw data

:::{.cream}

-🗂️ **Always**
+**Always**

* **Keep copies of your raw data**<br>Or, if you can't export it, document how to access it (e.g. database location, required permissions).

@@ -71,7 +71,7 @@ This is data which reflects system you will be simulating. It is used to estimat

<br>

-🔓 **If you can share the data:**
+**If you can share the data:**

* **Make the data openly available**<br>Follow the [FAIR principles](https://open-science-training-handbook.github.io/Open-Science-Training-Handbook_EN/02OpenScienceBasics/02OpenResearchDataAndMaterials.html): Findable, Accessible, Interoperable, Reusable.

@@ -83,7 +83,7 @@ This is data which reflects system you will be simulating. It is used to estimat

<br>

-🔒 **If you cannot share the data:**
+**If you cannot share the data:**

* **Describe the dataset**<br>Include details in your documentation.

@@ -141,23 +141,23 @@ Some recommendations for generalist repositories are available:

Instructions for Zenodo archiving are provided on our [sharing and archiving](../sharing/archive.qmd) page.

-## 📜 Input modelling code
+## Input modelling code

[Input modelling code](input_modelling.qmd#input-modelling) refers to the scripts used to define and fit the statistical distributions that represent the uncertain inputs for a simulation model.

These scripts are often not shared, but are an essential part of your simulation RAP. Sharing them ensures transparency in how distributions were chosen and allows you (or others) to re-run the process if new data or assumptions arise.

-### 📋 Checklist: Managing your input modelling code
+### Checklist: Managing your input modelling code

:::{.cream}

-🔓 **If you can share the code:**
+**If you can share the code:**

* **Include the input modelling code in your repository**<br>Store it alongside your simulation code and other relevant scripts.

<br>

-🔒 **If you cannot share the code:**
+**If you cannot share the code:**

* **For internal use:**
    * Store the code securely and ensure it is accessible to your team or organisation - avoid saving it only on a personal device.
@@ -168,15 +168,15 @@ These scripts are often not shared, but are an essential part of your simulation

:::
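
As a rough illustration of what an input modelling script might contain, the sketch below estimates an exponential distribution from a handful of inter-arrival times. The data values, the function name `fit_exponential`, and the choice of an exponential distribution are all assumptions made for this example; real input modelling would compare candidate distributions against your own raw data.

```python
import statistics


def fit_exponential(inter_arrival_times):
    """Estimate an exponential distribution from observed inter-arrival times.

    The maximum likelihood estimate of the rate is 1 / sample mean.
    """
    if not inter_arrival_times:
        raise ValueError("No observations supplied")
    if any(t < 0 for t in inter_arrival_times):
        raise ValueError("Inter-arrival times cannot be negative")
    mean = statistics.fmean(inter_arrival_times)
    if mean == 0:
        raise ValueError("Mean inter-arrival time must be greater than zero")
    return {"mean": mean, "rate": 1 / mean}


# Hypothetical raw data: minutes between consecutive arrivals.
raw_inter_arrivals = [2.0, 4.0, 6.0, 8.0]

fitted = fit_exponential(raw_inter_arrivals)
print(fitted)
```

Keeping a script like this under version control means the fitted values can be regenerated, rather than copied by hand, whenever the raw data changes.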

-## ⚙️ Parameters
+## Parameters

Parameters are the numerical values used in your model, such as arrival rates, service times, or probabilities.

-### 📋 Checklist: Managing your parameters
+### Checklist: Managing your parameters

:::{.cream}

-🗂️ **Always**
+**Always**

* **Keep a structured parameter file**<br>Store all model parameters in a clearly structured format like a [CSV file](parameters_file.qmd) or a [script](parameters_script.qmd).

@@ -186,15 +186,15 @@ Parameters are the numerical values used in your model, like the arrival rates,

<br>

-🔓 **If you can share the parameters:**
+**If you can share the parameters:**

* **Include parameter files in your repository**<br>Store parameter files alongside your model code and documentation.

<br>

You must share some parameters with your model so that others can run it. Parameters are often less sensitive than raw data, so sharing is usually possible. However, this is not always the case.

-🔒 **If you cannot share the parameters:**
+**If you cannot share the parameters:**

* **Provide synthetic parameters**<br>Supply artificial values for each parameter, clearly labelled as synthetic.

@@ -204,7 +204,7 @@ You must share some parameters with your model so that it is possible for others

:::
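
One way to combine a structured parameter file with clear labelling of synthetic values is sketched below. The column names (`name`, `value`, `unit`, `source`), the parameter names, and the numbers are hypothetical, chosen only for illustration; the layout described on the linked parameter pages may differ.

```python
import csv
import io

# Hypothetical contents of a parameter file. In practice this would be a
# CSV stored in the repository rather than a string inside the script.
PARAMETER_CSV = """name,value,unit,source
arrival_rate,0.2,patients per minute,synthetic
mean_service_time,12.5,minutes,synthetic
prob_admission,0.3,probability,synthetic
"""


def load_parameters(csv_file):
    """Read a parameter table into a dict keyed by parameter name."""
    parameters = {}
    for row in csv.DictReader(csv_file):
        parameters[row["name"]] = {
            "value": float(row["value"]),
            "unit": row["unit"],
            "source": row["source"],
        }
    return parameters


parameters = load_parameters(io.StringIO(PARAMETER_CSV))

# List which values are synthetic so they are never mistaken for real ones.
synthetic = [name for name, p in parameters.items() if p["source"] == "synthetic"]
print(f"{len(synthetic)} of {len(parameters)} parameters are synthetic")
```

Recording the source of each value in the file itself keeps the synthetic label attached to the number wherever the file is copied.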

-## 🔐 Maintaining a private and public version of your model
+## Maintaining a private and public version of your model

It's common to have data and/or code that cannot be shared publicly. **Both your private and public components should be [version controlled](../setup/version.qmd)**, but you cannot split a single GitHub repository into public and private sections. The suggested solution is to use two separate repositories: **one public, one private**.

@@ -235,7 +235,7 @@ The way you might set these up depends on whether you are allowed to share the r
3. Use the shared simulation package in both repositories.
4. Run and share the full workflow in public with synthetic parameters; run the actual analysis in private with the real parameters.
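
Steps 3 and 4 can be sketched as follows: the same model function is called from both repositories, and only the parameter values differ. The function `run_model`, the parameter names, and all numbers are placeholders invented for this example; in a real set-up each call would live in its own repository, and the real values would be loaded from a file that is never committed publicly.

```python
def run_model(parameters):
    """Stand-in for the entry point of the shared simulation package.

    Returns the offered load (arrival rate x mean service time) so that the
    example produces a number; a real model would run the simulation.
    """
    return parameters["arrival_rate"] * parameters["mean_service_time"]


# Public repository: synthetic values, clearly labelled as such.
synthetic_parameters = {"arrival_rate": 0.2, "mean_service_time": 12.5}

# Private repository: real values (placeholder numbers are used here).
real_parameters = {"arrival_rate": 0.25, "mean_service_time": 10.0}

public_result = run_model(synthetic_parameters)
private_result = run_model(real_parameters)
print(public_result, private_result)
```

Because both runs go through the same function, anyone can check the public workflow end to end even though the real inputs stay private.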

-## 🧪 Test yourself
+## Test yourself

```{r}
#| echo: false
@@ -293,7 +293,7 @@ cat(longmcq(c(

:::

-## 📎 Further information
+## Further information

* ["How to Make a Data Dictionary"](https://help.osf.io/article/217-how-to-make-a-data-dictionary) from OSF Support.