Commit 70b1b7b

Merge pull request #95 from pythonhealthdatascience/dev
Dev
2 parents ca305e3 + a1386ea commit 70b1b7b

51 files changed (+3031, -624 lines changed)

.flake8

Lines changed: 7 additions & 1 deletion
@@ -1,5 +1,11 @@
 [flake8]
 per-file-ignores =
     docstrings.py: F811
+    outputs.py: F811
+    parallel.py: F401
     parameters_file.py: E402,F811,E0102
-    parameters_validation.py: F821
+    parameters_validation.py: F821
+    replications.py: F401,F811,F821
+    pages/output_analysis/outputs_resources/*.py:E261,E262,F821
+    pages/output_analysis/replications_resources/*.py:E261,E262,F821
+    pages/inputs/parameters_validation_resources/ParamClass.py: C0103

.lintr

Lines changed: 27 additions & 1 deletion
@@ -1,3 +1,29 @@
 linters: all_linters(packages = "lintr", undesirable_function_linter = NULL)
 encoding: "UTF-8"
-exclusions: list("renv")
+exclusions: list(
+    "pages/inputs/parameters_validation.qmd" = list(
+      object_usage_linter = 771:772
+    ),
+    "pages/output_analysis/n_reps.qmd" = list(
+      unused_import_linter = Inf,
+      object_usage_linter = Inf
+    ),
+    "pages/output_analysis/outputs.qmd" = list(
+      one_call_pipe_linter = 898,
+      line_length_linter = 2828
+    ),
+    "pages/output_analysis/parallel.qmd" = list(
+      one_call_pipe_linter = 812
+    ),
+    "pages/output_analysis/outputs_resources/model.R" = list(
+      object_usage_linter = Inf
+    ),
+    "pages/output_analysis/replications.qmd" = list(
+      unused_import_linter = Inf
+    ),
+    "pages/output_analysis/replications_resources" = list(
+      object_usage_linter = Inf
+    ),
+    "pages/style_docs/linting_resources/code.R",
+    "renv"
+  )

.pre-commit-config.yaml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+repos:
+  - repo: local
+    hooks:
+      - id: quarto-r-include-check
+        name: Block R Quarto file includes which break lintr
+        entry: .pre-commit-hooks/check-no-quarto-r-include.sh
+        language: script
+        files: \.qmd$
+  - repo: https://github.com/lorenzwalthert/precommit
+    rev: v0.4.3
+    hooks:
+      - id: lintr
+        args: [--warn_only]
+        verbose: true
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+
+# Find staged .qmd files
+FILES=$(git diff --cached --name-only | grep '\.qmd$')
+ERROR=0
+
+for FILE in $FILES; do
+    # Detect presence of Quarto include lines
+    if grep -q '{{< *include *.*\.R *>}}' "$FILE"; then
+        echo "ERROR: $FILE contains '{{< include ... .R >}}'."
+        echo "Please use '#| file: filename.R' in code chunk options instead."
+        ERROR=1
+    fi
+done
+
+if [ $ERROR -eq 1 ]; then
+    echo "Commit blocked: Replace '{{< include ... .R >}}' with Quarto chunk option '#| file: filename.R'."
+    exit 1
+fi
+
+exit 0
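
To illustrate what the hook above is meant to catch, here is a minimal Python sketch of the same pattern. It is not part of the commit: the function name `find_r_includes` and the sample text are hypothetical, and the regular expression is a hand translation of the `grep` pattern, assuming `grep` treats it as a basic regular expression.

```python
import re

# Hand translation of the hook's grep pattern (assumption: basic regex
# semantics, so the braces are literal and " *" means "zero or more spaces").
INCLUDE_R = re.compile(r"\{\{< *include *.*\.R *>\}\}")


def find_r_includes(text: str) -> list[int]:
    """Return 1-based line numbers that contain an R include shortcode."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if INCLUDE_R.search(line)
    ]


# Hypothetical .qmd content: one blocked include, one accepted chunk option,
# and one include of a non-R file that the pattern ignores.
sample = "\n".join([
    "```{r}",
    "{{< include model.R >}}",
    "```",
    "```{r}",
    "#| file: model.R",
    "```",
    "{{< include notes.qmd >}}",
])

print(find_r_includes(sample))  # prints [2]
```

Only the `{{< include model.R >}}` line is flagged. The `#| file: model.R` chunk option and the `.qmd` include pass through, which matches the intent stated in the hook's error messages.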

.pylintrc

Lines changed: 10 additions & 1 deletion
@@ -2,4 +2,13 @@
 max-line-length=79

 [MESSAGES CONTROL]
-disable=too-many-lines
+disable =
+    duplicate-code,
+    function-redefined,
+    missing-module-docstring,
+    redefined-outer-name,
+    too-few-public-methods,
+    too-many-arguments,
+    too-many-instance-attributes,
+    too-many-lines,
+    too-many-positional-arguments

README.md

Lines changed: 18 additions & 0 deletions
@@ -154,6 +154,24 @@ Note: inactive code (i.e. code that does not get run when building the book) wil

 <br>

+## Pre-commit
+
+To activate the pre-commit hook...
+
+1. Make the bash script executable - from command line, run:
+
+```{.bash}
+chmod +x .pre-commit-hooks/check-no-quarto-r-include.sh
+```
+
+2. Run the following from your python environment on the command line:
+
+```{.python}
+pre-commit install
+```
+
+<br>
+
 ## Funding

 This project is supported by the Medical Research Council [grant number [MR/Z503915/1](https://gtr.ukri.org/projects?ref=MR%2FZ503915%2F1)].

environment.yaml

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ dependencies:
   - pandas=2.3.1
   - plotly=6.3.0
   - pip
+  - pre-commit=4.3.0
   - pylint=3.3.7
   - pytest=8.4.1
   - python=3.11

lint.sh

Lines changed: 10 additions & 8 deletions
@@ -6,17 +6,16 @@ print_section() {
     echo "--------------------------------------------------------------------"
 }

-# Note: I have used ```{r} #| file: file.R``` instead of
-# ```{r}{{< include file.R >}}```, and likewise for python, as the latter
-# breaks lintr (false positive messages, and missing other messages) and breaks
-# pylint (returns an error Parsing failed: 'invalid syntax'). It doesn't break
+# Note: For R, I have used ```{r} #| file: file.R``` instead of
+# ```{r}{{< include file.R >}}```, as the latter breaks lintr (false positive
+# messages, and missing other messages) and breaks. It doesn't break
 # if used in non-active code chunks as linters ignore those.

 print_section "R" "index.qmd"
 Rscript -e 'lintr::lint("index.qmd")'

 print_section "R" "pages/"
-Rscript -e 'lintr::lint_dir("pages", exclusions = list("style_docs/linting_resources/code.R"))'
+Rscript -e 'lintr::lint_dir("pages")'

 print_section "R" "tests/"
 Rscript -e 'lintr::lint_dir("tests")'
@@ -26,6 +25,9 @@ echo "--------------------------------------------------------------------"
 print_section "python" "index.qmd and pages/"
 lintquarto -l pylint flake8 -p index.qmd pages/

-print_section "python" "tests/"
-pylint pages tests --ignore=linting_resources
-flake8 pages tests --exclude linting_resources
+print_section "python" "pages/ and tests/"
+
+pylint pages tests --ignore=linting_resources,outputs_resources,replications_resources
+pylint pages/output_analysis/outputs_resources pages/output_analysis/replications_resources --disable=missing-module-docstring,undefined-variable
+
+flake8 pages tests --exclude linting_resources,replications_resources

pages/inputs/input_data.qmd

Lines changed: 19 additions & 19 deletions
@@ -19,7 +19,7 @@ title: Input data management

 :::

-## 🧾 Input data
+## Input data

 When managing input data in your RAP, there are three key files:

@@ -29,7 +29,7 @@ When managing input data in your RAP, there are three key files:

 ![](input_data_resources/input_files.png)

-## 📦 What is included in a RAP?
+## What is included in a RAP?

 Your reproducible analytical pipeline (RAP) should begin with the **earliest data you access**. This could be:

@@ -42,7 +42,7 @@ Keep in mind that, especially in sensitive areas like healthcare, you may not be

 > **Why is this important?** By starting at the source, you make your work transparent and easy to repeat. For instance, if new raw data becomes available, it's important you have your input modelling code so that you can check your distributions are still appropriate, re-estimate your model parameters, and re-run your analysis.

-## 🗃️ Raw data
+## Raw data

 This is data which reflects system you will be simulating. It is used to estimate parameters and fit distributions for your simulation model. For example:

@@ -57,11 +57,11 @@ This is data which reflects system you will be simulating. It is used to estimat

 :::

-### 📋 Checklist: Managing your raw data
+### Checklist: Managing your raw data

 :::{.cream}

-🗂️ **Always**
+**Always**

 * **Keep copies of your raw data**<br>Or, if you can't export it, document how to access it (e.g. database location, required permissions).

@@ -71,7 +71,7 @@ This is data which reflects system you will be simulating. It is used to estimat

 <br>

-🔓 **If you can share the data:**
+**If you can share the data:**

 * **Make the data openly available**<br>Follow the [FAIR principles]((https://open-science-training-handbook.github.io/Open-Science-Training-Handbook_EN/02OpenScienceBasics/02OpenResearchDataAndMaterials.html)): Findable, Accessible, Interoperable, Reusable.

@@ -83,7 +83,7 @@ This is data which reflects system you will be simulating. It is used to estimat

 <br>

-🔒 **If you cannot share the data:**
+**If you cannot share the data:**

 * **Describe the dataset**<br>Include details in your documentation.

@@ -141,23 +141,23 @@ Some recommendations for generalist repositories are available:

 Instructions for Zenodo archiving are provided on our [sharing and archiving](../sharing/archive.qmd) page.

-## 📜 Input modelling code
+## Input modelling code

 [Input modelling code](input_modelling.qmd#input-modelling) refers to the scripts used to define and fit the statistical distributions that represent the uncertain inputs for a simulation model.

 These scripts are often not shared, but are an essential part of your simulation RAP. Sharing them ensures transparency in how distributions were chosen and allows you (or others) to re-run the process if new data or assumptions arise.

-### 📋 Checklist: Managing your input modelling code
+### Checklist: Managing your input modelling code

 :::{.cream}

-🔓 **If you can share the code:**
+**If you can share the code:**

 * **Include the input modelling code in your repository**<br>Store it alongside your simulation code and other relevant scripts.

 <br>

-🔒 **If you cannot share the code:**
+**If you cannot share the code:**

 * **For internal use:**
     * Store the code securely and ensure it is accessible to your team or organisation - avoid saving it only on a personal device.
@@ -168,15 +168,15 @@ These scripts are often not shared, but are an essential part of your simulation

 :::

-## ⚙️ Parameters
+## Parameters

 Parameters are the numerical values used in your model, like the arrival rates, service times or probabilities.

-### 📋 Checklist: Managing your parameters
+### Checklist: Managing your parameters

 :::{.cream}

-🗂️ **Always**
+**Always**

 * **Keep a structured parameter file**<br>Store all model parameters in a clearly structured format like a [CSV file](parameters_file.qmd) or a [script](parameters_script.qmd).

@@ -186,15 +186,15 @@ Parameters are the numerical values used in your model, like the arrival rates,

 <br>

-🔓 **If you can share the parameters:**
+**If you can share the parameters:**

 * **Include parameter files in your repository**<br>Store parameter files alongside your model code and documentation.

 <br>

 You must share some parameters with your model so that it is possible for others to run it. Parameters are often less sensitive than raw data, so sharing is usually possible. However-

-🔒 **If you cannot share the parameters:**
+**If you cannot share the parameters:**

 * **Provide synthetic parameters**<br>Supply artifical values for each parameter, clearly labelled as synthetic.

@@ -204,7 +204,7 @@ You must share some parameters with your model so that it is possible for others

 :::

-## 🔐 Maintaining a private and public version of your model
+## Maintaining a private and public version of your model

 It's common to have data and/or code that cannot be shared publicly. **Both your private and public components should be [version controlled](../setup/version.qmd)**, but you cannot split a single GitHub repository into public and private sections. The suggested solution is to use two separate repositories: **one public, one private**.

@@ -235,7 +235,7 @@ The way you might set these up depends on whether you are allowed to share the r
 3. Use the shared simulation package in both repositories.
 4. Run and share the full workflow in public with synthetic parameters; run the actual analysis in private with the real parameters.

-## 🧪 Test yourself
+## Test yourself

 ```{r}
 #| echo: false
@@ -293,7 +293,7 @@ cat(longmcq(c(

 :::

-## 📎 Further information
+## Further information

 * ["How to Make a Data Dictionary"](https://help.osf.io/article/217-how-to-make-a-data-dictionary) from OSF Support.

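
The parameter checklist in `pages/inputs/input_data.qmd` recommends keeping a structured parameter file and, where real values cannot be shared, supplying synthetic values that are clearly labelled. The sketch below shows one way this could look in Python. It is not part of the commit: the column names, the parameter names and values, and the `load_parameters` helper are all hypothetical.

```python
import csv
import io

# Hypothetical parameter file. Every name and value here is illustrative;
# the "synthetic" column is one possible way to label artificial values.
PARAMETER_CSV = """parameter,value,unit,synthetic
interarrival_time,4.0,minutes,true
consultation_time,10.0,minutes,true
"""


def load_parameters(csv_text: str) -> dict[str, dict]:
    """Parse parameter rows into a dictionary keyed by parameter name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    parameters = {}
    for row in reader:
        parameters[row["parameter"]] = {
            "value": float(row["value"]),
            "unit": row["unit"],
            # Keep the label with the value so downstream code can warn
            # when results are based on synthetic inputs.
            "synthetic": row["synthetic"].strip().lower() == "true",
        }
    return parameters


params = load_parameters(PARAMETER_CSV)
synthetic_names = [name for name, p in params.items() if p["synthetic"]]
print(synthetic_names)
```

Storing the label next to each value, rather than only in the file name, means a public repository with synthetic parameters and a private one with real parameters could share the same loading code.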