Commit a2fadc9

Merge branch 'arsh-code-refactoring' into main
2 parents b34e36c + 33198ea commit a2fadc9

12 files changed

+681
-531
lines changed


.gitignore

Lines changed: 3 additions & 2 deletions
@@ -7,5 +7,6 @@ scripts_outputs/
77
.idea/
88
examples/visualqc_prep/
99
examples/sub-01/
10-
user-testing/sample_dataset/*
11-
user-testing/output/
10+
user-testing/*
11+
.vscode/*
12+
src/isolate_fsleyes_render_issue.py

README.md

Lines changed: 136 additions & 70 deletions
@@ -2,117 +2,182 @@
22

33
# DSST Defacing Pipeline
44

5-
The defacing pipeline for datasets curated by the [Data Science and Sharing Team (DSST)](https://cmn.nimh.nih.gov/dsst) are completed in four steps. Each of these steps is explained in more detail with an example in the next section. The pipeline requires a BIDS dataset as input.
5+
The DSST Defacing Pipeline has been developed to make the process of defacing anatomical scans of large datasets,
6+
visually inspecting for accuracy and fixing scans that fail visual inspection more efficient and straightforward. The
7+
pipeline _requires_ the input dataset to be in BIDS format. A conceptual description of the pipeline can
8+
be found [here](#conceptual-design).
69

7-
1. Generate and finalize ["primary" scans](#glossary) to [other scans'](#glossary) mapping file.
8-
2. Deface primary scans
9-
with [@afni_refacer_run](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html) program
10-
developed by the AFNI Team. To deface remaining scans in the session, register them to the primary scan and use
11-
it's defacemask to generate a defaced image.
12-
**NOTE**: If a session doesn't have a T1w scan, then `@afni_refacer_run` is run on all every scan individually.
13-
3. Visually inspect defaced scans with your preferred QC tool.
14-
4. Fix defacings that failed visual inspection.
15-
16-
![Generate and finalize "primary" scans to "secondary" scans mapping file.](images/pipeline_screen_quality.png)
10+
This pipeline is designed and tested to work on the NIH HPC systems. While it's possible to get the pipeline running on
11+
other platforms, please note that it can be error-prone and is not recommended.
1712

18-
## Example
13+
## Usage Instructions
1914

20-
### **Step 0:** Get data and code
21-
Clone this repository to a preferred location on your machine.
15+
### Clone this repository
2216

2317
```bash
2418
git clone git@github.com:nih-fmrif/dsst-defacing-pipeline.git
2519
```
2620

27-
We'll be running the scripts on the [MyConnectome](https://openneuro.org/datasets/ds000031/versions/1.0.0) dataset. The dataset is available for download on OpenNeuro as [ds000031](https://openneuro.org/datasets/ds000031/versions/1.0.0/download).
21+
### Install required packages
2822

29-
```bash
30-
datalad install https://github.com/OpenNeuroDatasets/ds000031.git
31-
```
23+
Apart from AFNI and FSL packages, available as HPC modules, users will need the following packages in their working
24+
environment
3225

33-
Download data in `anat` directories of the dataset.
26+
- VisualQC
27+
- FSLeyes
28+
- Python 3.x
29+
30+
There are many ways to create a virtual environment with the required packages; however, we currently only provide
31+
instructions to create a conda environment. If you don't already have conda installed, please find
32+
instructions [here](https://docs.conda.io/en/latest/miniconda.html). Run the following command to create a conda
33+
environment called `dsstdeface` using the `environment.yml` file from this repo.
3434

3535
```bash
36-
datalad get sub-01/ses-*/anat
36+
conda env create -f environment.yml
3737
```
3838

39-
BIDS tree snippet post-download:
39+
Once conda finishes creating the virtual environment, activate `dsstdeface`.
4040

41-
```bash
42-
$ tree ../datasets/ds000031/
43-
../datasets/ds000031/
44-
├── CHANGES
45-
├── README
46-
├── dataset_description.json
47-
├── events.json
48-
├── participants.json
49-
├── participants.tsv
50-
├── sub-01
51-
│ ├── ses-001
52-
│ │ ├── anat
53-
│ │ │ ├── sub-01_ses-001_T1w.json
54-
│ │ │ └── sub-01_ses-001_T1w.nii.gz
55-
│ │ ├── sub-01_ses-001_scans.json
56-
│ │ └── sub-01_ses-001_scans.tsv
57-
│ ├── ses-003
58-
│ │ ├── anat
59-
│ │ ├── sub-01_ses-003_scans.json
60-
│ │ └── sub-01_ses-003_scans.tsv
61-
...
62-
└── task-spatialwm_events.json
63-
```
41+
```bash
42+
conda activate dsstdeface
43+
```
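
Before running the workflow, it can help to confirm that the required programs are reachable from the activated environment. The following is a minimal sketch, not part of this repository: `check_tools` is a hypothetical helper, and the list of command names is an assumption based on the programs this README mentions.

```bash
# Report which of the named commands are missing from PATH.
# Prints one line per missing command and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Command names below are assumptions taken from the tools named in this README.
check_tools python @afni_refacer_run flirt fslmaths fsleyes vqcdeface \
    || echo "Load the AFNI/FSL modules and activate dsstdeface before continuing."
```

On the NIH HPC systems, AFNI and FSL would typically need to be loaded as modules before this check passes.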
6444

45+
### Run `dsst_defacing_wf.py`
6546

47+
To deface anatomical scans in the dataset, run the `dsst_defacing_wf.py` script.
6648

49+
```
50+
% python src/dsst_defacing_wf.py -h
51+
usage: dsst_defacing_wf.py [-h] --input INPUT --output OUTPUT [--participant-id SUBJ_ID] [--session-id SESS_ID] [--no-clean]
52+
53+
Deface anatomical scans for a given BIDS dataset or a subject directory in BIDS format.
54+
55+
optional arguments:
56+
-h, --help show this help message and exit
57+
--input INPUT, -i INPUT
58+
Path to input BIDS dataset.
59+
--output OUTPUT, -o OUTPUT
60+
Path to output BIDS dataset with defaced scan.
61+
--participant-id SUBJ_ID, -p SUBJ_ID
62+
Subject ID associated with the participant. Since the input dataset is assumed to be BIDS valid, this argument expects subject IDs with 'sub-' prefix.
63+
--session-id SESS_ID, -s SESS_ID
64+
Session ID associated with the subject ID. If the BIDS input dataset contains sessions, then this argument expects session IDs with 'ses-' prefix.
65+
--no-clean If this argument is provided, then AFNI intermediate files are preserved.
6766
67+
```
6868

69-
### **Step 1:** Generate mapping file.
69+
The script can be run serially on a BIDS dataset or in parallel at subject/session level. The three methods of running
70+
the script have been described below with example commands:
7071

71-
a. Generate a mapping file using the `generate_mappings.py` script.
72-
b. Look at your mapping file. Make sure it's not empty. Edit it, if there are any special cases you'd like to account for.
72+
For readability of the example commands, the following bash variables are defined:
7373

74-
```
75-
$ python generate_mappings.py -i ../datasets/ds000031 -o ./examples
76-
====================
77-
Dataset Summary
78-
====================
79-
Total number of sessions with 'anat' directory in the dataset: 24
80-
Sessions with 'anat' directory with at least one T1w scan: 22
81-
Sessions without a T1w scan: 2
82-
List of sessions without a T1w scan:
83-
['sub-01/ses-053', 'sub-01/ses-016']
84-
85-
Please find the mapping file in JSON format and other helpful logs at /Users/arshithab/dsst-defacing-pipeline/examples
74+
```bash
75+
INPUT_DIR="<path/to/BIDS/input/dataset>"
76+
OUTPUT_DIR="<path/to/desired/defacing/output/directory>"
8677
```
8778

88-
### **Step 2:** Deface scans
89-
Run `dsst_defacing_wf.py` script that calls on `deface.py` and `register.py` to deface scans in the dataset.
79+
**NOTE:** In the example commands below, `<path/to/BIDS/input/dataset>` and `<path/to/desired/defacing/output/directory>` are
80+
placeholders for paths to input and output directories, respectively.
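
Since the pipeline requires BIDS input, a quick sanity check on the input path can catch a wrong directory before any jobs are submitted. The sketch below is an illustration only: `looks_like_bids` is a hypothetical helper that checks for `dataset_description.json` and at least one `sub-*` directory. It is not a substitute for full BIDS validation.

```bash
# Return 0 if the directory looks like a BIDS dataset root, non-zero otherwise.
# This is a shallow check only; it does not validate file names or metadata.
looks_like_bids() {
    dataset_dir="${1%/}"
    # BIDS requires dataset_description.json at the dataset root.
    [ -f "${dataset_dir}/dataset_description.json" ] || return 1
    # At least one subject directory (sub-*) must exist.
    for d in "${dataset_dir}"/sub-*/; do
        [ -d "$d" ] && return 0
    done
    return 1
}
```

For example, `looks_like_bids "${INPUT_DIR}" || echo "Check INPUT_DIR"` would print a warning when the path does not point at a dataset root.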
9081

9182
#### Option 1: Serially
83+
9284
If you have a small dataset with less than 10 subjects, then it might be easiest to run the defacing algorithm serially.
9385

9486
```bash
95-
python dsst_defacing_wf.py -i ../datasets/ds000031 -m examples/primary_to_others_mapping.json -o examples
87+
python src/dsst_defacing_wf.py -i ${INPUT_DIR} -o ${OUTPUT_DIR}
9688
```
9789

98-
#### Option 2: Parallelly
99-
If you have dataset with over 10 subjects, then it might be more practical to run it in parallel. Here's the command one would use to run it on NIH HPC:
90+
#### Option 2: In parallel at subject level
91+
92+
If you have a dataset with over 10 subjects, then it might be more practical to run the pipeline in parallel for every
93+
subject in the dataset using the `-p/--participant-id` option as follows:
94+
95+
```bash
96+
python src/dsst_defacing_wf.py -i ${INPUT_DIR} -o ${OUTPUT_DIR} -p sub-<index>
97+
```
98+
99+
a. Assuming these scripts are run on the NIH HPC system, the first step would be to create a `swarm` file:
100+
101+
```bash
102+
103+
for i in `ls -d ${INPUT_DIR}/*`; do \
104+
SUBJ=$(echo $i | sed "s|${INPUT_DIR}/||g" ); \
105+
echo "python src/dsst_defacing_wf.py -i ${INPUT_DIR} -o ${OUTPUT_DIR} -p ${SUBJ}"; \
106+
done > defacing_parallel_subject_level.swarm
107+
```
108+
109+
Purpose: Loop through the dataset and find all subject directories to construct `dsst_defacing_wf.py` command
110+
with `-p/--participant-id` option.
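
A BIDS dataset root also contains files such as `dataset_description.json`, which a plain `ls -d ${INPUT_DIR}/*` would pick up as if they were subjects. The following sketch is one possible variant, assuming subject directories follow the `sub-<label>` naming convention; `subject_swarm_commands` is a hypothetical helper name, not something provided by this repository.

```bash
# Print one dsst_defacing_wf.py command per subject directory.
# Only directories named sub-* are considered, so top-level files such as
# dataset_description.json or participants.tsv are skipped.
subject_swarm_commands() {
    input_dir="${1%/}"
    output_dir="${2%/}"
    for subj_dir in "${input_dir}"/sub-*/; do
        [ -d "$subj_dir" ] || continue      # no match: the glob stays literal
        subj=$(basename "$subj_dir")        # e.g. sub-01
        echo "python src/dsst_defacing_wf.py -i ${input_dir} -o ${output_dir} -p ${subj}"
    done
}
```

Redirecting the output, as in `subject_swarm_commands "${INPUT_DIR}" "${OUTPUT_DIR}" > defacing_parallel_subject_level.swarm`, would produce a swarm file with one line per subject.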
111+
112+
b. Run the swarm file with the following command to start a swarm job:
113+
114+
```bash
115+
swarm -f defacing_parallel_subject_level.swarm --merge-output --logdir ${OUTPUT_DIR}/swarm_log
116+
```
117+
118+
#### Option 3: In parallel at session level
119+
120+
If the input dataset has multiple sessions per subject, then run the pipeline on every session in the dataset
121+
in parallel. Similar to Option 2, the following commands loop through the dataset to find subject and session IDs to
122+
create a `swarm` file to be run on NIH HPC systems.
100123

101124
```bash
102-
for i in `ls -d ../datasets/toy/*`; do SUBJ=$(echo $i | sed 's|../datasets/toy/||g' ); echo "python dsst_defacing_wf.py -i ../datasets/ds000031 -m examples/primary_to_others_mapping.json -o examples -s $SUBJ"; done > ./examples/defacing_parallel.swarm
103-
swarm -f ./examples/defacing_parallel.swarm --module afni,fsl --merge-output --logdir ./examples/swarm_log
125+
for i in `ls -d ${INPUT_DIR}/*`; do
126+
SUBJ=$(echo $i | sed "s|${INPUT_DIR}/||g" );
127+
for j in `ls -d ${INPUT_DIR}/${SUBJ}/*`; do
128+
SESS=$(echo $j | sed "s|${INPUT_DIR}/${SUBJ}/||g" )
129+
echo "python src/dsst_defacing_wf.py -i ${INPUT_DIR} -o ${OUTPUT_DIR} -p ${SUBJ} -s ${SESS}";
130+
done;
131+
done > defacing_parallel_session_level.swarm
104132
```
105133

106-
### **Step 3:** Visually QC defaced scans.
134+
```bash
135+
swarm -f defacing_parallel_session_level.swarm --merge-output --logdir ${OUTPUT_DIR}/swarm_log
136+
```
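
Because the workflow defaces anatomical scans, sessions without an `anat` directory presumably have nothing to process. The sketch below is a variant of the session-level loop that emits commands only for such sessions. It assumes the `sub-<label>/ses-<label>/anat` layout, and `session_swarm_commands` is a hypothetical helper name.

```bash
# Print one dsst_defacing_wf.py command per session that has an anat directory.
session_swarm_commands() {
    input_dir="${1%/}"
    output_dir="${2%/}"
    for anat_dir in "${input_dir}"/sub-*/ses-*/anat/; do
        [ -d "$anat_dir" ] || continue               # no match: the glob stays literal
        sess_dir=$(dirname "$anat_dir")              # .../sub-01/ses-001
        sess=$(basename "$sess_dir")                 # ses-001
        subj=$(basename "$(dirname "$sess_dir")")    # sub-01
        echo "python src/dsst_defacing_wf.py -i ${input_dir} -o ${output_dir} -p ${subj} -s ${sess}"
    done
}
```

Whether skipping these sessions is appropriate depends on the dataset; the original loop above submits a job for every session directory.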
137+
138+
### Run `generate_renders.py`
139+
140+
Generate 3D renders for every defaced image in the output directory.
141+
142+
```bash
143+
python src/generate_renders.py -o ${OUTPUT_DIR}
144+
```
145+
146+
### Visual Inspection
147+
148+
To visually inspect quality of defacing with [VisualQC](https://raamana.github.io/visualqc/readme.html), we'll need to:
149+
150+
1. Open TurboVNC through an spersist session. More info [here](https://hpc.nih.gov/docs/nimh.html).
151+
2. Run the `vqcdeface` command from a command-line terminal within a TurboVNC instance
152+
153+
```bash
154+
sh ${OUTPUT_DIR}/QC_prep/defacing_qc_cmd
155+
```
156+
157+
## Conceptual design
158+
159+
1. Generate a ["primary" scans](#terminology) to [other scans'](#terminology) mapping file.
160+
2. Deface primary scans
161+
with [@afni_refacer_run](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html) program
162+
developed by the AFNI Team.
163+
3. To deface remaining scans in the session, register them to the primary scan (using FSL `flirt` command) and then use
164+
the primary scan's defacemask to generate a defaced image (using `fslmaths` command).
165+
4. Visually inspect defaced scans with [VisualQC](https://raamana.github.io/visualqc) deface tool or any other preferred
166+
tool.
167+
5. Correct/fix defaced scans that failed visual inspection. See [here]() for more info on types of failures.
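
Step 3 can be illustrated with a dry run that prints, rather than executes, the two FSL commands involved. This is a sketch under stated assumptions and not the workflow's actual implementation: the file names are placeholders, the rigid-body `-dof 6` setting is an assumption, and the defacemask is assumed to be non-zero where voxels should be kept.

```bash
# Print (do not run) FSL commands that would deface one "other" scan
# using the primary scan's defacemask. All file names are placeholders.
print_other_scan_defacing_cmds() {
    primary="$1"      # primary T1w scan
    defacemask="$2"   # defacemask of the primary scan (assumed non-zero = keep)
    other="$3"        # scan to be defaced
    prefix="$4"       # prefix for output files
    # Register the other scan to the primary scan (rigid body is an assumption).
    echo "flirt -in ${other} -ref ${primary} -dof 6 -out ${prefix}_reg.nii.gz -omat ${prefix}_reg.mat"
    # Zero out voxels of the registered scan that fall outside the mask.
    echo "fslmaths ${prefix}_reg.nii.gz -mas ${defacemask} ${prefix}_defaced.nii.gz"
}
```

Calling `print_other_scan_defacing_cmds t1.nii.gz mask.nii.gz t2.nii.gz t2` prints one `flirt` and one `fslmaths` command that can be reviewed before running them by hand.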
168+
169+
![Defacing Pipeline flowchart](images/defacing_pipeline.png)
107170

108-
Visual QC defacing accuracy gallery https://raamana.github.io/visualqc/gallery_defacing.html
171+
## Terminology
109172

110-
## Glossary
173+
While describing the process, we frequently use the following terms:
111174

112175
- **Primary Scan:** The best quality T1w scan within a session. For programmatic selection, we assume that the most
113176
recently acquired T1w scan is of the best quality.
114177
- **Other/Secondary Scans:** All scans *except* the primary scan are grouped together and referred to as "other" or "
115178
secondary" scans for a given session.
179+
- **Mapping File:** A JSON file that maps a primary scan (or `primary_t1`) to all other scans within a session.
180+
Please find an example file [here]().
116181
- **[VisualQC](https://raamana.github.io/visualqc):** A suite of QC tools developed by Pradeep Raamana (Assistant
117182
Professor at University of Pittsburgh).
118183

@@ -122,7 +187,8 @@ Visual QC defacing accuracy gallery https://raamana.github.io/visualqc/gallery_d
122187
BN, Milev R, Müller DJ, Kennedy SH, Scott CJM, Strother SC, and Arnott SR (2021)
123188
[Multisite Comparison of MRI Defacing Software Across Multiple Cohorts](10.3389/fpsyt.2021.617997). Front. Psychiatry
124189
12:617997. doi:10.3389/fpsyt.2021.617997
125-
2. `@afni_refacer_run` is the defacing tool used under the hood. [AFNI Refacer program](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html).
190+
2. `@afni_refacer_run` is the defacing tool used under the
191+
hood. [AFNI Refacer program](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html).
126192
3. FSL's [FLIRT](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT)
127193
and [`fslmaths`](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Fslutils?highlight=%28fslmaths%29) programs have been used
128194
for registration and masking steps in the workflow.

deface.py

Lines changed: 0 additions & 127 deletions
This file was deleted.
