README update

Arshitha · Arshitha · commit 05a8050c0b1d · 2023-04-10T16:13:09.000-04:00
diff --git a/README.md b/README.md
@@ -2,71 +2,113 @@
 
 # DSST Defacing Pipeline
 
-The defacing pipeline for datasets curated by the [Data Science and Sharing Team (DSST)](https://cmn.nimh.nih.gov/dsst) are completed in four steps. Each of these steps is explained in more detail with an example in the next section. The pipeline requires a BIDS dataset as input.
+The DSST Defacing Pipeline has been developed to make the process of defacing anatomical scans of large datasets, visually inspecting for accuracy and fixing scans that fail visual inspection more efficient and straightforward. The pipeline _requires_ the input dataset to be in BIDS format. A conceptual description of the pipeline can found [here](#conceptual-design).
 
-1. Generate and finalize ["primary" scans](#glossary) to [other scans'](#glossary) mapping file. 
-2. Deface primary scans
-   with [@afni_refacer_run](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html) program
-   developed by the AFNI Team. To deface remaining scans in the session, register them to the primary scan and use
-   it's defacemask to generate a defaced image.
-    **NOTE**: If a session doesn't have a T1w scan, then `@afni_refacer_run` is run on all every scan individually. 
-3. Visually inspect defaced scans with your preferred QC tool. 
-4. Fix defacings that failed visual inspection.
+## Usage Instructions
+### Clone this repository
+```bash
+git clone git@github.com:nih-fmrif/dsst-defacing-pipeline.git
+```
+### Run `dsst_defacing_wf.py`
+To deface anatomical scans in the dataset, run `dsst_defacing_wf.py` script.
+```
+% python src/dsst_defacing_wf.py -h                                                                                            
+usage: dsst_defacing_wf.py [-h] --input INPUT --output OUTPUT [--participant-id SUBJ_ID] [--session-id SESS_ID] [--no-clean]
+
+Deface anatomical scans for a given BIDS dataset or a subject directory in BIDS format.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --input INPUT, -i INPUT
+                        Path to input BIDS dataset.
+  --output OUTPUT, -o OUTPUT
+                        Path to output BIDS dataset with defaced scan.
+  --participant-id SUBJ_ID, -p SUBJ_ID
+                        Subject ID associated with the participant. Since the input dataset is assumed to be BIDS valid, this argument expects subject IDs with 'sub-' prefix.
+  --session-id SESS_ID, -s SESS_ID
+                        Session ID associated with the subject ID. If the BIDS input dataset contains sessions, then this argument expects session IDs with 'ses-' prefix.
+  --no-clean            If this argument is provided, then AFNI intermediate files are preserved.
 
-![Generate and finalize "primary" scans to "secondary" scans mapping file.](images/pipeline_screen_quality.png)
+```
 
-## Example
+The script can be run serially on a BIDS dataset or in parallel at subject/session level. The three methods of running the script have been described below with example commands:  
 
-### **Step 0:** Get data and code
-Clone this repository to a preferred location on your machine.
+#### Option 1: Serially
+If you have a small dataset with less than 10 subjects, then it might be easiest to run the defacing algorithm serially.
 
 ```bash
-git clone git@github.com:nih-fmrif/dsst-defacing-pipeline.git
+python dsst_defacing_wf.py -i <path/to/BIDS/input/dataset> -o <path/to/desired/output/directory>
 ```
 
-We'll be running the scripts on the [MyConnectome](https://openneuro.org/datasets/ds000031/versions/1.0.0) dataset. The dataset is available for download on OpenNeuro as [ds000031](https://openneuro.org/datasets/ds000031/versions/1.0.0/download). 
+#### Option 2: In parallel at subject level
+If you have dataset with over 10 subjects, then it might be more practical to run the pipeline in parallel for every subject in the dataset using the `-p/--participant-id` option as follows:
 
 ```bash
-datalad install https://github.com/OpenNeuroDatasets/ds000031.git
+python dsst_defacing_wf.py -i <path/to/BIDS/input/dataset> -o <path/to/desired/output/directory> -p sub-<index>
 ```
 
-Download data in `anat` directories of the dataset.
+  a. Assuming these scripts are run on the NIH HPC system, the first step would be to create a `swarm` file:
 
-```bash
-datalad get sub-01/ses-*/anat
-```
+  ```bash
+  for i in `ls -d <path/to/BIDS/input/dataset>*`; do \
+    SUBJ=$(echo $i | sed 's|<path/to/BIDS/input/dataset>||g' ); \
+    echo "python dsst_defacing_wf.py -i <path/to/BIDS/input/dataset> -o <path/to/desired/output/directory> -s $SUBJ"; \
+    done > defacing_parallel_subject_level.swarm
+  ```
+  Purpose: Loop through the dataset and find all subject directories to construct `dsst_defacing_wf.py` command with `-p/--participant-id` option. 
 
-### **Step 1:** Deface scans
-Run `dsst_defacing_wf.py` script that calls on `deface.py` and `register.py` to deface scans in the dataset. 
+  b. Run the swarm file with following command to start a swarm job
+  ```bash
+  swarm -f .defacing_parallel_subject_level.swarm --module afni,fsl --merge-output --logdir swarm_log
+  ```
 
-#### Option 1: Serially
-If you have a small dataset with less than 10 subjects, then it might be easiest to run the defacing algorithm serially.
+#### Option 3: In parallel at session level
+If the input dataset has multiple sessions per subject, then run the pipeline on every session in the dataset parallelly. Similar to Option 2, the following commands loop through the dataset to find subject and session IDs to create a `swarm` file to be run on NIH HPC systems.
 
 ```bash
-python dsst_defacing_wf.py -i ../datasets/ds000031 -o examples
+for i in `ls -d <path/to/BIDS/input/dataset>*`; do \
+  SUBJ=$(echo $i | sed 's|<path/to/BIDS/input/dataset>||g' ); \
+  echo "python dsst_defacing_wf.py -i <path/to/BIDS/input/dataset> -o <path/to/desired/output/directory> -s $SUBJ"; \
+  done > defacing_parallel_subject_level.swarm
+```
+```bash
+swarm -f .defacing_parallel_subject_level.swarm --module afni,fsl --merge-output --logdir swarm_log
 ```
 
-#### Option 2: Parallelly
-If you have dataset with over 10 subjects, then it might be more practical to run it in parallel. Here's the command one would use to run it on NIH HPC:
+### Visually inspect defaced scans using VisualQC
+
+Pre-requisite: Install VisualQC from https://raamana.github.io/visualqc/installation.html#stable-release[](https://raamana.github.io/visualqc/installation.html#stable-release)
 
+Once VisualQC is installed, please run the following command to open VisualQC deface GUI to start visually inspecting defaced scans:
 ```bash
-for i in `ls -d ../datasets/toy/*`; do SUBJ=$(echo $i | sed 's|../datasets/toy/||g' ); echo "python dsst_defacing_wf.py -i ../datasets/ds000031 -m examples/primary_to_others_mapping.json -o examples -s $SUBJ"; done > ./examples/defacing_parallel.swarm
-swarm -f ./examples/defacing_parallel.swarm --module afni,fsl --merge-output --logdir ./examples/swarm_log
+sh <path/to/desired/output/directory>/visualqc_prep/defacing_qc_cmd
 ```
 
-### **Step 2:** Visually QC defaced scans.
-
 Visual QC defacing accuracy gallery https://raamana.github.io/visualqc/gallery_defacing.html
 
-## Glossary
+
+## Terminology
+While describing the process, we frequently use the following terms: 
 
 - **Primary Scan:** The best quality T1w scan within a session. For programmatic selection, we assume that the most
   recently acquired T1w scan is of the best quality.
 - **Other/Secondary Scans:** All scans *except* the primary scan are grouped together and referred to as "other" or "
   secondary" scans for a given session.
+- **Mapping File:** A JSON file that assigns maps a primary scan (or `primary_t1`) to all other scans within a session. Please find an example file [here]().
 - **[VisualQC](https://raamana.github.io/visualqc):** A suite of QC tools developed by Pradeep Raamana (Assistant
   Professor at University of Pittsburgh).
 
+## Conceptual design
+1. Generate a ["primary" scans](#terminology) to [other scans'](#terminology) mapping file. 
+2. Deface primary scans
+   with [@afni_refacer_run](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html) program
+   developed by the AFNI Team. 
+3. To deface remaining scans in the session, register them to the primary scan (using FSL `flirt` command) and then use the primary scan's defacemask to generate a defaced image (using `fslmaths` command).
+4. Visually inspect defaced scans with [VisualQC](https://raamana.github.io/visualqc) deface tool or any other preferred tool.
+5. Correct/fix defaced scans that failed visual inspection. See [here]() for more info on types of failures.
+
+![Defacing Pipeline flowchart](images/pipeline_screen_quality.png)
+
 ## References
 
 1. Theyers AE, Zamyadi M, O'Reilly M, Bartha R, Symons S, MacQueen GM, Hassel S, Lerch JP, Anagnostou E, Lam RW, Frey