major changes

Arshitha · Arshitha · commit 7a87d974336c · 2023-04-19T13:48:09.000-04:00
diff --git a/README.md b/README.md
@@ -2,15 +2,23 @@
 
 # DSST Defacing Pipeline
 
-The DSST Defacing Pipeline has been developed to make the process of defacing anatomical scans of large datasets, visually inspecting for accuracy and fixing scans that fail visual inspection more efficient and straightforward. The pipeline _requires_ the input dataset to be in BIDS format. A conceptual description of the pipeline can found [here](#conceptual-design).
+The DSST Defacing Pipeline has been developed to make defacing anatomical scans of large datasets, visually inspecting
+the defaced scans for accuracy, and fixing scans that fail visual inspection more efficient and straightforward. The
+pipeline _requires_ the input dataset to be in BIDS format. A conceptual description of the pipeline can be
+found [here](#conceptual-design).
 
 ## Usage Instructions
+
 ### Clone this repository
+
 ```bash
 git clone git@github.com:nih-fmrif/dsst-defacing-pipeline.git
 ```
+
 ### Run `dsst_defacing_wf.py`
+
 To deface the anatomical scans in the dataset, run the `dsst_defacing_wf.py` script.
+
 ```
 % python src/dsst_defacing_wf.py -h                                                                                            
 usage: dsst_defacing_wf.py [-h] --input INPUT --output OUTPUT [--participant-id SUBJ_ID] [--session-id SESS_ID] [--no-clean]
@@ -31,85 +39,112 @@ optional arguments:
 
 ```
 
-The script can be run serially on a BIDS dataset or in parallel at subject/session level. The three methods of running the script have been described below with example commands:  
+The script can be run serially over a whole BIDS dataset, or in parallel at the subject or session level. The three
+ways of running the script are described below with example commands:
 
-**NOTE:** In the example commands below, <path/to/BIDS/input/dataset> and <path/to/desired/output/directory> are placeholders for paths to input and output directories respectively. 
+For readability, the example commands below use the following bash variables:
+
+```bash
+INPUT_DIR="<path/to/BIDS/input/dataset>"
+OUTPUT_DIR="<path/to/desired/defacing/output/directory>"
+```
+
+**NOTE:** `<path/to/BIDS/input/dataset>` and `<path/to/desired/defacing/output/directory>` are placeholders; replace
+them with the paths to your input and output directories, respectively.
 
 #### Option 1: Serially
+
 If you have a small dataset with fewer than 10 subjects, it might be easiest to run the defacing pipeline serially.
 
 ```bash
-python dsst_defacing_wf.py -i <path/to/BIDS/input/dataset> -o <path/to/desired/output/directory>
+python src/dsst_defacing_wf.py -i "${INPUT_DIR}" -o "${OUTPUT_DIR}"
 ```
 
 #### Option 2: In parallel at subject level
-If you have dataset with over 10 subjects, then it might be more practical to run the pipeline in parallel for every subject in the dataset using the `-p/--participant-id` option as follows:
+
+If you have a dataset with more than 10 subjects, it might be more practical to run the pipeline in parallel, once for
+every subject in the dataset, using the `-p/--participant-id` option:
 
 ```bash
-python dsst_defacing_wf.py -i <path/to/BIDS/input/dataset> -o <path/to/desired/defacing/output/directory> -p sub-<index>
+python src/dsst_defacing_wf.py -i "${INPUT_DIR}" -o "${OUTPUT_DIR}" -p sub-<index>
 ```
 
-  a. Assuming these scripts are run on the NIH HPC system, the first step would be to create a `swarm` file:
+a. Assuming the pipeline is run on the NIH HPC system, the first step is to create a `swarm` file:
 
   ```bash
-  for i in `ls -d <path/to/BIDS/input/dataset>*`; do \
-    SUBJ=$(echo $i | sed 's|<path/to/BIDS/input/dataset>||g' ); \
-    echo "python dsst_defacing_wf.py -i <path/to/BIDS/input/dataset> -o <path/to/desired/defacing/output/directory> -s $SUBJ"; \
+  for i in "${INPUT_DIR}"/sub-*/; do \
+    SUBJ=$(basename "$i"); \
+    echo "python src/dsst_defacing_wf.py -i ${INPUT_DIR} -o ${OUTPUT_DIR} -p ${SUBJ}"; \
     done > defacing_parallel_subject_level.swarm
   ```
-  Purpose: Loop through the dataset and find all subject directories to construct `dsst_defacing_wf.py` command with `-p/--participant-id` option. 
 
-  b. Run the swarm file with following command to start a swarm job
+Purpose: loop through the dataset, find all subject directories, and construct one `dsst_defacing_wf.py` command per
+subject with the `-p/--participant-id` option.
+
+b. Start a swarm job from the swarm file with the following command:
+
   ```bash
-  swarm -f defacing_parallel_subject_level.swarm --module afni,fsl --merge-output --logdir swarm_log
+  swarm -f defacing_parallel_subject_level.swarm --merge-output --logdir ${OUTPUT_DIR}/swarm_log
   ```
 
 #### Option 3: In parallel at session level
-If the input dataset has multiple sessions per subject, then run the pipeline on every session in the dataset parallelly. Similar to Option 2, the following commands loop through the dataset to find subject and session IDs to create a `swarm` file to be run on NIH HPC systems.
+
+If the input dataset has multiple sessions per subject, run the pipeline on every session in the dataset in parallel.
+As in Option 2, the following commands loop through the dataset to find subject and session IDs and write a `swarm`
+file to be run on the NIH HPC system.
 
 ```bash
-for i in `ls -d <path/to/BIDS/input/dataset>*`; do
-  SUBJ=$(echo $i | sed "s|<path/to/BIDS/input/dataset>/||g" );
-  for j in `ls -d <path/to/BIDS/input/dataset>/${SUBJ}/*`; do
-    SESS=$(echo $j | sed "s|<path/to/BIDS/input/dataset>/${SUBJ}/||g" )
-    echo "python dsst_defacing_wf.py -i <path/to/BIDS/input/dataset> -o <path/to/desired/defacing/output/directory> -p ${SUBJ} -s ${SESS}";
+for i in "${INPUT_DIR}"/sub-*/; do
+  SUBJ=$(basename "$i");
+  for j in "${INPUT_DIR}/${SUBJ}"/ses-*/; do
+    SESS=$(basename "$j");
+    echo "python src/dsst_defacing_wf.py -i ${INPUT_DIR} -o ${OUTPUT_DIR} -p ${SUBJ} -s ${SESS}";
     done;
   done > defacing_parallel_session_level.swarm
 ```
+
 ```bash
-swarm -f defacing_parallel_session_level.swarm --module afni,fsl --merge-output --logdir swarm_log
+swarm -f defacing_parallel_session_level.swarm --merge-output --logdir ${OUTPUT_DIR}/swarm_log
 ```
 
 ### Visually inspect defaced scans using VisualQC
 
-Pre-requisite: Install VisualQC from https://raamana.github.io/visualqc/installation.html#stable-release[](https://raamana.github.io/visualqc/installation.html#stable-release)
+Prerequisite: install VisualQC by following
+the [VisualQC installation instructions](https://raamana.github.io/visualqc/installation.html#stable-release).
+
+Once VisualQC is installed, run the following command to open the VisualQC deface GUI and start visually inspecting
+the defaced scans:
 
-Once VisualQC is installed, please run the following command to open VisualQC deface GUI to start visually inspecting defaced scans:
 ```bash
 sh <path/to/defacing/output/directory>/visualqc_prep/defacing_qc_cmd
 ```
 
 See the [VisualQC defacing accuracy gallery](https://raamana.github.io/visualqc/gallery_defacing.html) for examples.
 
-
 ## Terminology
-While describing the process, we frequently use the following terms: 
+
+While describing the process, we frequently use the following terms:
 
 - **Primary Scan:** The best quality T1w scan within a session. For programmatic selection, we assume that the most
   recently acquired T1w scan is of the best quality.
 - **Other/Secondary Scans:** All scans *except* the primary scan are grouped together and referred to as "other" or
   "secondary" scans for a given session.
-- **Mapping File:** A JSON file that assigns maps a primary scan (or `primary_t1`) to all other scans within a session. Please find an example file [here]().
+- **Mapping File:** A JSON file that maps a primary scan (or `primary_t1`) to all other scans within a session.
+  Please find an example file [here]().
 - **[VisualQC](https://raamana.github.io/visualqc):** A suite of QC tools developed by Pradeep Raamana (Assistant
   Professor at University of Pittsburgh).
 
 ## Conceptual design
-1. Generate a ["primary" scans](#terminology) to [other scans'](#terminology) mapping file. 
+
+1. Generate a mapping file that maps each ["primary" scan](#terminology) to the [other scans](#terminology) in its session.
 2. Deface primary scans
    with [@afni_refacer_run](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html) program
-   developed by the AFNI Team. 
-3. To deface remaining scans in the session, register them to the primary scan (using FSL `flirt` command) and then use the primary scan's defacemask to generate a defaced image (using `fslmaths` command).
-4. Visually inspect defaced scans with [VisualQC](https://raamana.github.io/visualqc) deface tool or any other preferred tool.
+   developed by the AFNI Team.
+3. To deface the remaining scans in the session, register them to the primary scan (using the FSL `flirt` command) and
+   then apply the primary scan's defacemask to generate a defaced image (using the `fslmaths` command).
+4. Visually inspect the defaced scans with the [VisualQC](https://raamana.github.io/visualqc) deface tool or any other
+   preferred tool.
 5. Correct/fix defaced scans that failed visual inspection. See [here]() for more info on types of failures.
 
 ![Defacing Pipeline flowchart](images/pipeline_screen_quality.png)
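The Option 2 and Option 3 bash loops can also be expressed in Python. The sketch below is a hypothetical helper (`swarm_lines` and `write_swarm_file` are not part of the repository). It keeps only `sub-*` and `ses-*` directories, so top-level BIDS files such as `dataset_description.json` never turn into swarm commands. It also assumes the script is invoked from the repository root as `src/dsst_defacing_wf.py`.

```python
import shlex
from pathlib import Path


def swarm_lines(input_dir, output_dir, level="subject", script="src/dsst_defacing_wf.py"):
    """Return one dsst_defacing_wf.py command per subject or per session.

    Only directories named sub-* (and ses-* when level="session") are used.
    """
    input_dir = Path(input_dir)
    # quote paths so directories with spaces survive when swarm runs each line in a shell
    quoted_in, quoted_out = shlex.quote(str(input_dir)), shlex.quote(str(output_dir))
    lines = []
    for subj_dir in sorted(p for p in input_dir.glob("sub-*") if p.is_dir()):
        base = f"python {script} -i {quoted_in} -o {quoted_out} -p {subj_dir.name}"
        if level == "subject":
            lines.append(base)
        elif level == "session":
            # subjects without ses-* directories produce no session-level commands
            for sess_dir in sorted(p for p in subj_dir.glob("ses-*") if p.is_dir()):
                lines.append(f"{base} -s {sess_dir.name}")
        else:
            raise ValueError(f"level must be 'subject' or 'session', got {level!r}")
    return lines


def write_swarm_file(lines, swarm_path):
    """Write one command per line, the layout a swarm file uses."""
    Path(swarm_path).write_text("\n".join(lines) + "\n")
```

For example, `write_swarm_file(swarm_lines(input_dir, output_dir, level="session"), "defacing_parallel_session_level.swarm")` writes a file that can be submitted with the same `swarm -f` command shown above.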
@@ -120,7 +155,8 @@ While describing the process, we frequently use the following terms:
    BN, Milev R, Müller DJ, Kennedy SH, Scott CJM, Strother SC, and Arnott SR (2021)
    [Multisite Comparison of MRI Defacing Software Across Multiple Cohorts](https://doi.org/10.3389/fpsyt.2021.617997). Front. Psychiatry
    12:617997. doi:10.3389/fpsyt.2021.617997
-2. `@afni_refacer_run` is the defacing tool used under the hood. [AFNI Refacer program](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html).
+2. `@afni_refacer_run` is the defacing tool used under the hood; see the
+   [AFNI Refacer program](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html) documentation.
 3. FSL's [FLIRT](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT)
    and [`fslmaths`](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Fslutils?highlight=%28fslmaths%29) programs have been used
    for registration and masking steps in the workflow.
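Step 3 of the conceptual design (registration with FLIRT, masking with `fslmaths`) can be sketched as three FSL commands per secondary scan. This is a minimal, hypothetical sketch, not the pipeline's code. Several choices in it are assumptions: the function names, the output file names, the 6-DOF rigid registration, and registering the primary scan onto the secondary scan so the mask is applied in the secondary scan's native space. It also assumes the defacemask is nonzero over the voxels to keep, which should be checked against the mask `@afni_refacer_run` actually writes. The commands are only built here; nothing runs until `run_commands` is called.

```python
import subprocess
from pathlib import Path


def deface_secondary_cmds(primary, primary_mask, secondary, out_dir):
    """Build (but do not run) the FSL commands that mask one secondary scan."""
    out_dir = Path(out_dir)
    stem = Path(secondary).name.split(".")[0]
    mat = out_dir / f"{stem}_primary2secondary.mat"
    reg_primary = out_dir / f"{stem}_reg_primary.nii.gz"
    reg_mask = out_dir / f"{stem}_reg_defacemask.nii.gz"
    defaced = out_dir / f"{stem}_defaced.nii.gz"
    return [
        # 1. rigid (6 DOF) registration of the primary scan onto the secondary scan
        ["flirt", "-dof", "6", "-in", str(primary), "-ref", str(secondary),
         "-omat", str(mat), "-out", str(reg_primary)],
        # 2. resample the primary's defacemask into secondary space; nearest-neighbour keeps it binary
        ["flirt", "-in", str(primary_mask), "-ref", str(secondary),
         "-applyxfm", "-init", str(mat), "-interp", "nearestneighbour",
         "-out", str(reg_mask)],
        # 3. zero every voxel of the secondary scan where the resampled mask is zero
        ["fslmaths", str(secondary), "-mas", str(reg_mask), str(defaced)],
    ]


def run_commands(cmds):
    """Run the commands in order, stopping at the first failure (requires FSL on PATH)."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```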
diff --git a/src/deface.py b/src/deface.py
@@ -68,14 +68,33 @@ def copy_over_sidecar(scan_filepath, input_anat_dir, output_anat_dir):
     shutil.copy2(json_sidecar, output_anat_dir / filename)
 
 
-def reorganize_into_bids(input_bids_dir, subj_dir, sess_dir, primary_t1, defacing_outdir, no_clean):
+def vqcdeface_prep(bids_input_dir, defaced_anat_dir, bids_defaced_outdir):
+    defacing_qc_dir = bids_defaced_outdir.parent / 'QC_prep' / 'defacing_QC'
+    # defaced scans in this anat dir, skipping intermediate files moved into work_dir
+    interested_files = [f for f in defaced_anat_dir.rglob('*.nii.gz') if 'work_dir' not in f.parts]
+    for defaced_img in interested_files:
+        # one nested QC dir per scan, one level per BIDS entity, e.g. sub-01/ses-A/T1w
+        entities = defaced_img.name.split('.')[0].split('_')
+        vqcd_subj_dir = defacing_qc_dir.joinpath(*entities)
+        vqcd_subj_dir.mkdir(parents=True, exist_ok=True)
+
+        # is_symlink() is also True for a dangling link, so reruns don't hit FileExistsError
+        defaced_link = vqcd_subj_dir / 'defaced.nii.gz'
+        if not defaced_link.is_symlink():
+            defaced_link.symlink_to(defaced_img)
+
+        # pair the defaced scan with the original scan of the same name in the input BIDS tree
+        img = list(bids_input_dir.rglob(defaced_img.name))[0]
+        img_link = vqcd_subj_dir / 'orig.nii.gz'
+        if not img_link.is_symlink():
+            img_link.symlink_to(img)
+
+
+def reorganize_into_bids(input_bids_dir, subj_dir, sess_dir, primary_t1, bids_defaced_outdir, no_clean):
     subj_id = subj_dir.name
     sess_id = sess_dir.name if sess_dir else None
 
     if sess_id:
-        anat_dirs = list(defacing_outdir.joinpath(subj_id, sess_id).rglob('anat'))
+        anat_dirs = list(bids_defaced_outdir.joinpath(subj_id, sess_id).rglob('anat'))
     else:
-        anat_dirs = list(defacing_outdir.joinpath(subj_id).rglob('anat'))
+        anat_dirs = list(bids_defaced_outdir.joinpath(subj_id).rglob('anat'))
     # make workdir for each session within anat dir
     for anat_dir in anat_dirs:
         # iterate over all nii files within an anat dir to rename all primary and "other" scans
@@ -86,21 +105,24 @@ def reorganize_into_bids(input_bids_dir, subj_dir, sess_dir, primary_t1, defacin
                 compress_to_gz(nii_filepath, gz_file)
 
                 # copy over corresponding json sidecar
-                copy_over_sidecar(Path(primary_t1), input_bids_dir / anat_dir.relative_to(defacing_outdir), anat_dir)
+                copy_over_sidecar(Path(primary_t1), input_bids_dir / anat_dir.relative_to(bids_defaced_outdir),
+                                  anat_dir)
 
             elif nii_filepath.name.endswith('_defaced.nii.gz'):
                 new_filename = '_'.join(nii_filepath.name.split('_')[:-1]) + '.nii.gz'
                 shutil.copy2(nii_filepath, str(anat_dir / new_filename))
 
-                copy_over_sidecar(nii_filepath, input_bids_dir / anat_dir.relative_to(defacing_outdir), anat_dir)
+                copy_over_sidecar(nii_filepath, input_bids_dir / anat_dir.relative_to(bids_defaced_outdir), anat_dir)
 
         # move QC images and afni intermediate files to a new directory
-        intermediate_files_dir = anat_dir / 'workdir'
+        intermediate_files_dir = anat_dir / 'work_dir'
         intermediate_files_dir.mkdir(parents=True, exist_ok=True)
         for dirpath in anat_dir.glob('*'):
             if dirpath.name.startswith('workdir') or dirpath.name.endswith('QC'):
                 shutil.move(dirpath, intermediate_files_dir)
 
+        vqcdeface_prep(input_bids_dir, anat_dir, bids_defaced_outdir)
+
         if not no_clean:
             shutil.rmtree(intermediate_files_dir)
 
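`vqcdeface_prep` above lays out one nested QC directory per scan and creates each symlink only once. The two hypothetical helpers below (not part of the repository) isolate those steps so they can be checked on their own. The `is_symlink()` guard mirrors the change made in this commit: `exists()` follows the link, so it reports `False` for a dangling symlink, and a subsequent `symlink_to()` then raises `FileExistsError`; `is_symlink()` inspects the link itself.

```python
from pathlib import Path


def qc_subdir(defacing_qc_dir, defaced_img):
    """Map a scan file name to its nested QC directory, one level per BIDS entity.

    For example, sub-01_ses-A_T1w.nii.gz maps to <defacing_qc_dir>/sub-01/ses-A/T1w.
    """
    entities = Path(defaced_img).name.split(".")[0].split("_")
    return Path(defacing_qc_dir).joinpath(*entities)


def link_once(link, target):
    """Create the symlink link -> target unless a symlink is already at `link`.

    is_symlink() looks at the link itself, so a dangling link left by an earlier
    run is kept instead of making symlink_to() raise FileExistsError.
    """
    link = Path(link)
    if not link.is_symlink():
        link.symlink_to(target)
    return link
```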
diff --git a/src/dsst_defacing_wf.py b/src/dsst_defacing_wf.py
@@ -1,6 +1,5 @@
 import argparse
 import json
-import re
 import subprocess
 from pathlib import Path
 
@@ -56,35 +55,6 @@ def get_sess_dirs(subj_dir_path, mapping_dict):
     return sess_dirs
 
 
-def create_defacing_id_list(qc_dir):
-    rel_paths_to_orig = [re.sub('/orig.nii.gz', '', str(o.relative_to(qc_dir))) for o in qc_dir.rglob('orig.nii.gz')]
-    with open(qc_dir / 'defacing_id_list.txt', 'w') as f:
-        f.write('\n'.join(rel_paths_to_orig))
-
-
-def vqcdeface_prep(input_dir, defacing_output_dir):
-    defacing_qc_dir = defacing_output_dir.parent / 'QC_prep' / 'defacing_QC'
-    interested_files = [f for f in defacing_output_dir.rglob('*.nii.gz') if
-                        'workdir' not in str(f).split('/')]
-    for defaced_img in interested_files:
-        entities = defaced_img.name.split('.')[0].split('_')
-        vqcd_subj_dir = defacing_qc_dir / f"{'/'.join(entities)}"
-        vqcd_subj_dir.mkdir(parents=True, exist_ok=True)
-
-        defaced_link = vqcd_subj_dir / 'defaced.nii.gz'
-        if not defaced_link.exists():
-            defaced_link.symlink_to(defaced_img)
-        img = list(input_dir.rglob(defaced_img.name))[0]
-        img_link = vqcd_subj_dir / 'orig.nii.gz'
-        if not img_link.exists(): img_link.symlink_to(img)
-
-    create_defacing_id_list(defacing_qc_dir)
-
-    vqcdeface_cmd = f"vqcdeface -u {defacing_qc_dir} -i {defacing_qc_dir / 'defacing_id_list.txt'} -m orig.nii.gz -d defaced.nii.gz -r defaced_render"
-
-    return vqcdeface_cmd
-
-
 def main():
     # get command line arguments
     input_dir, output, subj_id, sess_id, no_clean = get_args()
@@ -93,8 +63,8 @@ def main():
     mapping_dict = generate_mappings.crawl(input_dir, output)
 
     # create a separate bids tree with only defaced scans
-    defacing_outputs = output / 'bids_defaced'
-    defacing_outputs.mkdir(parents=True, exist_ok=True)
+    bids_defaced_outdir = output / 'bids_defaced'
+    bids_defaced_outdir.mkdir(parents=True, exist_ok=True)
 
     afni_refacer_failures = []  # list to capture afni_refacer_run failures
 
@@ -114,24 +84,13 @@ def main():
     # calling deface.py script
     for subj_sess in subj_sess_list:
         missing_refacer_out = deface.deface_primary_scan(input_dir, subj_sess[0], subj_sess[1], mapping_dict,
-                                                         defacing_outputs, no_clean)
+                                                         bids_defaced_outdir, no_clean)
         if missing_refacer_out is not None:
             afni_refacer_failures.extend(missing_refacer_out)
 
     with open(output / 'logs' / 'failed_afni_refacer_output.txt', 'w') as f:
         f.write('\n'.join(afni_refacer_failures))  # TODO Not very useful when running the pipeline in parallel
 
-    # unload fsl module and use fsleyes installed on conda env
-    # os.environ['TMP_DISPLAY'] =
-
-    # prep for visual inspection using visualqc deface
-    print(f"Preparing for QC by visual inspection...\n")
-
-    vqcdeface_cmd = vqcdeface_prep(input_dir, defacing_outputs)
-    print(f"Run the following command to start a VisualQC Deface session:\n\t{vqcdeface_cmd}\n")
-    with open(output / 'QC_prep' / 'defacing_qc_cmd', 'w') as f:
-        f.write(vqcdeface_cmd + '\n')
-
 
 if __name__ == "__main__":
     main()
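The helpers removed from `dsst_defacing_wf.py` in this diff built VisualQC's ID list and the `vqcdeface` command that was saved to `defacing_qc_cmd`. A self-contained sketch of the same logic follows. `write_vqcdeface_cmd` is a hypothetical name; the `vqcdeface` flags are exactly the ones used in the removed code; `Path.parent` with `relative_to` stands in for the `re.sub` call, and the IDs are sorted for a stable order (the removed code did not sort).

```python
from pathlib import Path


def write_vqcdeface_cmd(defacing_qc_dir, cmd_path):
    """Write defacing_id_list.txt and a one-line vqcdeface command; return the command."""
    qc_dir = Path(defacing_qc_dir)
    # each ID is the directory holding orig.nii.gz, relative to the QC root (e.g. sub-01/ses-A/T1w)
    ids = sorted(str(orig.parent.relative_to(qc_dir)) for orig in qc_dir.rglob("orig.nii.gz"))
    id_list = qc_dir / "defacing_id_list.txt"
    id_list.write_text("\n".join(ids))
    cmd = (f"vqcdeface -u {qc_dir} -i {id_list} "
           "-m orig.nii.gz -d defaced.nii.gz -r defaced_render")
    Path(cmd_path).write_text(cmd + "\n")
    return cmd
```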
diff --git a/src/generate_mappings.py b/src/generate_mappings.py
@@ -76,7 +76,7 @@ def primary_scans_qc_prep(mapping_dict, qc_prep):
 
         id_list.append(dest)
         primary_link = dest / 'primary.nii.gz'
-        if not primary_link.exists():
+        if not primary_link.is_symlink():
             try:
                 primary_link.symlink_to(primary)
             except:
diff --git a/src/generate_renders.py b/src/generate_renders.py