This document outlines the pipeline for generating normalized counts using the following tools:
FastQC → Trim → FastQC → STAR → RSEM & Qualimap QC
- Qualimap uses Aligned.sortedByCoord.out.bam, generated by STAR.
- RSEM uses Aligned.toTranscriptome.out.bam, generated by STAR.
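Assuming the normalized counts here are TPM values (the per-sample DAG files are named *_TPM.dag), the sketch below shows what that normalization computes from per-feature expected counts and effective lengths, the quantities RSEM reports as expected_count and effective_length. It is an illustration of the formula, not RSEM's code. The function name and the input numbers are made up.

```python
from typing import Dict


def tpm(expected_counts: Dict[str, float],
        effective_lengths: Dict[str, float]) -> Dict[str, float]:
    """Convert per-feature counts to transcripts per million (TPM)."""
    # Reads per base of effective length; features with a non-positive
    # effective length get a rate of 0 in this sketch.
    rates = {}
    for feature, count in expected_counts.items():
        length = effective_lengths[feature]
        rates[feature] = count / length if length > 0 else 0.0

    total = sum(rates.values())
    if total == 0:
        return {feature: 0.0 for feature in rates}

    # Rescale the rates so that all TPM values add up to one million.
    return {feature: rate / total * 1e6 for feature, rate in rates.items()}


# Made-up counts and effective lengths for two H37Rv locus tags.
example = tpm({"Rv0001": 100, "Rv0002": 300}, {"Rv0001": 1000, "Rv0002": 1500})
print(example)
```

Because every sample's TPM values sum to one million, they can be compared across samples without a separate library-size correction.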
To run the pipeline, the following files are needed:
- Reference file, e.g. MtbNCBIH37Rv.fa
- Adapter file, e.g. adapters.fa
- GTF file, e.g. MtbNCBIH37Rv_ncRNAs_sORFs.gtf
- Data files: raw sequencing reads, e.g.
  - 3151_19_S13_R1_001.fastq.gz
  - 3151_19_S13_R2_001.fastq.gz
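Before generating and submitting the DAGs, it can help to confirm that every input is present. The following is a minimal sketch with a hypothetical helper, missing_inputs (not part of the pipeline), assuming all inputs sit in one working directory and that reads follow the <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz convention.

```python
from pathlib import Path
from typing import List


def missing_inputs(workdir: str, reference: str, adapters: str,
                   gtf: str, samples: List[str]) -> List[str]:
    """Return the expected input files that are not present in workdir."""
    expected = [reference, adapters, gtf]
    for sample in samples:
        # Paired reads are assumed to follow the *_R1_001 / *_R2_001 convention.
        expected.append(f"{sample}_R1_001.fastq.gz")
        expected.append(f"{sample}_R2_001.fastq.gz")
    base = Path(workdir)
    return [name for name in expected if not (base / name).is_file()]


missing = missing_inputs(".", "MtbNCBIH37Rv.fa", "adapters.fa",
                         "MtbNCBIH37Rv_ncRNAs_sORFs.gtf", ["3151_19_S13"])
if missing:
    print("Missing inputs:", ", ".join(missing))
```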
Preprocessing step: rename the raw FASTQ files to follow the *_R1_001.fastq.gz and *_R2_001.fastq.gz naming convention.
Steps:
- Navigate to the appropriate data folder. For example:
  cd /staging/groups/pepperell_group/Mtb_RNAseq/HTSeqCounts/
  # or
  cd /staging/groups/pepperell_group/Mtb_RNAseq/RSEM/
- Run the renaming script:
  ./format_fastqc_name.sh
- Input file: input.txt (contains sample identifiers, one per line), e.g.
  3151_17_S11
  3151_18_S12
  3151_19_S13
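format_fastqc_name.sh is not shown here. As a hedged illustration of the naming convention it targets, the sketch below collects the sample identifiers that have both an R1 and an R2 file and writes them to input.txt, one per line. The names sample_ids and write_input_txt are hypothetical, and deriving input.txt from the file names is an assumption about how it could be produced, not a documented pipeline step.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Expected names after preprocessing: <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def sample_ids(filenames: Iterable[str]) -> List[str]:
    """Return sorted sample identifiers that have both an R1 and an R2 file."""
    reads: Dict[str, Set[str]] = {}
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            continue  # not named per the *_R1_001 / *_R2_001 convention
        reads.setdefault(match["sample"], set()).add(match["read"])
    return sorted(s for s, seen in reads.items() if seen == {"1", "2"})


def write_input_txt(data_dir: str, output: str = "input.txt") -> List[str]:
    """Write one complete sample identifier per line to output."""
    ids = sample_ids(p.name for p in Path(data_dir).iterdir())
    Path(output).write_text("".join(f"{s}\n" for s in ids))
    return ids
```

Calling write_input_txt(".") from the data folder would write input.txt next to the FASTQ files. Samples with only one mate present are skipped rather than written, so an incomplete pair never reaches the DAG generation step.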
The pipeline uses HTCondor DAG files to manage the workflow. These files are generated by make_TPM_dag.py and include:
- Top-level DAG file: input_TPM_topLevel.dag, which runs the individual DAG file for each sample.
- Example DAG files for individual samples:
  - 3151_17_S11_TPM.dag
  - 3151_18_S12_TPM.dag
  - 3151_19_S13_TPM.dag
- Template and script for DAG generation:
  - TPM_dag.template: template DAG file with placeholders ($(RUN), $(REF), $(annot_gtf)) to be replaced.
  - make_TPM_dag.py: script that generates the individual DAG files by replacing the placeholders with actual values.
- Generating DAG files:
  - To generate the individual DAG files and the top-level DAG file from the template, run:
python3 make_TPM_dag.py input.txt TPM_dag.template MtbNCBIH37Rv.fa MtbNCBIH37Rv_ncRNAs_sORFs.gtf
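make_TPM_dag.py is not reproduced here. The following is a minimal sketch of the same idea, not the script itself. It assumes the placeholders are replaced by plain text substitution, that per-sample DAGs are named <sample>_TPM.dag, that the top-level name is derived from the input file's stem (input.txt → input_TPM_topLevel.dag), and that the top-level DAG references each sample DAG with DAGMan's SUBDAG EXTERNAL command (SPLICE is an alternative). The function names and node names (<sample>_TPM) are hypothetical.

```python
from pathlib import Path
from typing import List


def render_sample_dag(template: str, run: str, ref: str, annot_gtf: str) -> str:
    """Fill the literal placeholders used in TPM_dag.template."""
    return (template.replace("$(RUN)", run)
                    .replace("$(REF)", ref)
                    .replace("$(annot_gtf)", annot_gtf))


def top_level_dag(samples: List[str]) -> str:
    """One SUBDAG EXTERNAL node per sample, pointing at <sample>_TPM.dag."""
    return "".join(f"SUBDAG EXTERNAL {s}_TPM {s}_TPM.dag\n" for s in samples)


def make_dags(input_txt: str, template_path: str, ref: str, annot_gtf: str) -> List[str]:
    """Write every per-sample DAG plus the top-level DAG; return the samples."""
    samples = [line.strip() for line in Path(input_txt).read_text().splitlines()
               if line.strip()]  # skip blank lines
    template = Path(template_path).read_text()
    for s in samples:
        Path(f"{s}_TPM.dag").write_text(render_sample_dag(template, s, ref, annot_gtf))
    top_level = f"{Path(input_txt).stem}_TPM_topLevel.dag"
    Path(top_level).write_text(top_level_dag(samples))
    return samples
```

The sketch uses str.replace rather than string.Template because $(NAME) is HTCondor macro syntax, which Template does not recognize. Any other $(...) macros in the template that are not among the three placeholders are left untouched for HTCondor to expand.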
To submit the DAG job described in input_TPM_topLevel.dag on CHTC, use the following command:
condor_submit_dag input_TPM_topLevel.dag
To check the status of the DAG job on HTCondor, use the following command (-nobatch lists each job individually instead of grouping them into batches):
condor_q -nobatch
To watch the job status update live instead of querying repeatedly, use:
condor_watch_q
To create and build Docker images for STAR and RSEM, follow the steps below. For more details, refer to the CHTC Docker Build Guide. Replace <username>, <imagename>, and <tag> with your Docker Hub username, image name, and desired tag, respectively.
- Create the Dockerfiles
  Create separate Dockerfiles for RSEM and STAR: RSEM.Dockerfile and STAR.Dockerfile.
- Build the Docker images
  Use docker buildx to build the images for the appropriate platform:
  docker buildx build . -f RSEM.Dockerfile -t <username>/<imagename> --platform linux/x86_64
  # Example:
  docker buildx build . -f RSEM.Dockerfile -t marissazhang/rsem --platform linux/x86_64
  docker buildx build . -f STAR.Dockerfile -t <username>/<imagename> --platform linux/x86_64
  # Example:
  docker buildx build . -f STAR.Dockerfile -t marissazhang/star --platform linux/x86_64
- Push the Docker images
  If no tag is given at build time, Docker tags the image as latest, which is the tag used in the examples here.
  docker push <username>/<imagename>:<tag>
  # Example:
  docker push marissazhang/rsem:latest
  docker push marissazhang/star:latest
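To keep the build and push steps consistent for both images, a small wrapper can generate both commands from one image reference. This is a minimal sketch with hypothetical function names; it passes an explicit tag to both commands so the pushed tag always matches the built one. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def docker_commands(username: str, image: str, dockerfile: str,
                    tag: str = "latest") -> List[List[str]]:
    """Return the buildx build and push commands for one image."""
    ref = f"{username}/{image}:{tag}"  # same reference for build and push
    build = ["docker", "buildx", "build", ".", "-f", dockerfile,
             "-t", ref, "--platform", "linux/x86_64"]
    push = ["docker", "push", ref]
    return [build, push]


def build_and_push(username: str, image: str, dockerfile: str,
                   tag: str = "latest", dry_run: bool = True) -> None:
    for cmd in docker_commands(username, image, dockerfile, tag):
        if dry_run:
            print(" ".join(cmd))  # show the command without running it
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure


for image, dockerfile in [("rsem", "RSEM.Dockerfile"), ("star", "STAR.Dockerfile")]:
    build_and_push("<username>", image, dockerfile)
```

If a non-default buildx builder (for example one using the docker-container driver) is active, the built image is not loaded into the local image store automatically. In that case, add --load to the build command, or use --push to build and push in one step.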

