pepperell-lab/CHTC_RNAseq_RSEM

Pepperell_Lab_RNAseq_STAR_RSEM_Pipeline

This document outlines the pipeline for generating normalized counts using the following tools:
FastQC → Trim → FastQC → STAR → RSEM & QualimapQC

Key Notes

  • Qualimap uses Aligned.sortedByCoord.out.bam generated by STAR.
  • RSEM uses Aligned.toTranscriptome.out.bam generated by STAR.
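STAR writes both of these BAM files in one run only when coordinate-sorted BAM output and transcriptome quantification mode are both requested. The sketch below assembles such command lines as Python lists. It is an illustration under assumptions, not the commands used by this repository's submit scripts: the helper names, the thread count, the output prefix, and the `genome_dir` and `rsem_reference` arguments are hypothetical, and older RSEM releases use `--bam` where newer ones use `--alignments`.

```python
def star_command(sample, genome_dir, read1, read2, threads=4):
    """Build a STAR argv list that emits both BAM files from the Key Notes.

    All arguments are illustrative assumptions; adjust to the real job setup.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(read1), str(read2),
        "--readFilesCommand", "zcat",                 # inputs are gzipped FASTQ
        "--outSAMtype", "BAM", "SortedByCoordinate",  # -> Aligned.sortedByCoord.out.bam
        "--quantMode", "TranscriptomeSAM",            # -> Aligned.toTranscriptome.out.bam
        "--outFileNamePrefix", f"{sample}_",
    ]


def rsem_command(sample, rsem_reference, threads=4):
    """Build an rsem-calculate-expression argv list that reads STAR's transcriptome BAM."""
    return [
        "rsem-calculate-expression",
        "--paired-end",
        "--alignments",  # input is an alignment file rather than FASTQ
        "-p", str(threads),
        f"{sample}_Aligned.toTranscriptome.out.bam",
        str(rsem_reference),
        sample,
    ]
```

With the prefix chosen here, Qualimap would read `<sample>_Aligned.sortedByCoord.out.bam` and RSEM would read `<sample>_Aligned.toTranscriptome.out.bam`.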

Required Files

To run the pipeline, the following files are needed:

  • Reference File: e.g., MtbNCBIH37Rv.fa
  • Adapter File: e.g., adapters.fa
  • GTF File: e.g., MtbNCBIH37Rv_ncRNAs_sORFs.gtf
  • Data Files: Raw sequencing reads, e.g.,
    • 3151_19_S13_R1_001.fastq.gz

    • 3151_19_S13_R2_001.fastq.gz

    • Preprocessing Step: Rename the raw FASTQ files so that they follow the *_R1_001.fastq.gz and *_R2_001.fastq.gz naming pattern.

      Steps:

      1. Navigate to the appropriate data folder. For example:

        cd /staging/groups/pepperell_group/Mtb_RNAseq/HTSeqCounts/
        # or
        cd /staging/groups/pepperell_group/Mtb_RNAseq/RSEM/
      2. Run the renaming script:

        ./format_fastqc_name.sh
  • Input File: input.txt (contains sample identifiers, one per line)
    3151_17_S11
    3151_18_S12
    3151_19_S13
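
Before generating the DAG files, it can help to confirm that every sample identifier in input.txt has both of its FASTQ files in place. The following is a minimal sketch of such a check, assuming the naming pattern described above. The function names are hypothetical and are not part of this repository.

```python
from pathlib import Path


def read_sample_ids(input_path):
    """Return the non-empty, whitespace-stripped lines of input.txt."""
    with open(input_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def missing_fastq_files(sample_ids, data_dir):
    """List the expected R1/R2 FASTQ file names that are absent from data_dir."""
    missing = []
    for sample in sample_ids:
        for mate in ("R1", "R2"):
            path = Path(data_dir) / f"{sample}_{mate}_001.fastq.gz"
            if not path.is_file():
                missing.append(path.name)
    return missing
```

For example, `missing_fastq_files(read_sample_ids("input.txt"), ".")` returns an empty list when all reads are present in the current directory.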
    

DAG Files

The pipeline uses HTCondor DAG files to manage the workflow. These files are generated automatically and include:

  • Top-Level DAG File: input_TPM_topLevel.dag
    • Runs individual DAG files for each sample.
    • Example DAG files for individual samples:
      • 3151_17_S11_TPM.dag
      • 3151_18_S12_TPM.dag
      • 3151_19_S13_TPM.dag
  • Template and Script for DAG Generation:
    • TPM_dag.template: Template DAG file with placeholders ($(RUN), $(REF), $(annot_gtf)) to be replaced.
    • make_TPM_dag.py: Script to generate individual DAG files by replacing placeholders with actual values.
  • Generating DAG Files:
    • To generate the individual DAG files and top-level DAG file from the template, run the following command:
    python3 make_TPM_dag.py input.txt TPM_dag.template MtbNCBIH37Rv.fa MtbNCBIH37Rv_ncRNAs_sORFs.gtf
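
The placeholder substitution can be pictured roughly as follows. This is a sketch of the idea only, not the contents of make_TPM_dag.py: the function names are hypothetical, the use of `SUBDAG EXTERNAL` lines in the top-level DAG is an assumption about how the per-sample DAGs are chained, and deriving the top-level file name from the input file name is likewise assumed.

```python
from pathlib import Path


def render_dag(template_text, run, ref, annot_gtf):
    """Replace the three template placeholders with concrete values."""
    return (
        template_text
        .replace("$(RUN)", run)
        .replace("$(REF)", ref)
        .replace("$(annot_gtf)", annot_gtf)
    )


def write_dags(input_path, template_path, ref, annot_gtf, out_dir="."):
    """Write one <sample>_TPM.dag per sample plus a top-level DAG; return its name."""
    out_dir = Path(out_dir)
    template_text = Path(template_path).read_text()
    lines = Path(input_path).read_text().splitlines()
    samples = [line.strip() for line in lines if line.strip()]

    top_lines = []
    for sample in samples:
        dag_name = f"{sample}_TPM.dag"
        (out_dir / dag_name).write_text(
            render_dag(template_text, sample, ref, annot_gtf)
        )
        # Assumption: each per-sample DAG runs as an external sub-DAG.
        top_lines.append(f"SUBDAG EXTERNAL {sample} {dag_name}")

    # Assumption: input.txt -> input_TPM_topLevel.dag
    top_name = f"{Path(input_path).stem}_TPM_topLevel.dag"
    (out_dir / top_name).write_text("\n".join(top_lines) + "\n")
    return top_name
```

Plain string replacement is used here because the `$(NAME)` placeholder syntax does not match Python's built-in `string.Template` format.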

Submitting and Watching a DAG Job

To submit the DAG job described in input_TPM_topLevel.dag on CHTC, use the following command:

condor_submit_dag input_TPM_topLevel.dag

To check the status of a DAG job on HTCondor, use the following command:

condor_q -nobatch

To watch the status of a DAG job live on HTCondor, instead of querying repeatedly, use the following command:

condor_watch_q

Documentation: How to Build STAR/RSEM Dockerfile (Only if Docker image)

To create and build Docker images for STAR and RSEM, follow these steps. For more details, refer to the CHTC Docker Build Guide. Replace <username>, <imagename>, and <tag> with your DockerHub username, image name, and desired tag, respectively.

Steps

  1. Create the Dockerfiles
    Create separate Dockerfiles for RSEM and STAR:

    • RSEM.Dockerfile
    • STAR.Dockerfile
  2. Build the Docker Images
    Use docker buildx to build the images with the appropriate platform.

    docker buildx build . -f RSEM.Dockerfile -t <username>/<imagename>:<tag> --platform linux/x86_64
    # Example:
    docker buildx build . -f RSEM.Dockerfile -t marissazhang/rsem:latest --platform linux/x86_64
    
    docker buildx build . -f STAR.Dockerfile -t <username>/<imagename>:<tag> --platform linux/x86_64
    # Example:
    docker buildx build . -f STAR.Dockerfile -t marissazhang/star:latest --platform linux/x86_64
  3. Push the Docker Images

    docker push <username>/<imagename>:<tag>
    # Example:
    docker push marissazhang/rsem:latest
    docker push marissazhang/star:latest
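
When several images need rebuilding, the placeholder substitution in the build and push commands can be scripted. The helper below is a small sketch under that assumption. The function name is hypothetical, and it only assembles the same commands shown in the steps above.

```python
def docker_commands(username, imagename, dockerfile,
                    tag="latest", platform="linux/x86_64"):
    """Return (build, push) argv lists for one image.

    Mirrors the `docker buildx build` and `docker push` commands shown above.
    """
    image = f"{username}/{imagename}:{tag}"
    build = [
        "docker", "buildx", "build", ".",
        "-f", dockerfile,
        "-t", image,
        "--platform", platform,
    ]
    push = ["docker", "push", image]
    return build, push
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the Dockerfiles.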

Pipeline Visualizations

[Image: RNAseq_Pipeline_RSEM] Pipeline made by Kadee

Reference

https://link.springer.com/protocol/10.1007/978-1-4939-4035-6_14

About

RNAseq pipeline that uses STAR and RSEM to produce normalized counts. Condor scripts written to be used on CHTC servers with Docker Containers.
