A suite of production-ready bioinformatics pipelines for HPC SLURM environments. Each pipeline auto-detects samples from directory structure, supports SE/PE sequencing, uses SLURM job arrays for parallelism, and saves all intermediate outputs and figures.
| Pipeline | Directory | Description |
|---|---|---|
| Bulk RNA-seq | `pipelines/bulk_rnaseq/` | Differential expression analysis with DESeq2, edgeR, limma-voom |
| scRNA-seq | `pipelines/scrna_seq/` | Single-cell analysis with Seurat, Scanpy, Cell Ranger |
| ChIP-seq | `pipelines/chipseq/` | Peak calling and differential binding with MACS2, DiffBind |
| ATAC-seq | `pipelines/atacseq/` | Chromatin accessibility with MACS2, footprinting, DiffBind |
| WGS | `pipelines/wgs/` | Whole genome variant calling with GATK, DeepVariant, Mutect2 |
| WES | `pipelines/wes/` | Whole exome variant calling with interval restriction + exome QC |
bioinformatics-pipelines/
├── lib/ # Shared utilities (sourced by all pipelines)
│ ├── utils.sh # Logging, sample detection, FASTQ detection
│ ├── genome_refs.sh # Human GRCh38 + Mouse GRCm39 reference paths
│ ├── slurm_utils.sh # Job array submission, dependency chaining
│ ├── parse_config.R # R: reads config.sh into named list
│ └── parse_config.py # Python: reads config.sh into dict
├── templates/
│ ├── slurm_header.sh # SBATCH directive template
│ ├── config_template.sh # Annotated config template
│ └── sample_sheet_template.tsv # Sample sheet template
└── pipelines/
    ├── bulk_rnaseq/
    ├── scrna_seq/
    ├── chipseq/
    ├── atacseq/
    ├── wgs/
    └── wes/
Each pipeline follows the structure:
pipelines/{name}/
├── config.sh # Pipeline configuration
├── sample_sheet.tsv # Sample metadata
├── scripts/ # Numbered analysis scripts + run_all.sh
└── docs/README.md # Pipeline documentation
- Configure: Edit `pipelines/{name}/config.sh` to set paths, genome, modules, and SLURM account
- Prepare inputs: Place FASTQs in `fastq/{sample_id}/` directories; fill out `sample_sheet.tsv`
- Run: `bash pipelines/{name}/scripts/run_all.sh`
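As a rough illustration of the configure step, a `config.sh` might contain entries like the ones below. Every variable name and value here is a hypothetical placeholder chosen for this sketch; the annotated `templates/config_template.sh` is the authoritative reference for the settings the pipelines actually read.

```bash
# Hypothetical config.sh fragment. All names and values are illustrative
# assumptions, not copied from the shipped template.
PROJECT_DIR="/path/to/project"        # project root (placeholder path)
FASTQ_DIR="${PROJECT_DIR}/fastq"      # one subdirectory per sample
OUTPUT_DIR="${PROJECT_DIR}/results"   # intermediate outputs and figures
GENOME="GRCh38"                       # or GRCm39 for mouse
SLURM_ACCOUNT="my_account"            # account to charge jobs to
```

Because the file is plain shell, a pipeline script can load it with `source config.sh`; the repository layout suggests `parse_config.R` and `parse_config.py` expose the same values to R and Python.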
- Per-sample steps (QC, trim, align, dedup) run as job arrays (`--array=1-N`)
- Aggregate steps (MultiQC, DE, visualization) run as single jobs with `--dependency=afterok:{JOB_ID}`
- `run_all.sh` orchestrates the full dependency chain automatically
- Individual steps: `sbatch scripts/02_trim.sh sample_list.txt`
- Single sample debug: `SLURM_ARRAY_TASK_ID=3 bash scripts/02_trim.sh sample_list.txt`
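The array-plus-dependency pattern described above can be sketched roughly as follows. The function names are hypothetical and are not the actual `lib/slurm_utils.sh` API; the sketch also assumes `sample_list.txt` holds one sample ID per line with no blank lines. Only the `sbatch` flags (`--parsable`, `--array`, `--dependency`) are standard SLURM options.

```bash
# Sketch only: function names are hypothetical, not the lib/slurm_utils.sh API.

# Print the sample ID stored on line N of the sample list.
# Inside an array job, N would be $SLURM_ARRAY_TASK_ID.
sample_for_task() {
    sed -n "${2}p" "$1"
}

# Count non-empty lines; used to size the array (--array=1-N).
count_samples() {
    grep -c . "$1"
}

# Submit a per-sample array step, then an aggregate step that starts only
# after every array task has finished successfully (afterok).
submit_chain() {
    local sample_list="$1" array_script="$2" aggregate_script="$3"
    local n array_id
    n=$(count_samples "$sample_list")
    # --parsable makes sbatch print just the job ID (optionally ";cluster").
    array_id=$(sbatch --parsable --array="1-${n}" "$array_script" "$sample_list") || return 1
    array_id=${array_id%%;*}   # drop the ";cluster" suffix if present
    sbatch --parsable --dependency="afterok:${array_id}" "$aggregate_script"
}
```

This also shows why the single-sample debug command works: setting `SLURM_ARRAY_TASK_ID=3` by hand selects the third line of `sample_list.txt` without going through the scheduler, provided the step script looks up its sample in a similar way.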
FASTQs must be organized by sample:
fastq/
├── sample_A/
│ ├── sample_A_R1.fastq.gz
│ └── sample_A_R2.fastq.gz # (PE only)
├── sample_B/
│ ├── sample_B_R1.fastq.gz
│ └── sample_B_R2.fastq.gz
Naming conventions supported: _R1/_R2 or _1/_2 suffixes.
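A minimal sketch of how sample and SE/PE detection could work for this layout is shown below. The helper names are hypothetical (the real logic lives in `lib/utils.sh` and may differ), and the sketch assumes each file is named `{sample_id}` plus the read suffix plus `.fastq.gz`, as in the tree above.

```bash
# Sketch only: helper names are hypothetical, not the lib/utils.sh API.

# Print one sample ID per line: the name of each subdirectory of the FASTQ root.
list_samples() {
    local d
    for d in "$1"/*/; do
        [ -d "$d" ] || continue   # the glob matched nothing
        basename "$d"
    done
}

# Print "PE" if both mates exist, "SE" if only read 1 exists.
# Tries the _R1/_R2 convention first, then _1/_2.
detect_layout() {
    local sample_dir="$1" sample pair r1 r2
    sample=$(basename "$sample_dir")
    for pair in "_R1 _R2" "_1 _2"; do
        r1="${sample_dir}/${sample}${pair% *}.fastq.gz"
        r2="${sample_dir}/${sample}${pair#* }.fastq.gz"
        [ -f "$r1" ] || continue
        if [ -f "$r2" ]; then echo "PE"; else echo "SE"; fi
        return 0
    done
    echo "no read 1 FASTQ found in ${sample_dir}" >&2
    return 1
}
```

For example, `list_samples fastq > sample_list.txt` would produce a list suitable for indexing by array task ID, and `detect_layout fastq/sample_A` would report whether that sample is single-end or paired-end.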
- Human: GRCh38 (hg38) — all indexes, annotations, known sites
- Mouse: GRCm39 (mm39) — all indexes, annotations, known sites
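Since only these two builds are supported, a script can validate the requested genome up front. The sketch below is illustrative: the function name and the set of accepted aliases are assumptions, and the real reference paths are defined in `lib/genome_refs.sh`.

```bash
# Sketch only: resolve_genome and its accepted aliases are hypothetical.
# Maps a user-supplied name to one of the two supported builds.
resolve_genome() {
    case "$1" in
        human|hg38|GRCh38) echo "GRCh38" ;;
        mouse|mm39|GRCm39) echo "GRCm39" ;;
        *) echo "unsupported genome: $1" >&2; return 1 ;;
    esac
}
```

Failing early on an unknown build avoids submitting a full job chain that would only break at the alignment step.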
- SLURM cluster with `module` system
- Standard bioinformatics HPC modules (FastQC, Trimmomatic, STAR, BWA, GATK, etc.)
- R 4.x with Bioconductor packages
- Python 3.x with scanpy, scvelo, etc.
See each pipeline's `docs/README.md` for specific requirements.
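Before a first run, a quick preflight check can confirm that the required commands are visible in the current shell. The helper below is a hypothetical sketch, not part of the repository. Note that `module` is usually a shell function, so it may only be found in shells where the site's module initialization has been loaded.

```bash
# Sketch only: check_requirements is a hypothetical helper.
# Reports every missing command on stderr and returns non-zero if any is absent.
check_requirements() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (left commented out so that sourcing this file has no side effects):
# check_requirements sbatch module Rscript python3
```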