This Snakemake pipeline is designed for paired-end NGS DNA data.
- Path of fastq files
- Name of output folder
- Reference genome
- multiqc_data - Directory containing the summary results of all the tools, including multiqc.html
- logs - Directory of log files for each job; check here first if you run into errors
- working - Directory containing intermediate files for each job
- **QC**: FastQC
- **Trimming**: Trim Galore
- **QC**: FastQC
- **Align**: BWA
- **Sort**: samtools
- **Deduplicate**: Picard
- **Summary**: MultiQC
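
For orientation, the steps above correspond roughly to the per-sample commands sketched below. This is a hand-written illustration, not the pipeline's actual rules: the `print_plan` helper, the file names, the directory layout, and the thread count are all assumptions, and the function only prints the commands rather than running them.

```shell
# print_plan SAMPLE REFERENCE [THREADS]
# Prints (does not run) one plausible command per pipeline step.
# Assumptions: reads are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz,
# the reference is already indexed with `bwa index`, and the output
# directories (qc/, trimmed/, aligned/) are hypothetical.
print_plan() {
  sample="$1"
  ref="$2"
  threads="${3:-4}"
  cat <<EOF
fastqc -o qc/raw ${sample}_R1.fastq.gz ${sample}_R2.fastq.gz
trim_galore --paired -o trimmed ${sample}_R1.fastq.gz ${sample}_R2.fastq.gz
fastqc -o qc/trimmed trimmed/${sample}_R1_val_1.fq.gz trimmed/${sample}_R2_val_2.fq.gz
bwa mem -t ${threads} ${ref} trimmed/${sample}_R1_val_1.fq.gz trimmed/${sample}_R2_val_2.fq.gz > aligned/${sample}.sam
samtools sort -@ ${threads} -o aligned/${sample}.sorted.bam aligned/${sample}.sam
picard MarkDuplicates I=aligned/${sample}.sorted.bam O=aligned/${sample}.dedup.bam M=aligned/${sample}.metrics.txt
multiqc -o multiqc_data .
EOF
}

# Example: show the plan for a hypothetical sample with 8 threads.
print_plan sampleA genome.fa 8
```

Picard's `MarkDuplicates` flags duplicates by default; whether this pipeline also removes them is not stated here, so check the rule definitions in the repository for the exact options.
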
- Install conda
- Clone the workflow into your working directory

      git clone <repo> <dir>
      cd <dir>

- Create a new environment

      conda env create -n <project_name> --file environment.yaml

- Activate the environment

      conda activate <project_name>

- Enable the Bioconda channel

      conda config --add channels bioconda
      conda config --add channels conda-forge

- Install snakemake

      conda install snakemake

- Edit the configuration file

  Set the paths for `fastq_dir`, `output_dir`, and `reference_genome` in `config.yaml`.
- Execute the workflow
  - The first time you execute this pipeline, run it locally (you can preview it with `--dry-run`). Once the first run has finished, you can switch to running it on the cluster.

      snakemake --configfile "config.yaml" --use-conda --cores N

- Submit the jobs to an SGE cluster to run the pipeline

  Install snakemake-executor-plugin-cluster-generic with pip:

      pip install snakemake-executor-plugin-cluster-generic

  Then run the following, replacing `{cores}` with the maximum number of jobs to submit at once:

      snakemake --use-conda --jobs {cores} --executor cluster-generic --cluster-generic-submit-cmd "qsub -cwd -V -l h_vmem=50G -pe parallel {threads} -o logs/ -e logs/"
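
Before the first run, it can help to confirm that `config.yaml` defines the three settings this README asks you to edit. The helper below is a minimal sketch, not part of the workflow: `check_config` is a hypothetical name, and it assumes the three keys appear as top-level entries at the start of a line in the YAML file.

```shell
# check_config FILE
# Succeeds only if FILE contains the keys fastq_dir, output_dir and
# reference_genome as top-level "key:" entries (an assumption about
# how config.yaml is laid out). Missing keys are reported on stderr.
check_config() {
  cfg="$1"
  status=0
  for key in fastq_dir output_dir reference_genome; do
    if ! grep -Eq "^${key}[[:space:]]*:" "$cfg"; then
      echo "missing key in $cfg: $key" >&2
      status=1
    fi
  done
  return "$status"
}
```

One way to use it is to gate a dry run on the check, for example `check_config config.yaml && snakemake --configfile config.yaml --use-conda --cores 1 --dry-run`, so that a missing setting is reported before Snakemake builds the job graph.
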