banilmohammed/xQTL-Simulation-Pipeline

xQTL Simulation Pipeline

Cohort VCF Pipeline

The steps below describe the setup and usage of the Cohort VCF generation pipeline.

Setup

To get started, you will need a couple of things:

  1. bwa-mem2

  2. gatk

    a. The easiest option is to pull the Docker image, which also includes other tools such as samtools. If you are on Hoffman2, use Apptainer instead (see "apptainer on hoffman").

    b. Install manually with instructions from here. You will also need to install samtools.

You can then clone this repo, and you should be good to go!
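Before running anything, it can help to confirm that the tools from the list above are reachable. The snippet below is a minimal sketch, assuming the tools are meant to be on your PATH; `check_tools` is a hypothetical helper, not part of this repo. If you point the Makefile at a specific bwa-mem2 location instead, check that path rather than PATH.

```shell
# Hypothetical helper: report which of the given tools are missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns non-zero if any are missing.
check_tools() {
    missing_count=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing_count=1
        fi
    done
    return "$missing_count"
}

# Tool names taken from the setup list above.
if check_tools bwa-mem2 gatk samtools; then
    echo "all tools found"
fi
```

Inside the Docker or Apptainer image, gatk and samtools should already be present, so only bwa-mem2 is likely to be reported.
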

Running

This pipeline does not need a GPU. On Hoffman2, I requested these resources:

    qrsh -pe shared 15 -l h_rt=8:00:00,h_data=2G

  1. Shell into the Docker/Apptainer container, or change to your local directory containing all code and tools.
  2. Verify that the ref/ directory contains the yeast reference fasta and bed files.
  3. Edit the path to the bwa-mem2 executables in the Makefile.
  4. Run the following command:

    make --dry-run <path to paired end reads>.gvcf.idx

For example:

    make --dry-run test_data/.sub_AVT.gvcf.idx

The above command lists the commands that make would run, without executing them. Verify that the paths are correct. The input path to the paired end reads should not include the file ending (.fasta.gz).

Once you have verified that the paths are correct, you can run:

    make <path to paired end reads>.gvcf.idx

For example:

    make test_data/.sub_AVT.gvcf.idx

This will do the following:

1. Generate the bwa index of the reference fasta.
2. Align input paired end fastas to the reference.
3. Add readgroups to the bam file.
4. Index bam file.
5. Index reference fasta.
6. Generate reference dict file for `GenomeDB`.
7. Generate gVCF file.
8. Index gVCF file.
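
The eight steps above can be sketched as one command per step. This is an illustration only, not the contents of the Makefile: the function name, the file names (`ref/reference.fasta`, the read files, the `.bam` and `.rg.bam` intermediates) and the read group values are all assumptions. The function prints the commands rather than running them, in the same spirit as `make --dry-run`.

```shell
# Hypothetical helper: print (not run) one plausible command per step listed above.
# Arguments: reference FASTA, read file 1, read file 2, output prefix, thread count.
print_sample_commands() {
    ref=$1; r1=$2; r2=$3; prefix=$4; threads=$5
    # 1. bwa-mem2 index of the reference
    echo "bwa-mem2 index $ref"
    # 2. align the paired end reads and sort the result into a BAM
    echo "bwa-mem2 mem -t $threads $ref $r1 $r2 | samtools sort -@ $threads -o $prefix.bam -"
    # 3. add read groups (values here are placeholders)
    echo "gatk AddOrReplaceReadGroups -I $prefix.bam -O $prefix.rg.bam --RGID 1 --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM $(basename "$prefix")"
    # 4. index the BAM
    echo "samtools index $prefix.rg.bam"
    # 5. index the reference FASTA
    echo "samtools faidx $ref"
    # 6. reference sequence dictionary
    echo "gatk CreateSequenceDictionary -R $ref"
    # 7. per-sample gVCF
    echo "gatk HaplotypeCaller -R $ref -I $prefix.rg.bam -O $prefix.gvcf -ERC GVCF"
    # 8. index the gVCF
    echo "gatk IndexFeatureFile -I $prefix.gvcf"
}

# Example call with made-up paths; 15 threads matches the qrsh request above.
print_sample_commands ref/reference.fasta reads_1.fasta.gz reads_2.fasta.gz output/sample 15
```

Compare the printed commands against the output of `make --dry-run` to see where the real pipeline differs.
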
  5. Then, to generate the genomedb, run:

    make output/.genomedb

This just generates a database that makes the joint genotyping in the next step faster.
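
For orientation, a database like this is typically built with GATK's GenomicsDBImport, which takes one `-V` per gVCF plus an intervals file. The sketch below assumes that is what the Makefile target does. The helper name, the intervals file `ref/intervals.bed` and the gVCF paths are hypothetical, and the command is only printed.

```shell
# Hypothetical helper: print a GenomicsDBImport command with one -V per gVCF given.
# Arguments: workspace path, intervals file, then one or more gVCF paths.
print_import_command() {
    workspace=$1; intervals=$2; shift 2
    cmd="gatk GenomicsDBImport --genomicsdb-workspace-path $workspace -L $intervals"
    for gvcf in "$@"; do
        cmd="$cmd -V $gvcf"
    done
    echo "$cmd"
}

# Example call with made-up input names.
print_import_command output/.genomedb ref/intervals.bed test_data/a.gvcf test_data/b.gvcf
```
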

  6. Finally, to perform joint genotyping, run:

    make output/sac.vcf

This will generate and index the final cohort VCF file from all the gVCFs found in the directory of the input paired end reads.
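
As a rough picture of this step, joint genotyping from a GenomicsDB workspace is usually done with GATK's GenotypeGVCFs, reading the workspace through a `gendb://` URI. The sketch below assumes the Makefile works that way. The helper name and the reference path are hypothetical, and the commands are printed rather than executed.

```shell
# Hypothetical helper: print the joint genotyping and indexing commands.
# Arguments: reference FASTA, GenomicsDB workspace path, output VCF path.
print_genotype_commands() {
    ref=$1; workspace=$2; out_vcf=$3
    echo "gatk GenotypeGVCFs -R $ref -V gendb://$workspace -O $out_vcf"
    echo "gatk IndexFeatureFile -I $out_vcf"
}

# Example call; ref/reference.fasta is a made-up file name.
print_genotype_commands ref/reference.fasta output/.genomedb output/sac.vcf
```
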

I have included some downsampled paired end sample reads under test_data so you can quickly test this pipeline.

More details on this gatk pipeline can be found here and here.
