xQTL Simulation Pipeline

Cohort VCF Pipeline

The steps below describe the setup and usage of the Cohort VCF generation pipeline.

Setup

To get started, you will need two tools:

bwa-mem2
gatk

a. The easiest option is to pull the Docker image, which also includes other tools such as samtools. If you are on Hoffman2, use Apptainer instead (see "apptainer on hoffman").

b. Install manually with instructions from here. You will also need to install samtools.

You can then clone this repo, and you should be good to go!
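If you installed the tools manually, a quick check that they are visible on your PATH can save a failed run later. The sketch below is a convenience check only and is not part of this repo; the function name `missing_tools` is made up for illustration. If the Makefile points at bwa-mem2 by explicit path, bwa-mem2 does not need to be on PATH and can be dropped from the list.

```shell
# Print the name of every tool that cannot be found on PATH.
# `missing_tools` is a hypothetical helper, not something shipped with the pipeline.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The tools named in the setup notes above.
missing="$(missing_tools bwa-mem2 gatk samtools)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "All required tools found."
fi
```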

Running

This pipeline does not need a GPU. On Hoffman2, I used the following resources:

qrsh -pe shared 15 -l h_rt=8:00:00,h_data=2G

1. Shell into the Docker/Apptainer container, or change into your local directory containing all code and tools.
2. Verify that the ref/ directory contains the yeast reference fasta and bed files.
3. Edit the path to the bwa-mem2 executables in the Makefile.
4. Run the following command:

make --dry-run <path to paired end reads>.gvcf.idx
ex. make --dry-run test_data/.sub_AVT.gvcf.idx

This lists the commands that `make` would run without executing them. Verify that the paths are correct. The input path to the paired-end reads should not include the file ending (.fasta.gz).
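The target name is the reads path with the file ending removed and `.gvcf.idx` appended. As a minimal sketch, assuming only what is stated above about the `.fasta.gz` ending, the hypothetical helper `reads_to_target` derives the target from a file path. The path `data/sample.fasta.gz` is made up, and how the two mate files are named is defined by the Makefile, not by this snippet.

```shell
# Strip the .fasta.gz ending (if present) and append .gvcf.idx.
# `reads_to_target` is a hypothetical helper, not part of the repo.
reads_to_target() {
    printf '%s.gvcf.idx\n' "${1%.fasta.gz}"
}

reads_to_target data/sample.fasta.gz   # prints data/sample.gvcf.idx
```

The result can be passed straight to `make --dry-run` to inspect the planned commands.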

Once you have verified that the paths are correct you can run:

make <path to paired end reads>.gvcf.idx
ex. make test_data/.sub_AVT.gvcf.idx

This will do the following:

1. Generate the bwa index of the reference fasta.
2. Align input paired-end fastas to the reference.
3. Add read groups to the BAM file.
4. Index the BAM file.
5. Index reference fasta.
6. Generate reference dict file for `GenomeDB`.
7. Generate gVCF file.
8. Index gVCF file.
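For orientation, the eight steps above roughly correspond to the commands printed by the sketch below. This is not the contents of the Makefile or of gen_gvcfs.sh: the file names, the `_1`/`_2` mate suffixes, the read-group values, and the explicit sort after alignment are all assumptions, and `IndexFeatureFile` takes `-I` only in recent GATK 4 releases. The function only prints the commands, so it is safe to run without the tools installed.

```shell
# Print (do not run) one plausible command per step for a reference and a sample prefix.
# `gvcf_steps` is hypothetical; names, mate suffixes, and read-group values are assumed.
gvcf_steps() {
    ref="$1"; p="$2"
    cat <<EOF
bwa-mem2 index $ref
bwa-mem2 mem $ref ${p}_1.fasta.gz ${p}_2.fasta.gz | samtools sort -o $p.sorted.bam -
gatk AddOrReplaceReadGroups -I $p.sorted.bam -O $p.bam --RGID 1 --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM sample
samtools index $p.bam
samtools faidx $ref
gatk CreateSequenceDictionary -R $ref
gatk HaplotypeCaller -R $ref -I $p.bam -O $p.gvcf -ERC GVCF
gatk IndexFeatureFile -I $p.gvcf
EOF
}

# Hypothetical reference and sample names.
gvcf_steps ref/yeast.fasta test_data/sample
```

Comparing this output with what `make --dry-run` reports is a quick way to see where the real pipeline differs from these assumptions.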

Then to generate the genomedb you can run:

make output/.genomedb

This just generates a database that makes the joint genotyping in the next step faster.
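In GATK this kind of database is typically built with `GenomicsDBImport`, which takes one `-V` per gVCF plus at least one interval. The sketch below assembles such a command for every `*.gvcf` file in a directory and prints it. It is an illustration under assumptions, not the contents of gen_genomedb.sh: the helper name `genomedb_cmd` and the interval file `ref/intervals.bed` are hypothetical.

```shell
# Build a GenomicsDBImport command with one -V per gVCF found in a directory.
# `genomedb_cmd` and the interval file name used below are hypothetical.
genomedb_cmd() {
    dir="$1"; workspace="$2"; intervals="$3"
    cmd="gatk GenomicsDBImport --genomicsdb-workspace-path $workspace -L $intervals"
    for g in "$dir"/*.gvcf; do
        [ -e "$g" ] || continue   # skip the literal pattern when nothing matches
        cmd="$cmd -V $g"
    done
    echo "$cmd"
}

genomedb_cmd test_data output/.genomedb ref/intervals.bed
```

Note that the shell glob does not match file names beginning with a dot, so hidden gVCFs would need a different pattern.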

Finally to perform joint genotyping you can run:

make output/sac.vcf

This will generate and index the final cohort VCF file from all the gVCFs found in the directory of the input paired-end reads.

I have included some downsampled paired-end sample reads under test_data so you can quickly test this pipeline.
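Joint genotyping from a GenomicsDB workspace is usually done with `GenotypeGVCFs`, reading the workspace through a `gendb://` URI. The sketch below prints one such command; it is an assumption about what gen_cohort_vcf.sh does rather than a copy of it, and the reference name `ref/yeast.fasta` and the helper `joint_genotype_cmd` are hypothetical.

```shell
# Print a GenotypeGVCFs command that reads from a GenomicsDB workspace.
# Arguments: reference fasta, workspace directory, output VCF.
joint_genotype_cmd() {
    echo "gatk GenotypeGVCFs -R $1 -V gendb://$2 -O $3"
}

joint_genotype_cmd ref/yeast.fasta output/.genomedb output/sac.vcf
```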

More details on this gatk pipeline can be found here and here.



banilmohammed/xQTL-Simulation-Pipeline
