Tutorial

This tutorial assumes you have an Amazon Web Services account registered with the NMDP. Registration grants you access to a public machine image with all the data, tools, and compute infrastructure you need to proceed. If you do not have a registered account, go here first.

Get the code

git clone [email protected]:/parallel_genomic.git

This creates a local clone (working copy) of the GitHub repository, which contains several shell scripts for parallel execution of pipeline components.

View the sample data

Public sample data from the Sequence Read Archive (SRA) are provided here:

/mnt/common/data/incoming/nmdp/Proposed_Hackathon_Dataset/DRP000941/

Each of the 73 files contains phased NGS data for six HLA loci, published by Hosomichi et al., 2013. The files must be decompressed from SRA format to FASTQ before processing; the SRA Toolkit provides tools for this purpose. The decompressed data are also provided in the fastq/ directory.
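As a minimal sketch of the decompression step, assuming the SRA Toolkit's fastq-dump is on your PATH and the archive files end in .sra (neither is stated above), a loop like the following would write FASTQ files into an output directory. The helper name sra_to_fastq and the DRY_RUN switch are hypothetical, introduced only for illustration.

```shell
# Sketch: convert every .sra file in a directory to FASTQ.
# Assumptions: fastq-dump (SRA Toolkit) is installed; inputs end in .sra.
# sra_to_fastq and DRY_RUN are hypothetical names, not part of the pipeline.
sra_to_fastq() {
    local src_dir="$1" out_dir="$2"
    local sra
    for sra in "$src_dir"/*.sra; do
        [ -e "$sra" ] || continue   # skip when the glob matches nothing
        # --split-files writes paired reads to separate _1/_2 FASTQ files.
        # With DRY_RUN set, the command is printed instead of executed.
        ${DRY_RUN:+echo} fastq-dump --split-files --outdir "$out_dir" "$sra"
    done
}
```

For example, `DRY_RUN=1 sra_to_fastq . fastq` prints the commands without running them; omit DRY_RUN to perform the conversion.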

Run the pipeline

From within DRP000941/:

bash /mnt/scratch/janderson/parallel_genomic/splitter.bash/splitter.bash fastq/
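Before launching the pipeline, it can help to confirm that the input directory actually contains FASTQ files. The following is a small sketch, assuming the files end in .fastq; count_fastq is a hypothetical helper and not one of the repository's scripts.

```shell
# Sketch: count the FASTQ files directly inside a directory.
# Assumption: input files use the .fastq extension.
# count_fastq is a hypothetical helper name.
count_fastq() {
    local dir="$1"
    # -maxdepth 1 restricts the search to the directory itself;
    # tr strips the padding some wc implementations add.
    find "$dir" -maxdepth 1 -type f -name '*.fastq' | wc -l | tr -d ' '
}
```

Running `count_fastq fastq/` from within DRP000941/ prints the number of FASTQ files the pipeline would be given.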

Interpret and validate the results

Clinical interpretation of HLA DNA sequence for transplantation is typically confined to the antigen recognition sites (ARS), which correspond to exons 2 and 3 of class I HLA genes and exon 2 of class II HLA genes. The NMDP's interpretation service currently requires consensus sequences trimmed of sequence representing other structural elements (non-ARS exons, introns, promoters, and other untranslated regions). For illustration, we will use the HLA-A results for a single homozygous sample, DRR003809 (DKB). The first step is to filter contigs by region, in this case exons 2 and 3 of HLA-A:

groovy -classpath /mnt/scratch/caleb/ngs-feature-1.0-SNAPSHOT/lib/ngs-feature-1.2-SNAPSHOT.jar:/mnt/scratch/caleb/ngs-tools-1.0-SNAPSHOT/lib/biojava.jar:/mnt/scratch/caleb/ngs-tools-1.0-SNAPSHOT/lib/picard-1.102.0.jar:/mnt/scratch/caleb/ngs-tools-1.0-SNAPSHOT/lib/guava-17.0.jar:/mnt/scratch/caleb/ngs-variant-1.0-SNAPSHOT/lib/ngs-variant-1.2-SNAPSHOT.jar /mnt/scratch/caleb/splice-bam.groovy -i final/DRR003809_1.fastq.contigs.bwa.sorted.bam -x ~/regions/clinical-exons/hla-a.txt -g HLA-A -m -b 0.5
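The command above is long mainly because of its classpath. As a readability sketch only, the same invocation could be wrapped in a shell function that assembles the classpath one jar at a time. The jar paths and flags are copied verbatim from the command above; the function name splice_bam, its three parameters, and the DRY_RUN switch are hypothetical.

```shell
# Sketch: wrap the splice-bam.groovy call so that the BAM file, regions
# file, and gene are the only things that change between runs.
# splice_bam and DRY_RUN are hypothetical names; paths and flags are
# copied unchanged from the tutorial command.
splice_bam() {
    local bam="$1" regions="$2" gene="$3"
    local lib=/mnt/scratch/caleb
    local classpath="$lib/ngs-feature-1.0-SNAPSHOT/lib/ngs-feature-1.2-SNAPSHOT.jar"
    classpath="$classpath:$lib/ngs-tools-1.0-SNAPSHOT/lib/biojava.jar"
    classpath="$classpath:$lib/ngs-tools-1.0-SNAPSHOT/lib/picard-1.102.0.jar"
    classpath="$classpath:$lib/ngs-tools-1.0-SNAPSHOT/lib/guava-17.0.jar"
    classpath="$classpath:$lib/ngs-variant-1.0-SNAPSHOT/lib/ngs-variant-1.2-SNAPSHOT.jar"
    # With DRY_RUN set, the full command is printed instead of executed.
    ${DRY_RUN:+echo} groovy -classpath "$classpath" "$lib/splice-bam.groovy" \
        -i "$bam" -x "$regions" -g "$gene" -m -b 0.5
}
```

Under these assumptions, `splice_bam final/DRR003809_1.fastq.contigs.bwa.sorted.bam "$HOME/regions/clinical-exons/hla-a.txt" HLA-A` would reproduce the HLA-A call, and prefixing it with `DRY_RUN=1` shows the expanded command first.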

Finally, use the NMDP's interpretation service to assign nomenclature. You can upload your processed contigs either through a simple web interface or from the command line.

Create an HML message

.
