griffithlab
diff --git a/‎03-running_pvactools.Rmd‎
Lines changed: 130 additions & 13 deletions b/‎03-running_pvactools.Rmd‎
diff --git a/‎04-outputs.Rmd‎
Lines changed: 13 additions & 0 deletions b/‎04-outputs.Rmd‎
diff --git a/‎04-pvacview_tour.Rmd‎ renamed to ‎05-pvacview_tour.Rmd‎ b/‎04-pvacview_tour.Rmd‎ renamed to ‎05-pvacview_tour.Rmd‎
diff --git a/‎05-conclusions.Rmd‎ renamed to ‎06-conclusions.Rmd‎ b/‎05-conclusions.Rmd‎ renamed to ‎06-conclusions.Rmd‎
diff --git a/‎_bookdown.yml‎
Lines changed: 3 additions & 2 deletions b/‎_bookdown.yml‎
@@ -12,7 +12,6 @@ This chapter will cover:
 - Starting an interactive Docker session
 - Running pVACseq
 - Running pVACfuse
-- Understanding pVACtools outputs
 
 ## Starting Docker
 
@@ -40,10 +39,16 @@ to it once you exit the Docker image.
 
 ## Running pVACseq
 
-The pVACseq pipeline is run using the `pvacseq run` command.
+pVACseq is used to identify neoantigens from missense, inframe indel, and
+frameshift mutations. The pipeline uses a somatic VCF file as an input, which
+represents variants called in the tumor sample. The VEP annotations in the VCF file
+describe the type of each variant and its consequence on the gene transcripts
+overlapping the genomic coordinates of the variant. The amino acid change of
+the predicted consequence is used by pVACseq to calculate the mutated peptide sequence.
 
+The pVACseq pipeline is run using the `pvacseq run` command.
 
-### Required Parameters
+### Required Parameters for pVACseq
 
 The `pvacseq run` command takes a number of required parameters in the
 following order:
@@ -65,7 +70,7 @@ following order:
   run all available prediction algorithms.
 - `output_dir`: The directory for writing all result files.
 
-### Optional Parameters
+### Optional Parameters for pVACseq
 
 The `pvacseq run` command offers quite a few optional arguments to fine-tune
 your run. Here are a list of parameters we generally recommend:
@@ -122,7 +127,7 @@ on your specific analysis needs:
 
 - `--class-i-epitope-length` and `--class-ii-epitope-length`: By default 8,
   9, 10, 11 and 12, 13, 14, 15, 16, 17, 18 are set for these parameters,
-  respecitively but different lengths might be desired.
+  respectively, but different lengths might be desired.
 - `--tumor-purity`: This parameter is used to bin variants into clonal and
   sub-clonal. This parameter might need to be adjusted based on the tumor
   purity of your data.
@@ -140,15 +145,18 @@ on your specific analysis needs:
   expensive. This parameter limits how many amino acids of the downstream
   sequence are included in the prediction.
 
+There are additional parameters in pVACseq that we won't discuss at this point
+because the defaults are usually sufficient. To see all available parameters, you can
+run `pvacseq run -h`.
+
 ### pVACseq Command
 
 Given the considerations outlined above, let's run pVACseq on our sample data.
 
-From the
-`optitype_normal_result.tsv` we know that the patient's class I alleles are HLA-A\*29:02, HLA-B\*45:01,
-HLA-B\*82:02, and HLA-C\*06:02. We also have clinical typing information that confirms
-these class I alleles as well as identified DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the
-patient's class II alleles.
+From the `optitype_normal_result.tsv` we know that the patient's class I alleles are
+HLA-A\*29:02, HLA-B\*45:01, HLA-B\*82:02, and HLA-C\*06:02. We also have clinical typing 
+information that confirms these class I alleles as well as identified DQA1\*03:03,
+DQB1\*03:02, and DRB1\*04:05 as the patient's class II alleles.
 
 To identify the tumor and normal sample names we will grep the VCF file for
 the CHROM header:
@@ -161,7 +169,7 @@ This shows that the tumor sample is named `HCC1395_TUMOR_DNA` and the normal sam
 
 For our test run, please execute the `pvacseq run` command below. The
 prediction run might take a while but pVACseq will output progress messages as
-it processeses through the pipeline.
+it runs through the pipeline.
 
 ```{r, engine = 'bash', eval = FALSE}
 pvacseq run \
@@ -187,8 +195,117 @@ all \
 
 ## Running pVACfuse
 
-## Understanding pVACtools outputs
+pVACfuse is run to predict neoantigens from fusion events. The
+pipeline uses annotated fusion calls from either AGFusion or Arriba for this
+purpose. These annotators already include the fusion peptide sequence in their
+outputs, which pVACfuse uses to extract neoantigens around the fusion position.
 
-This section will review pVACtools outputs and explain how to correctly interpret them. 
+The pVACfuse pipeline is run using the `pvacfuse run` command.
 
+### Required Parameters for pVACfuse
+
+The `pvacfuse run` command takes a number of required parameters in the
+following order:
 
+- `input_file`: An AGFusion output directory or Arriba fusion.tsv output file.
+  For the purpose of this course, we will be running pVACfuse with AGFusion
+  output.
+- `sample_name`: The name of the tumor sample being processed.
+- `allele(s)`: The name of the HLA allele to use for epitope prediction. Multiple
+  alleles can be specified using a comma-separated list. These should be the
+  HLA alleles of your patient. You might have clinical typing information for
+  your patient. If not, you will need to computationally predict the patient's
+  HLA type using software such as OptiType.
+- `prediction_algorithms`: The epitope prediction algorithms to use. Multiple
+  prediction algorithms can be specified, separated by spaces. Use `all` to
+  run all available prediction algorithms.
+- `output_dir`: The directory for writing all result files.
+
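The comma-separated format of the `allele(s)` argument can be sketched with a small shell helper. This is only an illustration: `join_alleles` is a hypothetical function name and is not part of pVACtools. It assumes the alleles are available as separate strings, and it quotes each one so the shell does not treat `*` as a glob pattern.

```shell
# Hypothetical helper (not part of pVACtools): join HLA allele names into
# the comma-separated string expected in the allele(s) position.
join_alleles() {
  # "$*" joins all arguments using the first character of IFS.
  # The subshell keeps the IFS change local to this function.
  ( IFS=','; printf '%s\n' "$*" )
}

# Quote each allele so that '*' is passed through literally.
join_alleles 'HLA-A*29:02' 'HLA-B*45:01' 'DRB1*04:05'
```

The resulting string can then be passed as the third positional argument of `pvacfuse run`.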
+### Optional Parameters for pVACfuse
+
+In addition to the required parameters, the `pvacfuse run` command also offers
+optional arguments to fine-tune your run. You will find a lot of overlap
+between pVACfuse and pVACseq parameters and the same general considerations
+usually apply. Here is a list of parameters we generally recommend:
+
+- `--starfusion-file`: Path to a `star-fusion.fusion_predictions.tsv` or
+  `star-fusion.fusion_predictions.abridged.tsv`. This file is used to extract
+  read support and expression information.
+- `--iedb-install-directory`: For speed and reliability, we generally recommend
+  that users use a standalone installation of the IEDB software. The pVACtools
+  Docker containers already come with this software pre-installed in the
+  `/opt/iedb` directory.
+- `--allele-specific-binding-thresholds`: When filtering and tiering
+  neoantigen candidates, one main criterion is the predicted peptide-MHC
+  binding affinity. By default, pVACfuse uses a cutoff of IC50 < 500 nM.
+  However, for some HLA alleles, other cutoffs are more appropriate depending
+  on the distribution of binding affinities across peptides. Setting
+  this flag enables allele-specific binding cutoffs as recommended by
+  [IEDB](https://help.iedb.org/hc/en-us/articles/114094152371-What-thresholds-cut-offs-should-I-use-for-MHC-class-I-and-II-binding-predictions).
+- `--run-reference-proteome-similarity`: One consideration when selecting
+  neoantigen candidates is that the neoantigen should not occur natively in
+  the patient's proteome. When this flag is set, pVACfuse will search for each
+  neoantigen candidate in the reference proteome and report any hits found.
+  By default this is done using BLASTp but we recommend using a proteome FASTA
+  file via the `--peptide-fasta` parameter to speed up this step.
+- `--percentile-threshold`: When considering the peptide-MHC binding affinity
+  for filtering and prioritizing neoantigen candidates, by default only the
+  IC50 value is used. Setting this parameter will additionally filter
+  on the predicted percentile. We recommend a value of 0.01 (1%) for this
+  threshold.
+
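To make the combined effect of the IC50 cutoff and the percentile threshold concrete, here is a minimal sketch of such a filter. It is not pVACfuse's internal code. The three-column input (peptide, IC50 in nM, percentile), the function name `filter_candidates`, the strict `<` comparisons, and the assumption that the percentile is on the same scale as the threshold are all hypothetical choices made for illustration.

```shell
# Illustrative filter only; this is not how pVACfuse is implemented.
# Hypothetical input: tab-separated lines of peptide, IC50 (nM), percentile,
# with the percentile expressed on the same scale as the threshold.
filter_candidates() {
  # $1 = IC50 cutoff, $2 = percentile cutoff; candidates are read from stdin.
  awk -F '\t' -v ic50_max="$1" -v pct_max="$2" \
    '($2 + 0) < ic50_max && ($3 + 0) < pct_max { print $1 }'
}

# Made-up candidates: only the first one passes both cutoffs.
printf 'PEP_A\t120.5\t0.004\nPEP_B\t320.0\t0.020\nPEP_C\t850.0\t0.001\n' |
  filter_candidates 500 0.01
```

With `--allele-specific-binding-thresholds` set, the IC50 cutoff would vary per HLA allele instead of being a single value as in this sketch.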
+Additionally, there are a number of parameters that might be useful depending
+on your specific analysis needs:
+
+- `--class-i-epitope-length` and `--class-ii-epitope-length`: By default 8,
+  9, 10, 11 and 12, 13, 14, 15, 16, 17, 18 are set for these parameters,
+  respectively, but different lengths might be desired.
+- `--problematic-amino-acids`: Some vaccine manufacturers will consider certain amino
+  acids in the neoantigen candidates difficult to manufacture. For example, a
+  Cysteine is commonly considered problematic as it makes the peptide
+  unstable. This parameter allows users to set their own rules as to which
+  amino acids are considered problematic. Peptides meeting those rules will be
+  marked in the pVACfuse results and deprioritized.
+- `--n-threads`: This argument will allow pVACfuse to run in multi-processing
+  mode.
+- `--keep-tmp-files`: Setting this flag will save intermediate files created by pVACfuse.
+- `--downstream-sequence-length`: For frameshift fusions, the downstream
+  sequence can potentially be very long, which can be computationally
+  expensive. This parameter limits how many amino acids of the downstream
+  sequence are included in the prediction.
+
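The simplest kind of problematic amino acid rule, a single residue such as Cysteine anywhere in the peptide, can be sketched as follows. The helper `contains_amino_acid` and the example peptides are hypothetical and only illustrate the idea. pVACfuse applies its own rules internally, and this sketch does not cover any more specific rule formats the parameter may accept.

```shell
# Hypothetical helper (not part of pVACtools): succeeds with exit status 0
# if the peptide in $1 contains the one-letter amino acid code in $2.
contains_amino_acid() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Made-up peptides: the first contains a Cysteine (C), the second does not.
for peptide in KLCDEFGHI AAAGGGLLL; do
  if contains_amino_acid "$peptide" C; then
    echo "$peptide problematic"
  else
    echo "$peptide ok"
  fi
done
```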
+### pVACfuse Command
+
+Given the considerations outlined above, let's run pVACfuse on our sample data.
+
+As with pVACseq, we can use the `optitype_normal_result.tsv` file to identify the patient's
+class I HLA alleles. These are HLA-A\*29:02, HLA-B\*45:01, HLA-B\*82:02, and HLA-C\*06:02.
+We also have clinical typing information that confirms these class I alleles as well as 
+identified DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the patient's class II alleles.
+
+For pVACfuse the sample name is not used for any parsing so it doesn't need to
+match any specific information in the AGFusion results. It is only used for
+naming result files. For consistency we will use the same `HCC1395_TUMOR_DNA`
+sample name we used in pVACseq.
+
+For our test run, please execute the `pvacfuse run` command below. The
+prediction run might take a while but pVACfuse will output progress messages as
+it runs through the pipeline.
+
+```{r, engine = 'bash', eval = FALSE}
+pvacfuse run \
+/HCC1395_inputs/agfusion_results \
+HCC1395_TUMOR_DNA \
+HLA-A*29:02,HLA-B*45:01,HLA-B*82:02,HLA-C*06:02,DQA1*03:03,DQB1*03:02,DRB1*04:05 \
+all \
+/pVACtools_outputs/pvacfuse_predictions \
+--iedb-install-directory /opt/iedb \
+--allele-specific-binding-thresholds \
+--percentile-threshold 0.01 \
+--run-reference-proteome-similarity \
+--peptide-fasta /HCC1395_inputs/Homo_sapiens.GRCh38.pep.all.fa.gz \
+--problematic-amino-acids C \
+--downstream-sequence-length 100 \
+--n-threads 8 \
+--keep-tmp-files
+```
@@ -0,0 +1,13 @@
+# Understanding pVACtools outputs
+
+```{r, include = FALSE}
+ottrpal::set_knitr_image_path()
+```
+
+## Learning Objectives
+
+This chapter will cover:
+
+- Understanding the output files produced by pVACtools
+- Interpreting the .filtered.tsv file
+- Interpreting the .aggregated.tsv file
@@ -5,8 +5,9 @@ rmd_files: ["index.Rmd",
             "01-intro.Rmd",
             "02-prerequisites.Rmd",
             "03-running_pvactools.Rmd",
-            "04-pvacview_tour.Rmd",
-            "05-conclusions.Rmd",
+            "04-outputs.Rmd",
+            "05-pvacview_tour.Rmd",
+            "06-conclusions.Rmd",
             "About.Rmd",
             "References.Rmd"]
 new_session: yes