Commit 00e0bb0

committed
various minor updates
1 parent 57e07ea commit 00e0bb0

4 files changed

+55
-34
lines changed

01-intro.Rmd

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ ottrpal::set_knitr_image_path()
55

66
# Introduction
77

8-
This course is currently under development. The topics to be covered are outlined below.
8+
This course has been developed recently (Summer 2023). We welcome any feedback at [email protected] or by submission of [GitHub issues](https://github.com/griffithlab/pVACtools_Intro_Course/issues).
99

1010
## Motivation
1111

@@ -15,8 +15,8 @@ framework called pVACtools that, when paired with a well-established genomics pi
1515
pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions,
1616
and gene fusions. Prediction of peptide:MHC binding is accomplished by supporting an ensemble of MHC Class I and II binding algorithms within a framework
1717
designed to facilitate the incorporation of additional algorithms. Prioritization of predicted peptides occurs by integrating diverse data, including mutant
18-
allele expression, peptide binding affinities, and determination whether a mutation is clonal or subclonal. Interactive visualization via a Web interface allows
19-
clinical users to efficiently generate, review, and interpret results, selecting candidate peptides for individual patient vaccine designs. Additional modules
18+
allele expression, peptide binding affinities, and determination of whether a mutation is clonal or subclonal. Interactive visualization via a Web interface allows
19+
users to efficiently generate, review, and interpret results, selecting candidate peptides for individual experiments or patient vaccine designs. Additional modules
2020
support design choices needed for competing vaccine delivery approaches. One such module optimizes peptide ordering to minimize junctional epitopes in DNA vector
2121
vaccines. Downstream analysis commands for synthetic long peptide vaccines are available to assess candidates for factors that influence peptide synthesis. All
2222
of the aforementioned steps are executed via a modular workflow consisting of tools for neoantigen prediction from somatic alterations (pVACseq, pVACfuse, and pVACbind),
@@ -55,7 +55,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCG
5555

5656
## Target Audience
5757

58-
The course is intended for anyone seeking a better understanding of current best practices in cancer vaccine design and neoantigen prioritization using pVACtools.
58+
The course is intended for anyone seeking a better understanding of current best practices in neoantigen identification and prioritization using pVACtools.
5959
It assumes that the learner is familiar with basic biology, genetics and immunology concepts.
6060

6161
## Curriculum

02-prerequisites.Rmd

Lines changed: 19 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,8 @@ pVACfuse.
2020
Docker is a tool that is used to automate the deployment of applications
2121
in lightweight containers so that applications can work efficiently in
2222
different environments in isolation. We provide versioned Docker containers
23-
for all pVACtools releases.
23+
for all pVACtools [releases](https://github.com/griffithlab/pVACtools/releases)
24+
via [dockerhub griffithlab/pvactools](https://hub.docker.com/r/griffithlab/pvactools).
2425

2526
In order to use Docker, you will need to download the [Docker Desktop software](https://www.docker.com/get-started/).
2627
Please ensure you select the correct install package for your operating
@@ -66,19 +67,30 @@ install.packages("shinycssloaders", dependencies=TRUE)
6667

6768
## Data
6869

69-
For this course, we have put together a set of input data for the HCC1395
70-
cell line. Data from this cell line is commonly used as test data in bioinformatics
71-
applications. The input data consists of the following files:
70+
For this course, we have put together a set of input data generated from the breast
71+
cancer cell line HCC1395 and a matched normal lymphoblastoid cell line HCC1395BL.
72+
Data from these cell lines are commonly used as test data in bioinformatics applications.
73+
For more information on these lines and the generation of test data, please refer to
74+
the data section of our precision medicine bioinformatics course:
75+
[here](https://pmbio.org/module-02-inputs/0002/05/01/Data/).
76+
77+
The input data consists of the following files:
7278

7379
For pVACseq:
7480

7581
- `annotated.expression.vcf.gz`: A somatic (tumor-normal) VCF and its tbi index file. The VCF has been
76-
annotated with VEP and has coverage and expression information added and.
82+
annotated with VEP and has coverage and expression information added. It has also been annotated with
83+
custom VEP plugins that provide wild-type and mutant versions of the full-length protein sequences
84+
predicted to arise from each transcript annotated with each variant.
7785
- `phased.vcf.gz`: A phased tumor-germline VCF and its tbi index file to provide information about
7886
in-phase proximal variants that might alter the predicted peptide sequence around a somatic
7987
mutation of interest.
8088
- `optitype_normal_result.tsv`: An OptiType file with HLA allele typing predictions
8189

90+
For more detailed information on how the variant input file is created, please refer to the
91+
[input file preparation](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep.html)
92+
section of the pVACtools docs.
93+
8294
For pVACfuse:
8395

8496
- `agfusion_results`: An AGFusion output directory with annotated fusion calls
@@ -99,4 +111,5 @@ unzip HCC1395_inputs.zip
99111

100112
This course will not cover the required pre-processing steps for the pVACtools
101113
input data but extensive instructions on how to prepare your own data for use
102-
with pVACtools can be found at [pvactools.org](http://www.pvactools.org)
114+
with pVACtools can be found at [pvactools.org](http://www.pvactools.org).
115+

03-running_pvactools.Rmd

Lines changed: 31 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,8 @@ docker run \
2828
```
2929

3030
This will pull the 4.0.0 version of the griffithlab/pvactools Docker image and
31-
start an interactive session of that Docker image. The `-v ${PWD}/HCC1395_inputs:/HCC1395_inputs`
31+
start an interactive session (`-it`) of that Docker image using the bash shell (`/bin/bash`).
32+
The `-v ${PWD}/HCC1395_inputs:/HCC1395_inputs`
3233
part of the command will mount the
3334
`HCC1395_inputs` folder at `/HCC1395_inputs` inside of the Docker container
3435
so that you will have access to the input data from inside the Docker
@@ -41,10 +42,10 @@ to it once you exit the Docker image.
4142

4243
pVACseq is used to identify neoantigens from missense, inframe indel, and
4344
frameshift mutations. The pipeline uses a somatic VCF file as an input, which
44-
represents variants called in the tumor sample. The VEP annotations in the VCF file
45-
inform the variant type of a variant and their consequence on the gene transcripts
46-
overlapping the genomic coordinates of the variant. The amino acid change of
47-
the predicted consequence if used by pVACseq to calculate the mutated peptide sequence.
45+
represents variants identified in the tumor sample. The VEP annotations in the VCF file
46+
provide the type of each variant and its consequence on the individual gene transcripts
47+
overlapping the genomic coordinates of the variant. The predicted amino acid change of
48+
the variant for a particular transcript is used by pVACseq to calculate the mutated peptide sequence.
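
As a rough illustration of how a predicted amino acid change can be turned into mutated peptide sequences, the sketch below applies a single substitution to a protein sequence and lists every peptide of a given length that overlaps it. It is not pVACseq's implementation: the function name `mutant_peptides`, the toy protein sequence, and the handling of positions near the sequence ends are all assumptions made for this example.

```python
def mutant_peptides(wildtype_protein: str, position: int, mutant_aa: str, length: int) -> list[str]:
    """List every peptide of `length` residues that overlaps one substitution.

    `position` is 1-based (hypothetical convention for this sketch).
    """
    index = position - 1
    if not 0 <= index < len(wildtype_protein):
        raise ValueError("position is outside the protein sequence")

    # Apply the substitution to obtain the mutant protein sequence.
    mutant_protein = wildtype_protein[:index] + mutant_aa + wildtype_protein[index + 1:]

    # A window starting at `s` covers residues s .. s + length - 1, so it
    # contains the mutated residue when index - length + 1 <= s <= index.
    first_start = max(0, index - length + 1)
    last_start = min(index, len(mutant_protein) - length)
    return [mutant_protein[s:s + length] for s in range(first_start, last_start + 1)]


# Toy example: K8E substitution in a made-up 16-residue protein, 9-mer peptides.
peptides = mutant_peptides("MKTAYIAKQRQISFVK", position=8, mutant_aa="E", length=9)
```

Each returned peptide contains the altered residue at a different offset, which is why a single missense variant yields several candidate epitopes per peptide length.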
4849

4950
The pVACseq pipeline is run using the `pvacseq run` command.
5051

@@ -58,16 +59,18 @@ following order:
5859
information.
5960
- `sample_name`: The name of the tumor sample being processed. When processing
6061
a multi-sample VCF the sample name must be a sample ID in the input VCF #CHROM
61-
header line. Only variants that are called (genotype/GT 0/1 or 1/1) in that
62-
sample will be processed.
63-
- `allele(s)`: The name of the HLA allele to use for epitope prediction. Multiple
62+
header line. Only variants that are called (with a genotype/GT of 0/1 or 1/1)
63+
in that sample will be processed.
64+
- `allele(s)`: The name of the HLA allele(s) to use for epitope prediction. Multiple
6465
alleles can be specified using a comma-separated list. These should be the
65-
HLA alleles of your patient. You might have clinical typing information for
66-
your patient. If not, you will need to computational predict the patient's
67-
HLA type using software such as OptiType.
66+
HLA alleles of your patient/sample. You might have clinical typing information for
67+
your patient. If not, you will need to computationally predict the patient's
68+
HLA type using software such as OptiType. The HLA allele names should
69+
be in the following format: `HLA-A*02:01`.
6870
- `prediction_algorithms`: The epitope prediction algorithms to use. Multiple
6971
prediction algorithms can be specified, separated by spaces. Use `all` to
70-
run all available prediction algorithms.
72+
run all available prediction algorithms. pVACseq will automatically determine
73+
which algorithms are valid for each HLA allele.
7174
- `output_dir`: The directory for writing all result files.
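
Before launching a long run, it can help to sanity-check the comma-separated allele list. The following is a minimal sketch of such a check and is not part of pVACtools: the regular expression is an assumption based on the `HLA-A*02:01` format and on class II names such as `DRB1*04:05`, and `parse_alleles` is a hypothetical helper name.

```python
import re

# Assumed naming pattern: class I alleles carry an "HLA-" prefix (HLA-A*02:01),
# class II alleles are written without it (DQA1*03:03, DRB1*04:05).
ALLELE_PATTERN = re.compile(r"^(HLA-[A-Z]|D[PQR][AB]\d)\*\d{2,3}:\d{2,3}$")


def parse_alleles(allele_argument: str) -> list[str]:
    """Split a comma-separated allele list and reject malformed names."""
    alleles = [name.strip() for name in allele_argument.split(",") if name.strip()]
    invalid = [name for name in alleles if not ALLELE_PATTERN.match(name)]
    if invalid:
        raise ValueError(f"Unexpected allele format: {', '.join(invalid)}")
    return alleles


class_i = parse_alleles("HLA-A*29:02,HLA-B*45:01,HLA-B*82:02,HLA-C*06:02")
```

A check like this only catches formatting problems; whether a given prediction algorithm supports an allele is still decided by pVACseq itself.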
7275

7376
### Optional Parameters for pVACseq
@@ -102,23 +105,24 @@ your run. Here are a list of parameters we generally recommend:
102105
subset of peptide positions are presented to the T cell receptor
103106
for recognition, while others are responsible for anchoring to the MHC, making
104107
these positional considerations critical for predicting T cell responses.
105-
Conventionally, the 1st, 2nd, n-1 and n position in a neoantigen candidates
108+
Conventionally, the 1st, 2nd, n-1 and n position in a neoantigen candidate
106109
were considered anchors while recent studies [@Xia2023] have shown that
107110
these positions will depend on the HLA allele. Setting this flag will use
108-
allele-specific anchor locations.
111+
allele-specific anchor locations where possible (we have predictions for ~300 common alleles).
109112
- `--run-reference-proteome-similarity`: One consideration when selecting
110113
neoantigen candidates is that the neoantigen should not occur natively in
111114
the patient's proteome. When this flag is set, pVACseq will search for each
112115
neoantigen candidate in the reference proteome and report any hits found.
113116
By default this is done using BLASTp but we recommend using a proteome FASTA
114-
file via the `--peptide-fasta` parameter to speed up this step.
117+
file via the `--peptide-fasta` parameter to speed up this step. This will trigger
118+
a much faster k-mer based search strategy.
115119
- `--pass-only`: By default, all variants that were called in the tumor sample
116120
are considered by pVACseq. This flag will lead pVACseq to skip variants that
117121
have a FILTER applied in the VCF to, e.g., exclude variants that were marked
118122
as low quality by the variant caller.
119123
- `--percentile-threshold`: When considering the peptide-MHC binding affinity
120124
for filtering and prioritizing neoantigen candidates, by default only the
121-
IC50 value is being used. Setting this parameter will additional also filter
125+
IC50 value is being used. Setting this parameter will additionally filter
122126
on the predicted percentile. We recommend a value of 0.01 (1%) for this
123127
threshold.
124128

@@ -143,7 +147,7 @@ on your specific analysis needs:
143147
- `--downstream-sequence-length`: For frameshift variants, the downstream
144148
sequence can potentially be very long, which can be computationally
145149
expensive. This parameter limits how many amino acids of the downstream
146-
sequence are included in the prediction.
150+
sequence are included in the prediction. We often set a limit of `100`.
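
The effect of capping the downstream sequence can be pictured with a short sketch. Exactly how pVACseq counts downstream residues is not described here, so the convention below (keep everything up to the first altered residue, plus at most `downstream_limit` residues after it) is an assumption, as are the function name and the toy sequence.

```python
def limit_downstream_sequence(mutant_protein: str, frameshift_position: int, downstream_limit: int) -> str:
    """Truncate a frameshifted protein a fixed number of residues past the frameshift.

    `frameshift_position` is the 1-based position of the first altered residue
    (hypothetical convention for this sketch).
    """
    first_altered_index = frameshift_position - 1
    return mutant_protein[: first_altered_index + 1 + downstream_limit]


# Toy example: 50 unchanged residues followed by a 300-residue novel tail.
toy_mutant = "M" + "A" * 49 + "C" * 300
limited = limit_downstream_sequence(toy_mutant, frameshift_position=51, downstream_limit=100)
```

With a limit of 100, the 300-residue tail in this toy case shrinks to 101 residues, which correspondingly reduces the number of peptides sent to the prediction algorithms.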
147151

148152
There are additional parameters in pVACseq that we won't discuss at this point
149153
because the defaults are usually sufficient. To see all available parameters, you can
@@ -154,9 +158,13 @@ run `pvacseq run -h`.
154158
Given the considerations outlined above, let's run pVACseq on our sample data.
155159

156160
From the `optitype_normal_result.tsv` we know that the patient's class I alleles are
157-
HLA-A\*29:02, HLA-B\*45:01, HLA-B\*82:02, and HLA-C\*06:02. We also have clinical typing
158-
information that confirms these class I alleles as well as identified DQA1\*03:03,
159-
DQB1\*03:02, and DRB1\*04:05 as the patient's class II alleles.
161+
HLA-A\*29:02, HLA-B\*45:01, HLA-B\*82:02, and HLA-C\*06:02 (indicating that two of the three class I
162+
genes are homozygous in this sample). We also have clinical typing information that confirms
163+
these class I alleles as well as identifying DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the
164+
patient's class II alleles.
165+
166+
Note that where needed pVACseq will automatically create HLA class II dimer combinations using
167+
valid class II allele pairings.
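
The idea of building class II dimer combinations can be sketched as follows. This is an illustration rather than pVACseq's code: the `alpha-beta` joined naming, the decision to use DRB1 alleles on their own, and the function name are assumptions, and the sketch does not check which pairings a given prediction algorithm accepts.

```python
from itertools import product


def class_ii_combinations(alleles: list[str]) -> list[str]:
    """Pair alpha and beta chain alleles of the same locus into dimer names."""
    by_gene: dict[str, list[str]] = {}
    for allele in alleles:
        gene = allele.split("*")[0]
        by_gene.setdefault(gene, []).append(allele)

    # Assumption: DRB1 alleles are passed through unpaired.
    combinations = list(by_gene.get("DRB1", []))

    # Assumption: DQ and DP dimers are named "<alpha allele>-<beta allele>".
    for alpha_gene, beta_gene in (("DQA1", "DQB1"), ("DPA1", "DPB1")):
        for alpha, beta in product(by_gene.get(alpha_gene, []), by_gene.get(beta_gene, [])):
            combinations.append(f"{alpha}-{beta}")
    return combinations


dimers = class_ii_combinations(["DQA1*03:03", "DQB1*03:02", "DRB1*04:05"])
```

For the class II alleles listed above, this yields one DRB1 entry and one DQ dimer; a sample heterozygous at both DQ genes would yield four DQ dimers.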
160168

161169
To identify the tumor and normal sample names we will grep the VCF file for
162170
the CHROM header:
@@ -230,7 +238,7 @@ usually apply. Here are a list of parameters we generally recommend:
230238

231239
- `--starfusion-file`: Path to a `star-fusion.fusion_predictions.tsv` or
232240
`star-fusion.fusion_predictions.abridged.tsv`. This file is used to extract
233-
read support and expression information.
241+
read support and expression information for each predicted fusion.
234242
- `--iedb-install-directory`: For speed and reliability, we generally recommend
235243
that users use a standalone installation of the IEDB software. The pVACtools
236244
Docker containers already come with this software pre-installed in the
@@ -250,7 +258,7 @@ usually apply. Here are a list of parameters we generally recommend:
250258
file via the `--peptide-fasta` parameter to speed up this step.
251259
- `--percentile-threshold`: When considering the peptide-MHC binding affinity
252260
for filtering and prioritizing neoantigen candidates, by default only the
253-
IC50 value is being used. Setting this parameter will additional also filter
261+
IC50 value is being used. Setting this parameter will additionally filter
254262
on the predicted percentile. We recommend a value of 0.01 (1%) for this
255263
threshold.
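
The combined effect of the IC50 and percentile filters can be sketched as below. This is not the pVACtools filtering code: the 500 nM IC50 cutoff is an assumed value for this example, percentiles are written as fractions to match the 0.01 (1%) recommendation, and the `Candidate` class and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    peptide: str
    ic50_nm: float     # predicted binding affinity in nM (lower binds more strongly)
    percentile: float  # predicted percentile as a fraction, so 0.01 means 1%


def passes_binding_filters(
    candidate: Candidate,
    ic50_threshold_nm: float = 500.0,  # assumed cutoff for this sketch
    percentile_threshold: Optional[float] = None,
) -> bool:
    """Apply the IC50 filter, plus the percentile filter when a threshold is set."""
    if candidate.ic50_nm > ic50_threshold_nm:
        return False
    if percentile_threshold is not None and candidate.percentile > percentile_threshold:
        return False
    return True


borderline = Candidate(peptide="EQRQISFVK", ic50_nm=120.0, percentile=0.03)
kept_by_default = passes_binding_filters(borderline)
kept_with_percentile = passes_binding_filters(borderline, percentile_threshold=0.01)
```

In this toy case the candidate passes on IC50 alone but is removed once the percentile threshold is set, which is the stricter behaviour the parameter is meant to provide.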
256264

@@ -272,7 +280,7 @@ on your specific analysis needs:
272280
- `--downstream-sequence-length`: For frameshift fusions, the downstream
273281
sequence can potentially be very long, which can be computationally
274282
expensive. This parameter limits how many amino acids of the downstream
275-
sequence are included in the prediction.
283+
sequence are included in the prediction. We often set a limit of `100`.
276284

277285
### pVACfuse Command
278286

05-pvacview_tour.Rmd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ This chapter will cover:
1414

1515
## Introduction to the pVACview module
1616

17-
pVACview is a R shiny based tool designed to aid specifically in the prioritization and selection of neoantigen candidates for personalized cancer vaccines. It takes as inputs a pVACseq output aggregate report file (tsv format) and a corresponding pVACseq output metrics file (json). pVACview allows the user to launch an R shiny application to load and visualize the given neoantigen candidates with detailed information including that of the genomic variant, transcripts covering the variant, and good-binding peptides predicted from the respective transcripts. It also incorporates anchor prediction data for a range of class I HLA alleles and peptides ranging from 8 to 11-mers. By taking all levels of information into account for the neoantigen candidates, clinicians will be able to make more informed decisions when deciding final peptide candidates for personalized cancer vaccines.
17+
pVACview is an R Shiny-based tool designed to aid specifically in the prioritization and selection of neoantigen candidates for personalized cancer vaccines or other applications. It takes as inputs a pVACseq output aggregate report file (tsv format) and a corresponding pVACseq output metrics file (json). pVACview allows the user to launch an R Shiny application to load and visualize the given neoantigen candidates with detailed information including that of the genomic variant, transcripts covering the variant, and strong-binding peptides predicted from the respective transcripts. It also incorporates anchor prediction data for a range of class I HLA alleles and peptides ranging from 8- to 11-mers. By taking all these types of information into account for the neoantigen candidates, researchers will be able to make more informed decisions when deciding final peptide candidates for experiments, personalized cancer vaccines, or T cell therapies designed to target neoantigens.
1818

1919
```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "Upon successfully uploading the relevant data files, you can explore the different aspects of your neoantigen candidates."}
2020
ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g2491f283519_0_8")
