Commit dfd1554

Add section for running pVACseq
1 parent 5c9518e commit dfd1554

4 files changed

+240
-30
lines changed

02-prerequisites.Rmd

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -90,7 +90,7 @@ General:
9090
- `Homo_sapiens.GRCh38.pep.all.fa.gz`: A reference proteome peptide FASTA to use
9191
for determining whether there are any reference matches of neoantigen candidates
9292

93-
To download this data, please run the following command:
93+
To download this data, please run the following commands:
9494

9595
```{r, engine = 'bash', eval = FALSE}
9696
wget https://raw.githubusercontent.com/griffithlab/pVACtools_Intro_Course/main/HCC1395_inputs.zip
@@ -99,4 +99,4 @@ unzip HCC1395_inputs.zip
9999

100100
This course will not cover the required pre-processing steps for the pVACtools
101101
input data but extensive instructions on how to prepare your own data for use
102-
with pVACtools can be found at[pvactools.org](https://www.pvactools.org)
102+
with pVACtools can be found at [pvactools.org](https://www.pvactools.org)

03-running_pvactools.Rmd

Lines changed: 174 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -9,12 +9,183 @@ ottrpal::set_knitr_image_path()
99

1010
This chapter will cover:
1111

12-
- Running pVACtools
12+
- Starting an interactive Docker session
13+
- Running pVACseq
14+
- Running pVACfuse
1315
- Understanding pVACtools outputs
1416

15-
## Running pVACtools
17+
## Starting Docker
1618

17-
This section will explain how to run pVACtools either using Docker.
19+
In your terminal, execute the following commands:
20+
21+
```{r, engine = 'bash', eval = FALSE}
22+
mkdir pVACtools_outputs
23+
24+
docker run \
25+
-v HCC1395_inputs:/HCC1395_inputs \
26+
-v pVACtools_outputs:/pVACtools_outputs \
27+
-it griffithlab/pvactools:4.0.0 \
28+
/bin/bash
29+
```
30+
31+
This will pull the 4.0.0 version of the griffithlab/pvactools Docker image and
32+
start an interactive session of that Docker image. The `-v
33+
HCC1395_inputs:/HCC1395_inputs` part of the command will mount the
34+
`HCC1395_inputs` folder at `/HCC1395_inputs` inside of the Docker container
35+
so that you will have access to the input data from inside the Docker
36+
container. The `-v pVACtools_outputs:/pVACtools_outputs` part of the command
37+
will mount the `pVACtools_outputs` folder you just created. We will write the
38+
outputs from pVACseq and pVACfuse to that folder so that you will have access
39+
to it once you exit the Docker image.
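
One caveat, offered as a sketch rather than a correction to the course material: when the host side of `-v` is a bare name rather than an absolute path, Docker treats it as a named volume instead of a folder in the current directory. If the mounted folders look empty inside the container, absolute paths are one way around that. The variable names below are illustrative, and the snippet only prints the resulting command so it can be reviewed before running.

```bash
# Resolve both folders to absolute paths (variable names are illustrative).
inputs_dir="$(pwd)/HCC1395_inputs"
outputs_dir="$(pwd)/pVACtools_outputs"

# Create the output folder if it does not exist yet.
mkdir -p "$outputs_dir"

# Print the command for review; drop the leading `echo` to actually run it.
echo docker run \
  -v "$inputs_dir:/HCC1395_inputs" \
  -v "$outputs_dir:/pVACtools_outputs" \
  -it griffithlab/pvactools:4.0.0 \
  /bin/bash
```

This assumes the command is run from the directory that contains `HCC1395_inputs`.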
40+
41+
## Running pVACseq
42+
43+
The pVACseq pipeline is run using the `pvacseq run` command.
44+
45+
46+
### Required Parameters
47+
48+
The `pvacseq run` command takes a number of required parameters in the
49+
following order:
50+
51+
- `vcf_file`: A VEP-annotated single- or multi-sample VCF containing genotype,
52+
transcript, Wildtype protein sequence, and Frameshift protein sequence
53+
information.
54+
- `sample_name`: The name of the tumor sample being processed. When processing
55+
a multi-sample VCF the sample name must be a sample ID in the input VCF #CHROM
56+
header line. Only variants that are called (genotype/GT 0/1 or 1/1) in that
57+
sample will be processed.
58+
- `allele(s)`: The name of the HLA allele to use for epitope prediction. Multiple
59+
alleles can be specified using a comma-separated list. These should be the
60+
HLA alleles of your patient. You might have clinical typing information for
61+
your patient. If not, you will need to computationally predict the patient's
62+
HLA type using software such as OptiType.
63+
- `prediction_algorithms`: The epitope prediction algorithms to use. Multiple
64+
prediction algorithms can be specified, separated by spaces. Use `all` to
65+
run all available prediction algorithms.
66+
- `output_dir`: The directory for writing all result files.
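
The positional order can be easier to keep straight when each argument is held in a named shell variable. The following is a minimal sketch: the variable names are made up for illustration, the allele list is shortened to two entries, the paths and sample name are the ones used for the sample data in this chapter, and the command is only printed, not run. Quoting the allele string is a precaution, because `*` is also a shell wildcard character.

```bash
# Required arguments, in the order `pvacseq run` expects them.
vcf_file="/HCC1395_inputs/annotated.expression.vcf.gz"
sample_name="HCC1395_TUMOR_DNA"

# Join a space-separated allele list into the comma-separated form.
alleles_spaced="HLA-A*29:02 HLA-B*45:01"
alleles=$(printf '%s' "$alleles_spaced" | tr ' ' ',')

algorithms="all"
output_dir="/pVACtools_outputs/pvacseq_predictions"

# Print the assembled command for review instead of running it.
echo pvacseq run "$vcf_file" "$sample_name" "$alleles" "$algorithms" "$output_dir"
```

Optional parameters would follow the five positional arguments.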
67+
68+
### Optional Parameters
69+
70+
The `pvacseq run` command offers quite a few optional arguments to fine-tune
71+
your run. Here is a list of parameters we generally recommend:
72+
73+
- `--phased-proximal-variants-vcf`: This is an additional VCF file that
74+
includes both somatic and germline variants with phasing information. This
75+
file is used to identify variants near a somatic variant of interest and
76+
in-phase that would, as a result, change the predicted protein sequence
77+
around the somatic variant of interest and, thus, change the predicted
78+
neoantigens. Please note that pVACseq is currently only able to incorporate
79+
proximal missense variants so users should still manually investigate their
80+
candidates for other types of nearby variants (e.g. inframe and frameshift
81+
indels).
82+
- `--normal-sample-name`: When using a tumor-normal input VCF, this parameter
83+
is used to identify the normal sample in the VCF in order to parse
84+
coverage metrics for the normal sample.
85+
- `--iedb-install-directory`: For speed and reliability, we generally recommend
86+
that users use a standalone installation of the IEDB software. The pVACtools
87+
Docker containers already come with this software pre-installed in the
88+
`/opt/iedb` directory.
89+
- `--allele-specific-binding-thresholds`: When filtering and tiering
90+
neoantigen candidates, one main criterion is the predicted peptide-MHC
91+
binding affinity. By default, pVACseq uses an IC50 cutoff of <500 nM.
92+
However, for some HLA alleles, other cutoffs are more appropriate depending
93+
on the distribution of binding affinities across peptides. Setting
94+
this flag enables allele-specific binding cutoffs as recommended by
95+
[IEDB](https://help.iedb.org/hc/en-us/articles/114094152371-What-thresholds-cut-offs-should-I-use-for-MHC-class-I-and-II-binding-predictions).
96+
- `--allele-specific-anchors`: When considering a neoantigen candidate, only a
97+
subset of peptide positions are presented to the T cell receptor
98+
for recognition, while others are responsible for anchoring to the MHC, making
99+
these positional considerations critical for predicting T cell responses.
100+
Conventionally, the 1st, 2nd, n-1 and n positions in a neoantigen candidate
101+
were considered anchors while recent studies [@Xia2023] have shown that
102+
these positions will depend on the HLA allele. Setting this flag will use
103+
allele-specific anchor locations.
104+
- `--run-reference-proteome-similarity`: One consideration when selecting
105+
neoantigen candidates is that the neoantigen should not occur natively in
106+
the patient's proteome. When this flag is set, pVACseq will search for each
107+
neoantigen candidate in the reference proteome and report any hits found.
108+
By default this is done using BLASTp but we recommend using a proteome FASTA
109+
file via the `--peptide-fasta` parameter to speed up this step.
110+
- `--pass-only`: By default, all variants that were called in the tumor sample
111+
are considered by pVACseq. This flag will lead pVACseq to skip variants that
112+
have a FILTER applied in the VCF to, e.g., exclude variants that were marked
113+
as low quality by the variant caller.
114+
- `--percentile-threshold`: When considering the peptide-MHC binding affinity
115+
for filtering and prioritizing neoantigen candidates, by default only the
116+
IC50 value is used. Setting this parameter will additionally filter
117+
on the predicted percentile. We recommend a value of 0.01 (1%) for this
118+
threshold.
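
To make the combined IC50 and percentile filtering concrete, here is a toy illustration using `awk`. It assumes a candidate must pass both cutoffs, as described above. The peptides and scores are invented, the three-column layout is not pVACseq's actual output format, and the percentile is written as a fraction to match the 0.01 (1%) convention used in this section.

```bash
# Toy table: peptide, IC50 (nM), percentile (fraction). All values are made up.
toy_predictions='PEPTIDEA 120 0.004
PEPTIDEB 450 0.030
PEPTIDEC 800 0.008'

ic50_cutoff=500
percentile_cutoff=0.01

# Keep only peptides that are below both cutoffs.
passing=$(printf '%s\n' "$toy_predictions" |
  awk -v ic50="$ic50_cutoff" -v pct="$percentile_cutoff" \
    '$2 < ic50 && $3 < pct { print $1 }')

echo "$passing"
```

Only `PEPTIDEA` passes: `PEPTIDEB` meets the IC50 cutoff but not the percentile cutoff, and `PEPTIDEC` fails on IC50.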
119+
120+
Additionally there are a number of parameters that might be useful depending
121+
on your specific analysis needs:
122+
123+
- `--class-i-epitope-length` and `--class-ii-epitope-length`: By default 8,
124+
9, 10, 11 and 12, 13, 14, 15, 16, 17, 18 are set for these parameters,
125+
respectively, but different lengths might be desired.
126+
- `--tumor-purity`: This parameter is used to bin variants into clonal and
127+
sub-clonal. This parameter might need to be adjusted based on the tumor
128+
purity of your data.
129+
- `--problematic-amino-acids`: Some vaccine manufacturers will consider certain amino
130+
acids in the neoantigen candidates difficult to manufacture. For example, a
131+
Cysteine is commonly considered problematic as it makes the peptide
132+
unstable. This parameter allows users to set their own rules as to which
133+
peptides are considered problematic and peptides meeting those rules will be marked in the
134+
pVACseq results and deprioritized.
135+
- `--n-threads`: This argument will allow pVACseq to run in multi-processing
136+
mode.
137+
- `--keep-tmp-files`: Setting this flag will save intermediate files created by pVACseq.
138+
- `--downstream-sequence-length`: For frameshift variants, the downstream
139+
sequence can potentially be very long, which can be computationally
140+
expensive. This parameter limits how many amino acids of the downstream
141+
sequence are included in the prediction.
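
As a rough illustration of what marking a problematic amino acid means, the sketch below flags toy peptides that contain a Cysteine (`C`). The peptide sequences are invented, and this covers only the simplest kind of rule (a residue anywhere in the peptide); it is not pVACseq's implementation.

```bash
# Toy peptides, one per line (made-up sequences).
toy_peptides='KLMNPQRST
ACDEFGHIK
WYVTSRQPN'

problematic_residue="C"

# Flag every peptide that contains the problematic residue.
flagged=$(printf '%s\n' "$toy_peptides" | grep "$problematic_residue")

echo "$flagged"
```

Here only `ACDEFGHIK` is flagged.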
142+
143+
### pVACseq Command
144+
145+
Given the considerations outlined above, let's run pVACseq on our sample data.
146+
147+
From the
148+
`optitype_normal_result.tsv` we know that the patient's class I alleles are HLA-A\*29:02, HLA-B\*45:01,
149+
HLA-B\*82:02, and HLA-C\*06:02. We also have clinical typing information that confirms
150+
these class I alleles and identifies DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the
151+
patient's class II alleles.
152+
153+
To identify the tumor and normal sample names we will grep the VCF file for
154+
the CHROM header:
155+
156+
```{r, engine = 'bash', eval = FALSE}
157+
zgrep CHROM /HCC1395_inputs/annotated.expression.vcf.gz
158+
```
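
In a VCF, the sample names are the columns after `FORMAT` on the `#CHROM` line, that is, column 10 onward. A small sketch of extracting just those names is shown below. It builds a tiny stand-in VCF so that it runs without the course files; the file contents and the column order of the two samples are toy assumptions.

```bash
# Build a tiny stand-in VCF (toy data) so the snippet is self-contained.
toy_vcf="$(mktemp)"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHCC1395_NORMAL_DNA\tHCC1395_TUMOR_DNA\n' |
  gzip -c > "$toy_vcf"

# Take the #CHROM line and keep columns 10 onward, one sample name per line.
sample_names=$(gzip -dc "$toy_vcf" | grep '^#CHROM' | head -n 1 |
  cut -f10- | tr '\t' '\n')

echo "$sample_names"

# Clean up the temporary file.
rm -f "$toy_vcf"
```

For the course data, the same pipeline could be pointed at `/HCC1395_inputs/annotated.expression.vcf.gz` in place of the toy file.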
159+
160+
This shows that the tumor sample is named `HCC1395_TUMOR_DNA` and the normal sample is named `HCC1395_NORMAL_DNA`.
161+
162+
For our test run, please execute the `pvacseq run` command below. The
163+
prediction run might take a while but pVACseq will output progress messages as
164+
it progresses through the pipeline.
165+
166+
```{r, engine = 'bash', eval = FALSE}
167+
pvacseq run \
168+
/HCC1395_inputs/annotated.expression.vcf.gz \
169+
HCC1395_TUMOR_DNA \
170+
HLA-A*29:02,HLA-B*45:01,HLA-B*82:02,HLA-C*06:02,DQA1*03:03,DQB1*03:02,DRB1*04:05 \
171+
all \
172+
/pVACtools_outputs/pvacseq_predictions \
173+
--normal-sample-name HCC1395_NORMAL_DNA \
174+
--phased-proximal-variants-vcf /HCC1395_inputs/phased.vcf.gz \
175+
--iedb-install-directory /opt/iedb \
176+
--pass-only \
177+
--allele-specific-binding-thresholds \
178+
--percentile-threshold 0.01 \
179+
--allele-specific-anchors \
180+
--run-reference-proteome-similarity \
181+
--peptide-fasta /HCC1395_inputs/Homo_sapiens.GRCh38.pep.all.fa.gz \
182+
--problematic-amino-acids C \
183+
--downstream-sequence-length 100 \
184+
--n-threads 8 \
185+
--keep-tmp-files
186+
```
187+
188+
## Running pVACfuse
18189

19190
## Understanding pVACtools outputs
20191

book.bib

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,19 @@ @article{Keskin2018
5353
journal = {Nature}
5454
}
5555

56+
@article{Xia2023,
57+
doi = {10.1126/sciimmunol.abg2200},
58+
url = {https://doi.org/10.1126/sciimmunol.abg2200},
59+
year = {2023},
60+
month = apr,
61+
publisher = {American Association for the Advancement of Science ({AAAS})},
62+
volume = {8},
63+
number = {82},
64+
author = {Huiming Xia and Joshua McMichael and Michelle Becker-Hapak and Onyinyechi C. Onyeador and Rico Buchli and Ethan McClain and Patrick Pence and Suangson Supabphol and Megan M. Richters and Anamika Basu and Cody A. Ramirez and Cristina Puig-Saus and Kelsy C. Cotto and Sharon L. Freshour and Jasreet Hundal and Susanna Kiwala and S. Peter Goedegebuure and Tanner M. Johanns and Gavin P. Dunn and Antoni Ribas and Christopher A. Miller and William E. Gillanders and Todd A. Fehniger and Obi L. Griffith and Malachi Griffith},
65+
title = {Computational prediction of {MHC} anchor locations guides neoantigen identification and prioritization},
66+
journal = {Science Immunology}
67+
}
68+
5669
@article{Ott2017,
5770
doi = {10.1038/nature22991},
5871
url = {https://doi.org/10.1038/nature22991},

resources/dictionary.txt

Lines changed: 51 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,31 @@
1+
AGFusion
2+
Arriba
13
AnVIL
24
BIPOC
5+
BLASTp
36
Bloomberg
47
Bookdown
8+
bioinformatics
9+
CHROM
10+
CLI
511
ClinVar
612
Coursera
13+
Cysteine
14+
clonality
715
css
16+
cytotoxic
17+
DQA
18+
DQB
19+
DRB
820
Datatrail
921
DataTrail
1022
Dockerfile
1123
Dockerhub
1224
dropdown
25+
epitope
1326
epitopes
27+
Ensembl
28+
FASTA
1429
favicon
1530
frameshift
1631
fyi
@@ -19,29 +34,55 @@ GenBank
1934
GH
2035
GitHub
2136
Github
37+
germline
2238
gnomAD
39+
griffithlab
40+
HCC
2341
HLA
42+
histocompatibility
2443
https
44+
IC
45+
IEDB
46+
ITCR
47+
ITN
48+
immunotherapies
49+
immunotherapy
50+
isoform
2551
immunogenomics
2652
impactful
27-
ITCR
53+
indels
54+
inframe
2855
itcrtraining
29-
ITN
3056
json
3157
junctional
3258
Leanpub
59+
MHCnuggets
3360
Markua
61+
mRNA
62+
manufacturability
3463
mentorship
3564
mers
65+
missense
3666
MHC
3767
MHCflurry
3868
NCI
69+
NHGRI
70+
NetChop
71+
NetMHCpan
72+
NetMHCstabpan
73+
natively
74+
nd
3975
neoantigen
4076
Neoantigen
4177
neoantigens
42-
NHGRI
78+
nmol
79+
OptiType
4380
ottrpal
81+
PHLAT
82+
pVACbind
83+
proteome
4484
Pandoc
85+
pre
4586
proteomics
4687
pVAC
4788
pVACfuse
@@ -53,32 +94,17 @@ pVACview
5394
pVACviz
5495
RefSeq
5596
reproducibility
97+
somatically
5698
subclonal
99+
STARFusion
100+
tbi
101+
tiering
57102
tsv
58103
UE
59104
UE5
60105
underserved
61-
www
62-
AGfusion
63-
Arriba
64-
clonality
65-
cytotoxic
66-
Ensembl
67-
histocompatibility
68-
IEDB
69-
immunotherapies
70-
immunotherapy
71-
isoform
72-
manufacturability
73-
MHCnuggets
74-
mRNA
75-
NetChop
76-
NetMHCpan
77-
NetMHCstabpan
78-
OptiType
79-
PHLAT
80-
proteome
81-
pVACbind
82-
somatically
106+
VCF
83107
vaxrank
84108
VEP
109+
www
110+
Wildtype
