Commit c2acf17

Add section on how to run pVACfuse
1 parent dfd1554 commit c2acf17

5 files changed

+146
-15
lines changed

03-running_pvactools.Rmd

Lines changed: 130 additions & 13 deletions
@@ -12,7 +12,6 @@ This chapter will cover:
1212
- Starting an interactive Docker session
1313
- Running pVACseq
1414
- Running pVACfuse
15-
- Understanding pVACtools outputs
1615

1716
## Starting Docker
1817

@@ -40,10 +39,16 @@ to it once you exit the Docker image.
4039

4140
## Running pVACseq
4241

43-
The pVACseq pipeline is run using the `pvacseq run` command.
42+
pVACseq is used to identify neoantigens from missense, inframe indel, and
43+
frameshift mutations. The pipeline uses a somatic VCF file as an input, which
44+
represents variants called in the tumor sample. The VEP annotations in the VCF file
45+
indicate each variant's type and its consequence on the gene transcripts
46+
overlapping the genomic coordinates of the variant. The amino acid change of
47+
the predicted consequence is used by pVACseq to calculate the mutated peptide sequence.
4448

49+
The pVACseq pipeline is run using the `pvacseq run` command.
4550

46-
### Required Parameters
51+
### Required Parameters for pVACseq
4752

4853
The `pvacseq run` command takes a number of required parameters in the
4954
following order:
@@ -65,7 +70,7 @@ following order:
6570
run all available prediction algorithms.
6671
- `output_dir`: The directory for writing all result files.
6772

68-
### Optional Parameters
73+
### Optional Parameters for pVACseq
6974

7075
The `pvacseq run` command offers quite a few optional arguments to fine-tune
7176
your run. Here are a list of parameters we generally recommend:
@@ -122,7 +127,7 @@ on your specific analysis needs:
122127

123128
- `--class-i-epitope-length` and `--class-ii-epitope-length`: By default 8,
124129
9, 10, 11 and 12, 13, 14, 15, 16, 17, 18 are set for these parameters,
125-
respecitively but different lengths might be desired.
130+
respectively, but different lengths might be desired.
126131
- `--tumor-purity`: This parameter is used to bin variants into clonal and
127132
sub-clonal. This parameter might need to be adjusted based on the tumor
128133
purity of your data.
@@ -140,15 +145,18 @@ on your specific analysis needs:
140145
expensive. This parameter limits how many amino acids of the downstream
141146
sequence are included in the prediction.
142147

148+
There are additional parameters in pVACseq that we won't discuss at this point
149+
because the defaults are usually sufficient. To see all available parameters, you can
150+
run `pvacseq run -h`.
151+
143152
### pVACseq Command
144153

145154
Given the considerations outlined above, let's run pVACseq on our sample data.
146155

147-
From the
148-
`optitype_normal_result.tsv` we know that the patient's class I alleles are HLA-A\*29:02, HLA-B\*45:01,
149-
HLA-B\*82:02, and HLA-C\*06:02. We also have clinical typing information that confirms
150-
these class I alleles as well as identified DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the
151-
patient's class II alleles.
156+
From the `optitype_normal_result.tsv` we know that the patient's class I alleles are
157+
HLA-A\*29:02, HLA-B\*45:01, HLA-B\*82:02, and HLA-C\*06:02. We also have clinical typing
158+
information that confirms these class I alleles and identifies DQA1\*03:03,
159+
DQB1\*03:02, and DRB1\*04:05 as the patient's class II alleles.
152160

153161
To identify the tumor and normal sample names we will grep the VCF file for
154162
the CHROM header:
@@ -161,7 +169,7 @@ This shows that the tumor sample is named `HCC1395_TUMOR_DNA` and the normal sam
161169

162170
For our test run, please execute the `pvacseq run` command below. The
163171
prediction run might take a while but pVACseq will output progress messages as
164-
it processeses through the pipeline.
172+
it runs through the pipeline.
165173

166174
```{r, engine = 'bash', eval = FALSE}
167175
pvacseq run \
@@ -187,8 +195,117 @@ all \
187195

188196
## Running pVACfuse
189197

190-
## Understanding pVACtools outputs
198+
pVACfuse is used to predict neoantigens from fusion events. The
199+
pipeline uses annotated fusion calls from either AGFusion or Arriba for this
200+
purpose. These annotators already include the fusion peptide sequence in their
201+
outputs, which pVACfuse uses to extract neoantigens around the fusion position.
191202

192-
This section will review pVACtools outputs and explain how to correctly interpret them.
203+
The pVACfuse pipeline is run using the `pvacfuse run` command.
193204

205+
### Required Parameters for pVACfuse
206+
207+
The `pvacfuse run` command takes a number of required parameters in the
208+
following order:
194209

210+
- `input_file`: An AGFusion output directory or Arriba fusion.tsv output file.
211+
For the purpose of this course, we will be running pVACfuse with AGFusion
212+
output.
213+
- `sample_name`: The name of the tumor sample being processed.
214+
- `allele(s)`: The name of the HLA allele to use for epitope prediction. Multiple
215+
alleles can be specified using a comma-separated list. These should be the
216+
HLA alleles of your patient. You might have clinical typing information for
217+
your patient. If not, you will need to computationally predict the patient's
218+
HLA type using software such as OptiType.
219+
- `prediction_algorithms`: The epitope prediction algorithms to use. Multiple
220+
prediction algorithms can be specified, separated by spaces. Use `all` to
221+
run all available prediction algorithms.
222+
- `output_dir`: The directory for writing all result files.
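
The positional order of the required parameters can be sketched in shell as follows. This is only an illustration under stated assumptions: `join_alleles` is a hypothetical helper and not part of pVACtools, the paths are the ones used for the course's sample data, and the command is printed rather than executed.

```shell
# Sketch only: assemble the required positional arguments in the documented
# order and print the resulting command instead of running it.

# join_alleles is a hypothetical helper (not part of pVACtools): it joins its
# arguments into the comma-separated list expected for the allele(s) argument.
join_alleles() {
  ( IFS=,; printf '%s\n' "$*" )
}

input_file="/HCC1395_inputs/agfusion_results"
sample_name="HCC1395_TUMOR_DNA"
allele_list=$(join_alleles 'HLA-A*29:02' 'HLA-B*45:01' 'HLA-B*82:02' 'HLA-C*06:02')
algorithms="all"
output_dir="/pVACtools_outputs/pvacfuse_predictions"

# Order: input_file, sample_name, allele(s), prediction_algorithms, output_dir.
# The algorithms stay unquoted in the printed command because several
# algorithms would be given as separate, space-separated words.
pvacfuse_cmd=$(printf "pvacfuse run '%s' '%s' '%s' %s '%s'" \
  "$input_file" "$sample_name" "$allele_list" "$algorithms" "$output_dir")
printf '%s\n' "$pvacfuse_cmd"
```

Printing the command first makes it easy to confirm that each value sits in the expected position before starting a long prediction run.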
223+
224+
### Optional Parameters for pVACfuse
225+
226+
In addition to the required parameters, the `pvacfuse run` command also offers
227+
optional arguments to fine-tune your run. You will find a lot of overlap
228+
between pVACfuse and pVACseq parameters and the same general considerations
229+
usually apply. Here is a list of parameters we generally recommend:
230+
231+
- `--starfusion-file`: Path to a `star-fusion.fusion_predictions.tsv` or
232+
`star-fusion.fusion_predictions.abridged.tsv`. This file is used to extract
233+
read support and expression information.
234+
- `--iedb-install-directory`: For speed and reliability, we generally recommend
235+
that users use a standalone installation of the IEDB software. The pVACtools
236+
Docker containers already come with this software pre-installed in the
237+
`/opt/iedb` directory.
238+
- `--allele-specific-binding-thresholds`: When filtering and tiering
239+
neoantigen candidates, one main criterion is the predicted peptide-MHC
240+
binding affinity. By default, pVACfuse uses a cutoff of <500 nM IC50.
241+
However, for some HLA alleles, other cutoffs are more appropriate depending
242+
on the distribution of binding affinities across peptides. Setting
243+
this flag enables allele-specific binding cutoffs as recommended by
244+
[IEDB](https://help.iedb.org/hc/en-us/articles/114094152371-What-thresholds-cut-offs-should-I-use-for-MHC-class-I-and-II-binding-predictions).
245+
- `--run-reference-proteome-similarity`: One consideration when selecting
246+
neoantigen candidates is that the neoantigen should not occur natively in
247+
the patient's proteome. When this flag is set, pVACfuse will search for each
248+
neoantigen candidate in the reference proteome and report any hits found.
249+
By default this is done using BLASTp but we recommend using a proteome FASTA
250+
file via the `--peptide-fasta` parameter to speed up this step.
251+
- `--percentile-threshold`: When considering the peptide-MHC binding affinity
252+
for filtering and prioritizing neoantigen candidates, by default only the
253+
IC50 value is used. Setting this parameter will additionally filter
254+
on the predicted percentile. We recommend a value of 0.01 (1%) for this
255+
threshold.
256+
257+
Additionally there are a number of parameters that might be useful depending
258+
on your specific analysis needs:
259+
260+
- `--class-i-epitope-length` and `--class-ii-epitope-length`: By default 8,
261+
9, 10, 11 and 12, 13, 14, 15, 16, 17, 18 are set for these parameters,
262+
respectively, but different lengths might be desired.
263+
- `--problematic-amino-acids`: Some vaccine manufacturers will consider certain amino
264+
acids in the neoantigen candidates difficult to manufacture. For example, a
265+
cysteine is commonly considered problematic as it makes the peptide
266+
unstable. This parameter allows users to set their own rules as to which
267+
peptides are considered problematic and peptides meeting those rules will be marked in the
268+
pVACfuse results and deprioritized.
269+
- `--n-threads`: This argument will allow pVACfuse to run in multi-processing
270+
mode.
271+
- `--keep-tmp-files`: Setting this flag will save intermediate files created by pVACfuse.
272+
- `--downstream-sequence-length`: For frameshift fusions, the downstream
273+
sequence can potentially be very long, which can be computationally
274+
expensive. This parameter limits how many amino acids of the downstream
275+
sequence are included in the prediction.
276+
277+
### pVACfuse Command
278+
279+
Given the considerations outlined above, let's run pVACfuse on our sample data.
280+
281+
As with pVACseq, we can use the `optitype_normal_result.tsv` file to identify the patient's
282+
class I HLA alleles. These are HLA-A\*29:02, HLA-B\*45:01, HLA-B\*82:02, and HLA-C\*06:02.
283+
We also have clinical typing information that confirms these class I alleles and
284+
identifies DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the patient's class II alleles.
285+
286+
For pVACfuse the sample name is not used for any parsing so it doesn't need to
287+
match any specific information in the AGFusion results. It is only used for
288+
naming result files. For consistency we will use the same `HCC1395_TUMOR_DNA`
289+
sample name we used in pVACseq.
290+
291+
For our test run, please execute the `pvacfuse run` command below. The
292+
prediction run might take a while but pVACfuse will output progress messages as
293+
it runs through the pipeline.
294+
295+
```{r, engine = 'bash', eval = FALSE}
296+
pvacfuse run \
297+
/HCC1395_inputs/agfusion_results \
298+
HCC1395_TUMOR_DNA \
299+
HLA-A*29:02,HLA-B*45:01,HLA-B*82:02,HLA-C*06:02,DQA1*03:03,DQB1*03:02,DRB1*04:05 \
300+
all \
301+
/pVACtools_outputs/pvacfuse_predictions \
302+
--iedb-install-directory /opt/iedb \
303+
--allele-specific-binding-thresholds \
304+
--percentile-threshold 0.01 \
305+
--run-reference-proteome-similarity \
306+
--peptide-fasta /HCC1395_inputs/Homo_sapiens.GRCh38.pep.all.fa.gz \
307+
--problematic-amino-acids C \
308+
--downstream-sequence-length 100 \
309+
--n-threads 8 \
310+
--keep-tmp-files
311+
```
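
The allele string in the command above contains `*` characters, which the shell treats as glob patterns when they are unquoted. If no file name matches, the pattern is passed through literally, so the command normally works as written. The sketch below shows the difference quoting makes in the unlikely case that a matching file exists. The decoy file name is made up purely for illustration.

```shell
# Sketch: show how an unquoted allele pattern can be rewritten by the shell.
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical decoy file whose name matches the glob HLA-A*29:02.
: > 'HLA-A_decoy_29:02'

# Unquoted: pathname expansion replaces the pattern with the matching file name.
set -- HLA-A*29:02
unquoted=$1

# Quoted: the allele name is passed through unchanged.
set -- 'HLA-A*29:02'
quoted=$1

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

# Clean up the temporary directory.
cd "$start_dir" || exit 1
rm -rf "$workdir"
```

Wrapping the whole comma-separated allele list in single quotes is a simple way to rule this out.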

04-outputs.Rmd

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
# Understanding pVACtools outputs
2+
3+
```{r, include = FALSE}
4+
ottrpal::set_knitr_image_path()
5+
```
6+
7+
## Learning Objectives
8+
9+
This chapter will cover:
10+
11+
- Understanding the output files produced by pVACtools
12+
- Interpreting the .filtered.tsv file
13+
- Interpreting the .aggregated.tsv file
File renamed without changes.
File renamed without changes.

_bookdown.yml

Lines changed: 3 additions & 2 deletions
@@ -5,8 +5,9 @@ rmd_files: ["index.Rmd",
55
"01-intro.Rmd",
66
"02-prerequisites.Rmd",
77
"03-running_pvactools.Rmd",
8-
"04-pvacview_tour.Rmd",
9-
"05-conclusions.Rmd",
8+
"04-outputs.Rmd",
9+
"05-pvacview_tour.Rmd",
10+
"06-conclusions.Rmd",
1011
"About.Rmd",
1112
"References.Rmd"]
1213
new_session: yes
