@@ -9,12 +9,183 @@ ottrpal::set_knitr_image_path()
9
9
10
10
This chapter will cover:
11
11
12
- - Running pVACtools
12
+ - Starting an interactive Docker session
13
+ - Running pVACseq
14
+ - Running pVACfuse
13
15
- Understanding pVACtools outputs
14
16
15
- ## Running pVACtools
17
+ ## Starting Docker
16
18
17
- This section will explain how to run pVACtools either using Docker.
19
+ In your terminal, execute the following commands:
20
+
21
+ ``` {r, engine = 'bash', eval = FALSE}
22
+ mkdir pVACtools_outputs
23
+
24
+ docker run \
25
+ -v "$(pwd)/HCC1395_inputs":/HCC1395_inputs \
26
+ -v "$(pwd)/pVACtools_outputs":/pVACtools_outputs \
27
+ -it griffithlab/pvactools:4.0.0 \
28
+ /bin/bash
29
+ ```
30
+
31
+ This will pull version 4.0.0 of the griffithlab/pvactools Docker image and
32
+ start an interactive session in a container created from that image. The
33
+ first `-v` option of the command will mount the
34
+ ` HCC1395_inputs ` folder at ` /HCC1395_inputs ` inside of the Docker container
35
+ so that you will have access to the input data from inside the Docker
36
+ container. The second `-v` option of the command
37
+ will mount the ` pVACtools_outputs ` folder you just created. We will write the
38
+ outputs from pVACseq and pVACfuse to that folder so that you will have access
39
+ to them once you exit the Docker container.
40
+
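A minimal sketch of how the two `-v` values could be assembled from absolute host paths. It assumes both folders sit in the current working directory. Docker treats a `-v` source that does not start with `/` as a named volume rather than a host folder, so the sketch prefixes each folder name with the output of `pwd`. The `mount_arg` helper is a hypothetical name used only for illustration, and the final command is printed rather than executed.

```shell
# Hypothetical helper: build a "-v" value that bind-mounts a folder from the
# current working directory to the same name at the container root.
mount_arg() {
  printf '%s/%s:/%s' "$(pwd)" "$1" "$1"
}

inputs_mount=$(mount_arg HCC1395_inputs)
outputs_mount=$(mount_arg pVACtools_outputs)

# Print the command for inspection instead of running it.
echo docker run -v "$inputs_mount" -v "$outputs_mount" \
  -it griffithlab/pvactools:4.0.0 /bin/bash
```

Printing the command first makes it easy to confirm that both sources are absolute paths before starting the container.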
41
+ ## Running pVACseq
42
+
43
+ The pVACseq pipeline is run using the ` pvacseq run ` command.
44
+
45
+
46
+ ### Required Parameters
47
+
48
+ The ` pvacseq run ` command takes a number of required parameters in the
49
+ following order:
50
+
51
+ - ` vcf_file ` : A VEP-annotated single- or multi-sample VCF containing genotype,
52
+ transcript, Wildtype protein sequence, and Frameshift protein sequence
53
+ information.
54
+ - ` sample_name ` : The name of the tumor sample being processed. When processing
55
+ a multi-sample VCF the sample name must be a sample ID in the input VCF #CHROM
56
+ header line. Only variants that are called (genotype/GT 0/1 or 1/1) in that
57
+ sample will be processed.
58
+ - ` allele(s) ` : The name of the HLA allele to use for epitope prediction. Multiple
59
+ alleles can be specified using a comma-separated list. These should be the
60
+ HLA alleles of your patient. You might have clinical typing information for
61
+ your patient. If not, you will need to computationally predict the patient's
62
+ HLA type using software such as OptiType.
63
+ - ` prediction_algorithms ` : The epitope prediction algorithms to use. Multiple
64
+ prediction algorithms can be specified, separated by spaces. Use ` all ` to
65
+ run all available prediction algorithms.
66
+ - ` output_dir ` : The directory for writing all result files.
67
+
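The positional order above can be sketched as follows. All values are placeholders, and `vcf`, `sample`, `alleles`, `algorithms`, and `outdir` are illustrative shell variable names, not pVACseq options. The command is only printed, so the sketch can be tried outside the container.

```shell
# Placeholder values; substitute your own files and names.
vcf="input.vcf.gz"                 # VEP-annotated VCF
sample="TUMOR_SAMPLE"              # sample ID from the #CHROM header line
alleles="HLA-A*29:02,HLA-B*45:01"  # comma-separated, no spaces
algorithms="all"                   # or several names separated by spaces
outdir="results"                   # directory for result files

# $algorithms is intentionally unquoted so that a space-separated list
# expands into one argument per algorithm.
cmd=$(echo pvacseq run "$vcf" "$sample" "$alleles" $algorithms "$outdir")
echo "$cmd"
```

Quoting the allele list keeps the shell from treating the `*` characters as glob patterns.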
68
+ ### Optional Parameters
69
+
70
+ The ` pvacseq run ` command offers quite a few optional arguments to fine-tune
71
+ your run. Here is a list of parameters we generally recommend:
72
+
73
+ - ` --phased-proximal-variants-vcf ` : This is an additional VCF file that
74
+ includes both somatic and germline variants with phasing information. This
75
+ file is used to identify variants near a somatic variant of interest and
76
+ in-phase that would, as a result, change the predicted protein sequence
77
+ around the somatic variant of interest and, thus, change the predicted
78
+ neoantigens. Please note that pVACseq is currently only able to incorporate
79
+ proximal missense variants so users should still manually investigate their
80
+ candidates for other types of nearby variants (e.g. inframe and frameshift
81
+ indels).
82
+ - ` --normal-sample-name ` : When using a tumor-normal input VCF, this parameter
83
+ is used to identify the normal sample in the VCF in order to parse
84
+ coverage metrics for the normal sample.
85
+ - ` --iedb-install-directory ` : For speed and reliability, we generally recommend
86
+ that users use a standalone installation of the IEDB software. The pVACtools
87
+ Docker containers already come with this software pre-installed in the
88
+ ` /opt/iedb ` directory.
89
+ - ` --allele-specific-binding-thresholds ` : When filtering and tiering
90
+ neoantigen candidates, one main criterion is the predicted peptide-MHC
91
+ binding affinity. By default, pVACseq uses an IC50 cutoff of <500 nM.
92
+ However, for some HLA alleles, other cutoffs are more appropriate depending
93
+ on the distribution of binding affinities across peptides. Setting
94
+ this flag enables allele-specific binding cutoffs as recommended by
95
+ [IEDB](https://help.iedb.org/hc/en-us/articles/114094152371-What-thresholds-cut-offs-should-I-use-for-MHC-class-I-and-II-binding-predictions).
96
+ - ` --allele-specific-anchors ` : When considering a neoantigen candidate, only a
97
+ subset of peptide positions are presented to the T cell receptor
98
+ for recognition, while others are responsible for anchoring to the MHC, making
99
+ these positional considerations critical for predicting T cell responses.
100
+ Conventionally, the 1st, 2nd, n-1, and n positions in a neoantigen candidate
101
+ were considered anchors, whereas recent studies [@Xia2023] have shown that
102
+ these positions depend on the HLA allele. Setting this flag will use
103
+ allele-specific anchor locations.
104
+ - ` --run-reference-proteome-similarity ` : One consideration when selecting
105
+ neoantigen candidates is that the neoantigen should not occur natively in
106
+ the patient's proteome. When this flag is set, pVACseq will search for each
107
+ neoantigen candidate in the reference proteome and report any hits found.
108
+ By default this is done using BLASTp but we recommend using a proteome FASTA
109
+ file via the ` --peptide-fasta ` parameter to speed up this step.
110
+ - ` --pass-only ` : By default, all variants that were called in the tumor sample
111
+ are considered by pVACseq. This flag will lead pVACseq to skip variants that
112
+ have a FILTER applied in the VCF to, e.g., exclude variants that were marked
113
+ as low quality by the variant caller.
114
+ - ` --percentile-threshold ` : When considering the peptide-MHC binding affinity
115
+ for filtering and prioritizing neoantigen candidates, by default only the
116
+ IC50 value is used. Setting this parameter will additionally filter
117
+ on the predicted percentile. We recommend a value of 0.01 (1%) for this
118
+ threshold.
119
+
120
+ Additionally there are a number of parameters that might be useful depending
121
+ on your specific analysis needs:
122
+
123
+ - ` --class-i-epitope-length ` and ` --class-ii-epitope-length ` : By default 8,
124
+ 9, 10, 11 and 12, 13, 14, 15, 16, 17, 18 are set for these parameters,
125
+ respectively, but different lengths might be desired.
126
+ - ` --tumor-purity ` : This parameter is used to bin variants into clonal and
127
+ sub-clonal. This parameter might need to be adjusted based on the tumor
128
+ purity of your data.
129
+ - ` --problematic-amino-acids ` : Some vaccine manufacturers will consider certain amino
130
+ acids in the neoantigen candidates difficult to manufacture. For example, a
131
+ Cysteine is commonly considered problematic as it makes the peptide
132
+ unstable. This parameter allows users to set their own rules as to which
133
+ peptides are considered problematic and peptides meeting those rules will be marked in the
134
+ pVACseq results and deprioritized.
135
+ - ` --n-threads ` : This argument will allow pVACseq to run in multi-processing
136
+ mode.
137
+ - ` --keep-tmp-files ` : Setting this flag will save intermediate files created by pVACseq.
138
+ - ` --downstream-sequence-length ` : For frameshift variants, the downstream
139
+ sequence can potentially be very long, which can be computationally
140
+ expensive. This parameter limits how many amino acids of the downstream
141
+ sequence are included in the prediction.
142
+
143
+ ### pVACseq Command
144
+
145
+ Given the considerations outlined above, let's run pVACseq on our sample data.
146
+
147
+ From the
148
+ ` optitype_normal_result.tsv ` file we know that the patient's class I alleles are HLA-A\*29:02, HLA-B\*45:01,
149
+ HLA-B\*82:02, and HLA-C\*06:02. We also have clinical typing information that confirms
150
+ these class I alleles and additionally identifies DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the
151
+ patient's class II alleles.
152
+
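Since pVACseq expects the alleles as one comma-separated list, the class I and class II alleles listed above need to be combined into a single string. A minimal sketch, where `class_i`, `class_ii`, and `alleles` are illustrative variable names:

```shell
# Alleles as listed above; the class II names are written without an HLA-
# prefix, matching the form used in this chapter.
class_i="HLA-A*29:02 HLA-B*45:01 HLA-B*82:02 HLA-C*06:02"
class_ii="DQA1*03:03 DQB1*03:02 DRB1*04:05"

# Replace the separating spaces with commas to get one comma-separated list.
alleles=$(printf '%s %s' "$class_i" "$class_ii" | tr ' ' ',')
echo "$alleles"
```

The resulting string can be passed as the third positional argument of `pvacseq run`.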
153
+ To identify the tumor and normal sample names we will grep the VCF file for
154
+ the CHROM header:
155
+
156
+ ``` {r, engine = 'bash', eval = FALSE}
157
+ zgrep CHROM /HCC1395_inputs/annotated.expression.vcf.gz
158
+ ```
159
+
160
+ This shows that the tumor sample is named ` HCC1395_TUMOR_DNA ` and the normal sample is named ` HCC1395_NORMAL_DNA ` .
161
+
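To read the sample names off the header programmatically, one option is to print every column from the tenth onward, since the first nine columns of a VCF header line are fixed fields. The sketch below runs on a toy header line rather than the real file. The order of the two sample columns in it is an assumption and may differ in the actual VCF.

```shell
# Toy header line standing in for the zgrep output; the column order of the
# two samples here is illustrative and may differ in the real VCF.
header=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHCC1395_NORMAL_DNA\tHCC1395_TUMOR_DNA')

# In a VCF, columns 1-9 are fixed fields; sample names start at column 10.
samples=$(printf '%s\n' "$header" | awk -F '\t' '{ for (i = 10; i <= NF; i++) print $i }')
echo "$samples"
```

On the real data, the same `awk` step could be applied to the output of the `zgrep` command above.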
162
+ For our test run, please execute the ` pvacseq run ` command below. The
163
+ prediction run might take a while but pVACseq will output progress messages as
164
+ it progresses through the pipeline.
165
+
166
+ ``` {r, engine = 'bash', eval = FALSE}
167
+ pvacseq run \
168
+ /HCC1395_inputs/annotated.expression.vcf.gz \
169
+ HCC1395_TUMOR_DNA \
170
+ HLA-A*29:02,HLA-B*45:01,HLA-B*82:02,HLA-C*06:02,DQA1*03:03,DQB1*03:02,DRB1*04:05 \
171
+ all \
172
+ /pVACtools_outputs/pvacseq_predictions \
173
+ --normal-sample-name HCC1395_NORMAL_DNA \
174
+ --phased-proximal-variants-vcf /HCC1395_inputs/phased.vcf.gz \
175
+ --iedb-install-directory /opt/iedb \
176
+ --pass-only \
177
+ --allele-specific-binding-thresholds \
178
+ --percentile-threshold 0.01 \
179
+ --allele-specific-anchors \
180
+ --run-reference-proteome-similarity \
181
+ --peptide-fasta /HCC1395_inputs/Homo_sapiens.GRCh38.pep.all.fa.gz \
182
+ --problematic-amino-acids C \
183
+ --downstream-sequence-length 100 \
184
+ --n-threads 8 \
185
+ --keep-tmp-files
186
+ ```
187
+
188
+ ## Running pVACfuse
18
189
19
190
## Understanding pVACtools outputs
20
191