@@ -11,3 +11,258 @@ This chapter will cover:
11
11
- Understanding the output files produced by pVACtools
12
12
- Interpreting the .filtered.tsv file
13
13
- Interpreting the .aggregated.tsv file
14
+
15
+ ## pVACtools Output Files
16
+
17
+ Both pVACseq and pVACfuse produce three main output files:
18
+
19
+ - The ` all_epitopes.tsv ` file is a TSV file with all predicted neoantigens and
20
+ all information obtained during the run.
21
+ - The ` filtered.tsv ` file is the same structure as the all_epitopes.tsv file
22
+ but the entries have been filtered down according to the thresholds set by
23
+ the user during the run. The filters will be further explained in
24
+ subsequent sections.
25
+ - The ` aggregated.tsv ` is a condensed output file that contains only the
26
+ information most pertinent to interpreting the results. It contains only
27
+ the best neoantigen candidate for each variant. Our heuristic for
28
+ determining the best neoantigen is described in subsequent sections of this
29
+ course.
30
+
31
+ There are also a number of secondary output files produced by pVACseq and
32
+ pVACfuse. The most important are:
33
+
34
+ - ` aggregated.metrics.json ` : This file is only produced by pVACseq. It contains
35
+ metadata needed for visualizing your results in pVACview.
36
+ - ` aggregated.tsv.reference_matches ` : This file is created when the
37
+ reference proteome match feature is enabled during a run. It contains
38
+ detailed information about the reference matches found, if there are any.
39
+
40
+ ## Interpreting the filtered.tsv File
41
+
42
+ The filtered.tsv file takes all the predicted neoantigens from the
43
+ all_epitopes.tsv file and applies a number of filters to them. Filters are
44
+ applied consecutively, meaning that only the entries passing the first filter
45
+ will be passed along to the second filter, and so on. Only neoantigens
46
+ passing all filters will be reported in this file.
47
+
48
+ ### Binding Filter
49
+
50
+ The binding filter's primary function is to filter neoantigen candidates on
51
+ their IC50 binding affinity to an HLA allele. Because pVACtools allows users
52
+ to run more than one prediction algorithm, we then apply two summarization
53
+ methods on the calls for each neoantigen candidate and HLA allele combination:
54
+ (1) pVACtools calculates the median IC50 binding affinity for all selected prediction
55
+ algorithms (reported in the ` Median [MT] IC50 Score ` column), and (2) pVACtools selects
56
+ the IC50 binding affinity prediction with the lowest value (reported in the
57
+ ` Best [MT] IC50 Score ` column). By default,
58
+ the binding filter is applied to the median IC50 score unless
59
+ users set the ` --top-score-metric ` parameter to ` lowest ` .
60
+
61
+ The binding filter discards candidates where the binding affinity is above the
62
+ ` --binding-threshold ` (default: 500). However, users may set the
63
+ ` --allele-specific-binding-thresholds ` flag in order to use differing binding
64
+ thresholds depending on the HLA allele of the prediction, as recommended by
65
+ [IEDB](https://help.iedb.org/hc/en-us/articles/114094152371-What-thresholds-cut-offs-should-I-use-for-MHC-class-I-and-II-binding-predictions).
66
+ Custom thresholds are available for the most common 76 class I HLA alleles.
67
+ For all others, the ` --binding-threshold ` value is used.
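The summarization and threshold logic described above can be sketched as follows. This is a minimal illustration, not the pVACtools implementation: the function name, the algorithm names, the IC50 values, and the allele-specific cutoff are all hypothetical placeholders.

```python
from statistics import median

# Illustrative allele-specific cutoff (nM); a placeholder, not the value shipped with pVACtools.
ALLELE_SPECIFIC_THRESHOLDS = {"HLA-A*02:01": 255.0}


def passes_binding_filter(ic50_by_algorithm, allele, top_score_metric="median",
                          binding_threshold=500.0, allele_specific=False):
    """Summarize per-algorithm IC50 values (nM) and compare them to the cutoff."""
    values = list(ic50_by_algorithm.values())
    # "median" uses the median across algorithms; "lowest" uses the smallest (best) value.
    score = median(values) if top_score_metric == "median" else min(values)
    threshold = binding_threshold
    if allele_specific:
        # Alleles without a custom cutoff fall back to the general threshold.
        threshold = ALLELE_SPECIFIC_THRESHOLDS.get(allele, binding_threshold)
    # Candidates with a score above the threshold are discarded.
    return score <= threshold


# Hypothetical predictions for one peptide and HLA allele.
predictions = {"AlgorithmA": 120.0, "AlgorithmB": 480.0, "AlgorithmC": 950.0}
print(passes_binding_filter(predictions, "HLA-A*02:01"))
print(passes_binding_filter(predictions, "HLA-A*02:01", allele_specific=True))
print(passes_binding_filter(predictions, "HLA-A*02:01",
                            top_score_metric="lowest", allele_specific=True))
```

In this example the median (480 nM) passes the default threshold of 500 but not the illustrative allele-specific one, while the lowest score (120 nM) passes both, which shows how much the choice of ` --top-score-metric ` can change the outcome.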
68
+
69
+ In addition to the binding affinity, other optional parameters can be set to
70
+ enable additional filtering on related metrics:
71
+
72
+ - ` --minimum-fold-change ` : The fold change is the ratio of the wild-type IC50 to
+ the mutant IC50 (WT/MT), also called agretopicity. A fold change greater than 1
+ means that the mutant is a better binder than the wild type. pVACtools
+ calculates this ratio for both the median and the lowest values;
+ which one is filtered on depends again on the
+ ` --top-score-metric ` setting. When a minimum fold change parameter is set, the binding filter
+ discards any prediction with an agretopicity below the set cutoff. This
+ parameter is not available in pVACfuse because there is no matched wild-type
+ peptide for each neoantigen candidate.
81
+ - ` --percentile-threshold ` : The prediction algorithms supported by pVACtools
82
+ also report a percentile score that represents where each neoantigen's predicted
83
+ affinity falls in the range of other values for an HLA allele. Similar to
84
+ the binding affinity itself, pVACtools reports the median and the lowest
+ percentile scores for the range of scores reported by the prediction
+ algorithms chosen by the user, and which one is used for filtering is again
+ controlled by the ` --top-score-metric ` parameter.
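As a small worked example of the fold change filter, the sketch below assumes that the fold change is computed as wild-type IC50 divided by mutant IC50, so that larger values mean the mutant binds better than the wild type. The function names and IC50 values are hypothetical.

```python
def fold_change(wt_ic50, mt_ic50):
    """Agretopicity, assumed here to be WT IC50 / MT IC50 (both in nM)."""
    return wt_ic50 / mt_ic50


def passes_fold_change(wt_ic50, mt_ic50, minimum_fold_change):
    # Predictions with a fold change below the cutoff are discarded.
    return fold_change(wt_ic50, mt_ic50) >= minimum_fold_change


# Mutant binds 20x more strongly than the wild type (lower IC50 = stronger binding).
print(fold_change(2000.0, 100.0), passes_fold_change(2000.0, 100.0, 1.0))
# Mutant binds more weakly than the wild type.
print(fold_change(100.0, 400.0), passes_fold_change(100.0, 400.0, 1.0))
```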
88
+
89
+ ### Coverage Filter
90
+
91
+ The Coverage Filter is generally used to filter out variants that don't have
92
+ enough read support or expression. This ensures that the remaining variants
93
+ are not just artifacts and that the genes are actually expressed in the
94
+ patient's RNA.
95
+
96
+ For pVACseq, this generally relies on your VCF being annotated with coverage
97
+ and expression data. In our example, the VCF has already been annotated with
98
+ this data. For more information about how to add coverage and expression data
99
+ to your own VCFs, please see the pVACseq documentation on adding
+ [coverage data](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html)
+ and [expression data](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html).
101
+ Additionally, filtering on the normal DNA depth and variant allele frequency
102
+ (VAF) requires your VCF to be a tumor-normal sample VCF and the normal sample
103
+ to be identified in your pVACseq run using the ` --normal-sample-name `
104
+ parameter. If a coverage metric doesn't apply because the underlying data is
105
+ not available, ` NA ` is reported by pVACtools. By default, the filter will skip
106
+ evaluating a coverage criterion when a neoantigen's value for it is ` NA ` .
107
+
108
+ The following thresholds are applied in pVACseq by this filter:
109
+
110
+ - ` --normal-cov ` : Normal coverage cutoff. Minimum number of required reads in the normal DNA (default: 5).
111
+ - ` --tdna-cov ` : Tumor DNA coverage cutoff. Minimum number of required reads in the tumor DNA (default: 10).
112
+ - ` --trna-cov ` : Tumor RNA coverage cutoff. Minimum number of required reads in the tumor RNA (default: 10).
113
+ - ` --normal-vaf ` : Normal VAF cutoff. Only sites BELOW this cutoff in the normal DNA will be considered (default: 0.02).
114
+ - ` --tdna-vaf ` : Tumor DNA VAF cutoff. Only sites above this cutoff will be considered (default: 0.25).
115
+ - ` --trna-vaf ` : Tumor RNA VAF cutoff. Only sites above this cutoff will be considered (default: 0.25).
116
+ - ` --expn-val ` : Gene and Transcript expression cutoff. Only sites above this cutoff will be considered (default: 1.0).
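The pVACseq thresholds above, together with the ` NA ` handling, can be sketched roughly as follows. The column names used as dictionary keys and the inclusive handling of values exactly at a cutoff are assumptions made for illustration, not confirmed pVACtools behavior.

```python
# (column, cutoff, keep values at or below the cutoff?) using the defaults listed above.
# Column names are assumed for this sketch.
CRITERIA = [
    ("Normal Depth", 5, False),
    ("Tumor DNA Depth", 10, False),
    ("Tumor RNA Depth", 10, False),
    ("Normal VAF", 0.02, True),   # the only metric that must stay BELOW its cutoff
    ("Tumor DNA VAF", 0.25, False),
    ("Tumor RNA VAF", 0.25, False),
    ("Gene Expression", 1.0, False),
    ("Transcript Expression", 1.0, False),
]


def passes_coverage_filter(row):
    """Return True if every available coverage metric meets its cutoff."""
    for column, cutoff, keep_below in CRITERIA:
        value = row.get(column, "NA")
        if value == "NA":
            continue  # criteria without underlying data are skipped
        number = float(value)
        ok = number <= cutoff if keep_below else number >= cutoff
        if not ok:
            return False
    return True


# Hypothetical tumor-only entry: normal metrics are NA and therefore skipped.
row = {"Normal Depth": "NA", "Normal VAF": "NA", "Tumor DNA Depth": "85",
       "Tumor DNA VAF": "0.41", "Tumor RNA Depth": "30", "Tumor RNA VAF": "0.5",
       "Gene Expression": "12.3", "Transcript Expression": "4.2"}
print(passes_coverage_filter(row))
print(passes_coverage_filter({**row, "Tumor DNA VAF": "0.1"}))
```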
117
+
118
+ For pVACfuse, this filter evaluates a fusion variant's fusion read support and fusion transcript expression.
119
+ Arriba natively outputs a number of read metrics. These are the number of supporting split fragments with an anchor in
120
+ gene1 or gene2, respectively, as well as the number of pairs (fragments) of discordant mates supporting the fusion
121
+ (a.k.a. spanning reads or bridge reads). The sum of these three values is
122
+ reported as Read Support in pVACfuse. The fusion transcript expression is
123
+ parsed from the ` --starfusion-file ` , when provided. This is reported as FFPM
124
+ (fusion fragments per million total reads).
125
+
126
+ The following thresholds are applied in pVACfuse by this filter:
127
+
128
+ - ` --read-support ` : Read Support cutoff. Sites above this cutoff will be considered (default: 5).
129
+ - ` --expn-val ` : Expression cutoff. Sites above this cutoff will be considered (default: 0.1).
130
+
131
+ ### Transcript Support Level Filter
132
+
133
+ The Transcript Support Level (TSL) Filter removes neoantigen candidates for
+ transcripts with a high TSL (a higher number indicates weaker support), as defined [by Ensembl](https://grch37.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl).
135
+ The cutoff for this filter is set by the ` --maximum-transcript-support-level `
136
+ parameter. Transcripts with a TSL of NA will always be filtered out.
137
+
138
+ Annotation with TSL values through VEP is only available for GRCh38. For other
139
+ species and older builds, a value of "Not Supported" is written to the report
140
+ and the TSL filter will skip those variants.
141
+
142
+ This filter is currently only run by pVACseq.
143
+
144
+ ### Top Score Filter
145
+
146
+ The Top Score Filter will attempt to determine the best neoantigen candidate
147
+ for each variant.
148
+
149
+ For pVACseq it works as follows. Given a set of neoantigen candidates for a
150
+ variant, we first group the transcripts into sets where all transcripts in a set
151
+ code for the same set of neoantigen candidates. For each transcript set we then
152
+ determine the best neoantigen candidate as follows:
153
+
154
+ - Pick all neoantigens with a variant transcript that has a protein_coding Biotype
155
+ - Of the remaining candidates, pick the ones with a variant transcript having a
156
+ TSL less than or equal to the ` --maximum-transcript-support-level ` .
157
+ - Of the remaining candidates, pick the entries with no Problematic Positions
158
+ - Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in
159
+ more detail further below)
160
+ - Of the remaining candidates, pick the one with the lowest MT IC50 Score (Median or Best
161
+ depending on the ` --top-score-metric ` ), lowest TSL, and longest transcript.
162
+
163
+ This filter then reports the best neoantigen candidate for each transcript set.
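The selection steps for one transcript set can be sketched as below. The dictionary keys and candidate values are hypothetical, the TSL cutoff is treated as inclusive, and the fallback that keeps the previous pool when a step would remove every candidate is an assumption of this sketch rather than something stated above.

```python
def narrow(candidates, predicate):
    """Keep candidates passing the predicate."""
    kept = [c for c in candidates if predicate(c)]
    # Assumption: if nothing passes, continue with the unfiltered pool.
    return kept or candidates


def best_candidate(candidates, maximum_tsl):
    pool = narrow(candidates, lambda c: c["biotype"] == "protein_coding")
    pool = narrow(pool, lambda c: c["tsl"] <= maximum_tsl)
    pool = narrow(pool, lambda c: not c["problematic_positions"])
    pool = narrow(pool, lambda c: c["anchor_pass"])
    # Lowest MT IC50 first, then lowest TSL, then longest transcript.
    return min(pool, key=lambda c: (c["ic50"], c["tsl"], -c["transcript_length"]))


# Hypothetical candidates for one transcript set.
candidates = [
    {"id": "c1", "biotype": "protein_coding", "tsl": 1, "problematic_positions": False,
     "anchor_pass": True, "ic50": 300.0, "transcript_length": 1000},
    {"id": "c2", "biotype": "protein_coding", "tsl": 1, "problematic_positions": False,
     "anchor_pass": True, "ic50": 150.0, "transcript_length": 800},
    {"id": "c3", "biotype": "nonsense_mediated_decay", "tsl": 1, "problematic_positions": False,
     "anchor_pass": True, "ic50": 50.0, "transcript_length": 1200},
]
print(best_candidate(candidates, maximum_tsl=1)["id"])
```

Note that `c3` has the lowest IC50 but is removed in the first step because its transcript is not protein coding, so the ordering of the steps matters.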
164
+
165
+ For pVACfuse, the neoantigen candidates for each fusion are similarly grouped
166
+ into sets where all transcript1-transcript2 combinations in a set code for the
167
+ same set of neoantigen candidates. From there, the best neoantigen candidate
168
+ for each transcript set is determined by picking the candidate with the lowest
169
+ MT IC50 Score (Median or Best depending on the ` --top-score-metric ` ) and the
170
+ highest fusion transcript expression.
171
+
172
+ ## Interpreting the aggregated.tsv File
173
+
174
+ The ` aggregated.tsv ` is a condensed output file that shows the best neoantigen
175
+ candidate for each variant and reports only the information most pertinent to
176
+ interpreting the results. It also assigns each of the selected neoantigen candidates
177
+ a tier based on its suitability for vaccine manufacturing.
178
+
179
+ Only epitopes meeting the ` --aggregate-inclusion-threshold ` are included in this report
180
+ (default: 5000). Depending on the value used for the ` --top-score-metric ` , all neoantigen
181
+ candidates with a Median or Best MT IC50 Score below the selected ` --aggregate-inclusion-threshold `
182
+ are included in creating this report.
183
+
184
+ ### Determining the Best Transcript and Best Peptide of a Variant
185
+
186
+ In pVACseq, for each variant, all neoantigen candidates meeting the ` --aggregate-inclusion-threshold ` are evaluated as follows:
187
+
188
+ - Pick all entries with a variant transcript that has a protein_coding Biotype
189
+ - Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= ` --maximum-transcript-support-level `
190
+ - Of the remaining entries, pick the entries with no Problematic Positions
191
+ - Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below)
192
+ - Of the remaining entries, pick the one with the lowest MT IC50 score (Median or Best
193
+ depending on the ` --top-score-metric ` ), lowest Transcript Support Level, and longest transcript.
194
+
195
+ In pVACfuse, the neoantigen candidate with the lowest IC50 binding affinity for each variant is selected.
196
+ The value used for the ` --top-score-metric ` determines whether the lowest or
197
+ median binding affinity is used for this comparison.
198
+
199
+ The chosen entry determines the best neoantigen candidate and the best
200
+ transcript coding for it.
201
+
202
+ ### Tier and Tiering Criteria
203
+
204
+ For the purpose of assigning tiers, each best peptide is evaluated by a set of
205
+ criteria. These criteria and the available tiers differ from tool to tool.
206
+
207
+ #### Tiering in pVACseq
208
+
209
+ The Tiers available in pVACseq are:
210
+
211
+ ```{r pvacseq_tiers, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
212
+ tabl <- "
213
+ | Tier | Criteria |
214
+ |------|----------|
215
+ | Pass | Best Peptide passes the binding, expression, tsl, clonal, and anchor criteria |
216
+ | Anchor | Best Peptide fails the anchor criteria but passes the binding, expression, tsl, and clonal criteria |
217
+ | Subclonal | Best Peptide fails the clonal criteria but passes the binding, tsl, and anchor criteria |
218
+ | LowExpr | Best Peptide meets the Low Expression Criteria and passes the binding, tsl, clonal, and anchor criteria |
219
+ | NoExpr | Best Peptide is not expressed (RNA Expr == 0 or RNA VAF == 0) |
220
+ | Poor | Best Peptide doesn’t fit in any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria |
221
+ "
222
+ cat(tabl)
223
+ ```
224
+
225
+ **Criteria Details**
226
+
227
+ ```{r pvacseq_tier_criteria, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
228
+ tabl <- "
229
+ | Criteria | Description | Evaluation |
230
+ |----------|-------------|------------|
231
+ | Binding Criteria | Pass if Best Peptide is a strong binder | IC50 MT < `--binding-threshold` and %ile MT < `--percentile-threshold` (if parameter is set). `--allele-specific-binding-thresholds` flag is respected. |
232
+ | Expression Criteria | Pass if Best Transcript is expressed | Allele Expr > `--trna-vaf` * `--expn-val` |
233
+ | Low Expression Criteria | Peptide has low expression or no expression but RNA VAF and coverage | (0 < Allele Expr < `--trna-vaf` * `--expn-val`) OR (RNA Expr == 0 AND RNA Depth > `--trna-cov` AND RNA VAF > `--trna-vaf`) |
234
+ | TSL Criteria | Pass if Best Transcript has good transcript support level | TSL <= `--maximum-transcript-support-level` |
235
+ | Clonal Criteria | Best Peptide is likely in the founding clone of the tumor | DNA VAF > `--tumor-purity` / 4 |
236
+ | Anchor Criteria | Fail if all mutated amino acids of the Best Peptide (Pos) are at an anchor position and the WT peptide has good binding | IC50 WT < `--binding-threshold`. `--allele-specific-binding-thresholds` flag is respected. |
237
+ "
238
+ cat(tabl)
239
+ ```
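Reading the tier table as an ordered decision list gives roughly the following logic. This is a sketch under assumptions: the function and argument names are hypothetical, each branch mirrors one table row literally, and checking the rows from top to bottom is an assumed order.

```python
def assign_pvacseq_tier(*, binding, expression, low_expression, no_expression,
                        tsl, clonal, anchor):
    """Map the outcome of each criterion (True = criterion met) to a tier."""
    if binding and expression and tsl and clonal and anchor:
        return "Pass"
    if not anchor and binding and expression and tsl and clonal:
        return "Anchor"
    # The table lists only binding, tsl, and anchor for this tier.
    if not clonal and binding and tsl and anchor:
        return "Subclonal"
    if low_expression and binding and tsl and clonal and anchor:
        return "LowExpr"
    if no_expression:
        return "NoExpr"
    return "Poor"


# Hypothetical best peptide that passes everything except the anchor criteria.
print(assign_pvacseq_tier(binding=True, expression=True, low_expression=False,
                          no_expression=False, tsl=True, clonal=True, anchor=False))
```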
240
+
241
+ #### Tiering in pVACfuse
242
+
243
+ The Tiers available in pVACfuse are:
244
+
245
+ ```{r pvacfuse_tiers, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
246
+ tabl <- "
247
+ | Tier | Criteria |
248
+ |------|---------|
249
+ | Pass | Best Peptide passes the binding, read support, and expression criteria |
250
+ | LowReadSupport | Best Peptide fails the read support criteria but passes the binding and expression criteria |
251
+ | LowExpr | Best Peptide fails the expression criteria but passes the binding and read support criteria |
252
+ | Poor | Best Peptide doesn’t fit any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria |
253
+ "
254
+ cat(tabl)
255
+ ```
256
+
257
+ **Criteria Details**
258
+
259
+ ```{r pvacfuse_tier_criteria, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
260
+ tabl <- "
261
+ | Criteria | Description | Evaluation |
262
+ |----------|-------------|------------|
263
+ | Binding Criteria | Pass if Best Peptide is strong binder | IC50 MT < `--binding-threshold` and %ile MT < `--percentile-threshold` (if parameter is set). `--allele-specific-binding-thresholds` flag is respected. |
264
+ | Read Support Criteria | Pass if the variant has read support | Read Support > `--read-support` |
265
+ | Expression Criteria | Pass if Best Transcript is expressed | Expr > `--expn-val` |
266
+ "
267
+ cat(tabl)
268
+ ```
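The pVACfuse tiers can be sketched in the same way. This example assumes, in line with the coverage filter description for pVACfuse, that a value above the cutoff passes the read support and expression criteria. The function and argument names are hypothetical, and the defaults are the ones listed for ` --read-support ` and ` --expn-val `.

```python
def assign_pvacfuse_tier(binding_pass, read_support, expression,
                         read_support_cutoff=5, expn_val=0.1):
    """Assign a pVACfuse tier from the binding result, read support, and FFPM expression."""
    # Assumption: values above the cutoffs pass.
    read_support_pass = read_support > read_support_cutoff
    expression_pass = expression > expn_val
    if binding_pass and read_support_pass and expression_pass:
        return "Pass"
    if binding_pass and expression_pass:
        return "LowReadSupport"
    if binding_pass and read_support_pass:
        return "LowExpr"
    # Fails binding, or fails both read support and expression.
    return "Poor"


print(assign_pvacfuse_tier(True, read_support=10, expression=1.0))
print(assign_pvacfuse_tier(True, read_support=3, expression=1.0))
```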