Commit dadfc6e

Fill in outputs chapter
1 parent 3f8f911 commit dadfc6e

2 files changed

+267
-0
lines changed

04-outputs.Rmd

Lines changed: 255 additions & 0 deletions
@@ -11,3 +11,258 @@ This chapter will cover:
1111
- Understanding the output files produced by pVACtools
1212
- Interpreting the .filtered.tsv file
1313
- Interpreting the .aggregated.tsv file
14+
15+
## pVACtools Output Files
16+
17+
Both pVACseq and pVACfuse produce three main output files:
18+
19+
- The `all_epitopes.tsv` file is a TSV file with all predicted neoantigens and
20+
all information obtained during the run.
21+
- The `filtered.tsv` file has the same structure as the `all_epitopes.tsv` file
22+
but the entries have been filtered down according to the thresholds set by
23+
the user during the run. The filters will be further explained in
24+
subsequent sections.
25+
- The `aggregated.tsv` is a condensed output file that contains only the
26+
information most pertinent to interpreting the results. It contains only
27+
the best neoantigen candidate for each variant. Our heuristic for
28+
determining the best neoantigen is described in subsequent sections of this
29+
course.
30+
31+
There are also a number of secondary output files produced by pVACseq and
32+
pVACfuse. The most important are:
33+
34+
- `aggregated.metrics.json`: This file is only produced by pVACseq. It contains
35+
metadata needed for visualizing your results in pVACview.
36+
- `aggregated.tsv.reference_matches`: This file is created when the
37+
reference proteome match feature is enabled during a run. It contains
38+
detailed information about the reference matches found, if there are any.
39+
40+
## Interpreting the filtered.tsv File
41+
42+
The filtered.tsv file takes all the predicted neoantigens from the
43+
`all_epitopes.tsv` file and applies a number of filters to them. Filters are
44+
applied consecutively, meaning that only the entries passing the first filter
45+
will be passed along to the second filter, and so on. Only neoantigens
46+
passing all filters will be reported in this file.
47+
48+
### Binding Filter
49+
50+
The binding filter's primary function is to filter neoantigen candidates on
51+
their IC50 binding affinity to an HLA allele. Because pVACtools allows users
52+
to run more than one prediction algorithm, we then apply two summarization
53+
methods on the calls for each neoantigen candidate and HLA allele combination:
54+
(1) pVACtools calculates the median IC50 binding affinity for all selected prediction
55+
algorithms (reported in the `Median [MT] IC50 Score` column), and (2) pVACtools selects
56+
the IC50 binding affinity prediction with the lowest value (reported in the
57+
`Best [MT] IC50 Score` column). By default,
58+
the binding filter is applied to the median IC50 score unless
59+
users set the `--top-score-metric` parameter to `lowest`.
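
As a minimal sketch of the two summarization methods, the snippet below computes the median and the lowest IC50 for one peptide and HLA allele pair and then applies a threshold to whichever one `--top-score-metric` selects. The function names and the example values are hypothetical, and treating a score exactly equal to the threshold as passing is an assumption; this is not pVACtools' actual implementation.

```python
from statistics import median


def summarize_ic50(predictions):
    """Return (median, lowest) IC50 for one peptide/HLA allele pair.

    `predictions` maps a prediction algorithm name to its predicted IC50.
    """
    values = list(predictions.values())
    return median(values), min(values)


def passes_binding_filter(predictions, binding_threshold=500,
                          top_score_metric="median"):
    """Keep a candidate unless its chosen score is above the threshold."""
    median_score, best_score = summarize_ic50(predictions)
    score = median_score if top_score_metric == "median" else best_score
    # Assumption: a score equal to the threshold is kept.
    return score <= binding_threshold


# Made-up predictions from three algorithms for one candidate.
example = {"NetMHCpan": 200.0, "MHCflurry": 900.0, "MHCnuggetsI": 800.0}
print(summarize_ic50(example))
print(passes_binding_filter(example, top_score_metric="median"))
print(passes_binding_filter(example, top_score_metric="lowest"))
```

With these made-up values the candidate is discarded under the median metric but kept under the lowest metric, which is why the choice of `--top-score-metric` can change the contents of the filtered report.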
60+
61+
The binding filter discards candidates where the binding affinity is above the
62+
`--binding-threshold` (default: 500). However, users may set the
63+
`--allele-specific-binding-thresholds` flag in order to use differing binding
64+
thresholds depending on the HLA allele of the prediction, as recommended by
65+
[IEDB](https://help.iedb.org/hc/en-us/articles/114094152371-What-thresholds-cut-offs-should-I-use-for-MHC-class-I-and-II-binding-predictions).
66+
Custom thresholds are available for the most common 76 class I HLA alleles.
67+
For all others, the `--binding-threshold` value is used.
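
The fallback behavior can be sketched as a dictionary lookup with a default. The table below holds placeholder numbers chosen purely for illustration; they are not the thresholds shipped with pVACtools or recommended by IEDB, and the function name is hypothetical.

```python
# Placeholder values for illustration only, NOT the real allele-specific cutoffs.
EXAMPLE_ALLELE_THRESHOLDS = {
    "HLA-A*02:01": 800.0,
    "HLA-B*07:02": 600.0,
}


def binding_threshold_for(allele, binding_threshold=500.0,
                          allele_specific=False,
                          table=EXAMPLE_ALLELE_THRESHOLDS):
    """Return the IC50 cutoff to apply to a prediction for `allele`."""
    if allele_specific:
        # Alleles without a custom entry fall back to --binding-threshold.
        return table.get(allele, binding_threshold)
    return binding_threshold


print(binding_threshold_for("HLA-A*02:01", allele_specific=True))
print(binding_threshold_for("HLA-C*01:02", allele_specific=True))
```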
68+
69+
In addition to the binding affinity, other optional parameters can be set to
70+
enable additional filtering on related metrics:
71+
72+
- `--minimum-fold-change`: The fold change is the ratio of the mutant binding affinity to
73+
the wild-type binding affinity, also called agretopicity. A fold change greater than 1
74+
means that the mutant is a better binder than the wild type. pVACtools
75+
calculates this ratio for both the median as well as the lowest values.
76+
Which one is filtered on for this metric depends again on the
77+
`--top-score-metric` set. When a minimum fold change parameter is set, the binding filter
78+
discards any prediction with an agretopicity below the set cutoff. This
79+
parameter is not available in pVACfuse because there is no matched wildtype
80+
peptide for each neoantigen candidate.
81+
- `--percentile-threshold`: The prediction algorithms supported by pVACtools
82+
also report a percentile score that represents where each neoantigen's predicted
83+
affinity falls in the range of other values for an HLA allele. Similar to
84+
the binding affinity itself, pVACtools reports the median and the lowest
85+
percentile scores for the range of scores reported by the prediction
86+
algorithms chosen by the user, and which one is used for filtering is again
87+
controlled by the `--top-score-metric` parameter.
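
The two optional checks can be sketched as follows. This assumes the fold change is computed as the wild-type IC50 divided by the mutant IC50, so that values above 1 mean the mutant peptide is predicted to bind more strongly (lower IC50). The function and argument names are hypothetical, and boundary handling (values exactly at a cutoff pass) is an assumption.

```python
def fold_change(wt_ic50, mt_ic50):
    """Assumed definition: wild-type IC50 divided by mutant IC50."""
    return wt_ic50 / mt_ic50


def passes_optional_filters(mt_ic50, wt_ic50, mt_percentile,
                            minimum_fold_change=None,
                            percentile_threshold=None):
    """Apply the fold change and percentile checks only when they are set."""
    if minimum_fold_change is not None:
        if fold_change(wt_ic50, mt_ic50) < minimum_fold_change:
            return False
    if percentile_threshold is not None:
        if mt_percentile > percentile_threshold:
            return False
    return True


# Mutant binds ten times more strongly than wild type in this made-up case.
print(fold_change(1000.0, 100.0))
print(passes_optional_filters(100.0, 1000.0, 0.5, minimum_fold_change=1))
```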
88+
89+
### Coverage Filter
90+
91+
The Coverage Filter is generally used to filter out variants that don't have
92+
enough read support or expression. This ensures that the remaining variants
93+
are not just artifacts and that the genes are actually expressed in the
94+
patient's RNA.
95+
96+
For pVACseq, this generally relies on your VCF being annotated with coverage
97+
and expression data. In our example, the VCF has already been annotated with
98+
this data. For more information about how to add coverage and expression data
99+
to your own VCFs, please see [here](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html)
100+
and [here](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html).
101+
Additionally, filtering on the normal DNA depth and variant allele frequency
102+
(VAF) requires your VCF to be a tumor-normal sample VCF and the normal sample
103+
to be identified in your pVACseq run using the `--normal-sample-name`
104+
parameter. If a coverage metric doesn't apply because the underlying data is
105+
not available, `NA` is reported by pVACtools. By default, the filter will skip
106+
evaluating a coverage criterion when a neoantigen's value for it is `NA`.
107+
108+
The following thresholds are applied in pVACseq by this filter:
109+
110+
- `--normal-cov`: Normal coverage cutoff. Minimum number of required reads in the normal DNA (default: 5).
111+
- `--tdna-cov`: Tumor DNA coverage cutoff. Minimum number of required reads in the tumor DNA (default: 10).
112+
- `--trna-cov`: Tumor RNA coverage cutoff. Minimum number of required reads in the tumor RNA (default: 10).
113+
- `--normal-vaf`: Normal VAF cutoff. Only sites BELOW this cutoff in the normal DNA will be considered (default: 0.02).
114+
- `--tdna-vaf`: Tumor DNA VAF cutoff. Only sites above this cutoff will be considered (default: 0.25).
115+
- `--trna-vaf`: Tumor RNA VAF cutoff. Only sites above this cutoff will be considered (default: 0.25).
116+
- `--expn-val`: Gene and Transcript expression cutoff. Only sites above this cutoff will be considered (default: 1.0).
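
A sketch of how these thresholds might combine, including the default behavior of skipping a criterion whose value is `NA`. The dictionary keys are hypothetical stand-ins for the report columns, and treating a value exactly equal to a cutoff as passing is an assumption.

```python
NA = "NA"


def passes_coverage_filter(row, normal_cov=5, tdna_cov=10, trna_cov=10,
                           normal_vaf=0.02, tdna_vaf=0.25, trna_vaf=0.25,
                           expn_val=1.0):
    """Return True if every available coverage metric meets its cutoff."""
    # (key, cutoff, True when the value must be at least the cutoff)
    checks = [
        ("normal_depth", normal_cov, True),
        ("tumor_dna_depth", tdna_cov, True),
        ("tumor_rna_depth", trna_cov, True),
        ("normal_vaf", normal_vaf, False),  # must stay BELOW the cutoff
        ("tumor_dna_vaf", tdna_vaf, True),
        ("tumor_rna_vaf", trna_vaf, True),
        ("gene_expression", expn_val, True),
        ("transcript_expression", expn_val, True),
    ]
    for key, cutoff, is_minimum in checks:
        value = row.get(key, NA)
        if value == NA:
            continue  # metric unavailable: skip this criterion
        if is_minimum and value < cutoff:
            return False
        if not is_minimum and value > cutoff:
            return False
    return True


# A variant with DNA data but no RNA data: RNA criteria are skipped.
variant = {"normal_depth": 30, "tumor_dna_depth": 50, "normal_vaf": 0.0,
           "tumor_dna_vaf": 0.4, "tumor_rna_depth": NA, "tumor_rna_vaf": NA}
print(passes_coverage_filter(variant))
```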
117+
118+
For pVACfuse, this filter evaluates a fusion variant's fusion read support and fusion transcript expression.
119+
Arriba natively outputs a number of read metrics. These are the number of supporting split fragments with an anchor in
120+
gene1 or gene2, respectively, as well as the number of pairs (fragments) of discordant mates supporting the fusion
121+
(a.k.a. spanning reads or bridge reads). The sum of these three values is
122+
reported as Read Support in pVACfuse. The fusion transcript expression is
123+
parsed from the `--starfusion-file`, when provided. This is reported as FFPM
124+
(fusion fragments per million total reads).
125+
126+
The following thresholds are applied in pVACfuse by this filter:
127+
128+
- `--read-support`: Read Support cutoff. Sites above this cutoff will be considered (default: 5).
129+
- `--expn-val`: Expression cutoff. Sites above this cutoff will be considered (default: 0.1).
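
The read support calculation and the two pVACfuse cutoffs can be sketched as below. The argument names are illustrative, the expression check is skipped when no FFPM value is available (an assumption for runs without a `--starfusion-file`), and values exactly at a cutoff are treated as passing, which is also an assumption.

```python
def read_support(split_reads_gene1, split_reads_gene2, discordant_mates):
    """Sum of the three Arriba read metrics described above."""
    return split_reads_gene1 + split_reads_gene2 + discordant_mates


def passes_fusion_coverage(split_reads_gene1, split_reads_gene2,
                           discordant_mates, ffpm=None,
                           read_support_cutoff=5, expn_val=0.1):
    """Apply the read support and expression cutoffs to one fusion."""
    support = read_support(split_reads_gene1, split_reads_gene2,
                           discordant_mates)
    if support < read_support_cutoff:
        return False
    # Assumption: without an FFPM value the expression check is skipped.
    if ffpm is not None and ffpm < expn_val:
        return False
    return True


print(read_support(2, 1, 3))
print(passes_fusion_coverage(2, 1, 3, ffpm=0.5))
```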
130+
131+
### Transcript Support Level Filter
132+
133+
The Transcript Support Level (TSL) Filter removes neoantigen candidates for
134+
transcripts with a numerically high TSL (that is, weaker support), as defined [by Ensembl](https://grch37.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl).
135+
The cutoff for this filter is set by the `--maximum-transcript-support-level`
136+
parameter. Transcripts with a TSL of NA will always be filtered out.
137+
138+
Annotation with TSL values through VEP is only available for GRCh38. For other
139+
species and older builds, a value of "Not Supported" is written to the report
140+
and the TSL filter will skip those variants.
141+
142+
This filter is currently only run by pVACseq.
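
The three cases described above (a numeric TSL, `NA`, and "Not Supported") can be sketched like this. The function name is hypothetical, the default cutoff of 1 is an assumption rather than something stated in this chapter, and a TSL equal to the cutoff is treated as passing.

```python
def passes_tsl_filter(tsl, maximum_transcript_support_level=1):
    """Decide whether a transcript's TSL value passes the filter.

    `tsl` is the value as written to the report: a number from 1 to 5,
    "NA", or "Not Supported".
    """
    if tsl == "Not Supported":
        return True   # filter is skipped for builds without TSL annotation
    if tsl == "NA":
        return False  # transcripts without a TSL are always filtered out
    return int(tsl) <= maximum_transcript_support_level


for value in ["1", "3", "NA", "Not Supported"]:
    print(value, passes_tsl_filter(value))
```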
143+
144+
### Top Score Filter
145+
146+
The Top Score Filter will attempt to determine the best neoantigen candidate
147+
for each variant.
148+
149+
For pVACseq it works as follows. Given a set of neoantigen candidates for a
150+
variant, we first group the transcripts into sets where all transcripts in a set
151+
code for the same set of neoantigen candidates. For each transcript set we then
152+
determine the best neoantigen candidate as follows:
153+
154+
- Pick all neoantigens with a variant transcript that has a protein_coding Biotype
155+
- Of the remaining candidates, pick the ones with a variant transcript having a
156+
TSL less than the `--maximum-transcript-support-level`.
157+
- Of the remaining candidates, pick the entries with no Problematic Positions
158+
- Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in
159+
more detail further below)
160+
- Of the remaining candidates, pick the one with the lowest MT IC50 Score (Median or Best
161+
depending on the `--top-score-metric`), lowest TSL, and longest transcript.
162+
163+
This filter then reports the best neoantigen candidate for each transcript set.
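
The pVACseq selection steps for one transcript set can be sketched as a chain of narrowing steps followed by a sort. The dictionary keys are hypothetical. Two points are assumptions not stated in the text: when a step would eliminate every candidate the sketch keeps the previous pool, and a TSL equal to the cutoff is treated as acceptable.

```python
def narrow(candidates, predicate):
    """Keep candidates matching `predicate`; if none match, keep them all."""
    kept = [c for c in candidates if predicate(c)]
    return kept or candidates  # fallback is an assumption of this sketch


def best_candidate(candidates, maximum_tsl=1):
    """Pick the best neoantigen candidate within one transcript set."""
    pool = narrow(candidates, lambda c: c["biotype"] == "protein_coding")
    pool = narrow(pool, lambda c: c["tsl"] <= maximum_tsl)
    pool = narrow(pool, lambda c: not c["problematic_positions"])
    pool = narrow(pool, lambda c: c["anchor_pass"])
    # Lowest IC50 first, then lowest TSL, then longest transcript.
    return min(pool, key=lambda c: (c["ic50"], c["tsl"],
                                    -c["transcript_length"]))


def candidate(name, biotype, ic50, length):
    """Build a made-up candidate record for the example."""
    return {"name": name, "biotype": biotype, "tsl": 1,
            "problematic_positions": False, "anchor_pass": True,
            "ic50": ic50, "transcript_length": length}


a = candidate("a", "protein_coding", 120.0, 1500)
b = candidate("b", "nonsense_mediated_decay", 50.0, 2000)
c = candidate("c", "protein_coding", 120.0, 2400)
print(best_candidate([a, b, c])["name"])
```

In this made-up example, candidate `b` has the lowest IC50 but is removed by the Biotype step, and the tie between `a` and `c` is broken by transcript length.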
164+
165+
For pVACfuse, the neoantigen candidates for each fusion are similarly grouped
166+
into sets where all transcript1-transcript2 combinations in a set code for the
167+
same set of neoantigen candidates. From there, the best neoantigen candidate
168+
for each transcript set is determined by picking the candidate with the lowest
169+
MT IC50 Score (Median or Best depending on the `--top-score-metric`) and the
170+
highest fusion transcript expression.
171+
172+
## Interpreting the aggregated.tsv File
173+
174+
The `aggregated.tsv` is a condensed output file that shows the best neoantigen
175+
candidate for each variant and reports only the information most pertinent to
176+
interpreting the results. It also assigns each of the selected neoantigen candidates
177+
a tier based on its suitability for vaccine manufacturing.
178+
179+
Only epitopes meeting the `--aggregate-inclusion-threshold` are included in this report
180+
(default: 5000). Depending on the value used for the `--top-score-metric`, all neoantigen
181+
candidates with a Median or Best MT IC50 Score below the selected `--aggregate-inclusion-threshold`
182+
are included in creating this report.
183+
184+
### Determining the Best Transcript and Best Peptide of a Variant
185+
186+
In pVACseq, for each variant, all neoantigen candidates meeting the `--aggregate-inclusion-threshold` are evaluated as follows:
187+
188+
- Pick all entries with a variant transcript that has a protein_coding Biotype
189+
- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= `--maximum-transcript-support-level`
190+
- Of the remaining entries, pick the entries with no Problematic Positions
191+
- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below)
192+
- Of the remaining entries, pick the one with the lowest MT IC50 Score (Median or Best
193+
depending on the `--top-score-metric`), lowest Transcript Support Level, and longest transcript.
194+
195+
In pVACfuse, the neoantigen candidate with the lowest IC50 binding affinity for each variant is selected.
196+
The value used for the `--top-score-metric` determines whether the lowest or
197+
median binding affinity is used for this comparison.
198+
199+
The chosen entry determines the best neoantigen candidate and the best
200+
transcript coding for it.
201+
202+
### Tier and Tiering Criteria
203+
204+
For the purpose of assigning tiers, each best peptide is evaluated by a set of
205+
criteria. These criteria and the available tiers differ from tool to tool.
206+
207+
#### Tiering in pVACseq
208+
209+
The Tiers available in pVACseq are:
210+
211+
```{r pvacseq_tiers, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
212+
tabl <- "
213+
| Tier | Criteria |
214+
|------|----------|
215+
| Pass | Best Peptide passes the binding, expression, tsl, clonal, and anchor criteria |
216+
| Anchor | Best Peptide fails the anchor criteria but passes the binding, expression, tsl, and clonal criteria |
217+
| Subclonal | Best Peptide fails the clonal criteria but passes the binding, tsl, and anchor criteria |
218+
| LowExpr | Best Peptide meets the Low Expression Criteria and passes the binding, tsl, clonal, and anchor criteria |
219+
| NoExpr | Best Peptide is not expressed (RNA Expr == 0 or RNA VAF == 0) |
220+
| Poor | Best Peptide doesn’t fit in any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria |
221+
"
222+
cat(tabl)
223+
```
224+
225+
**Criteria Details**
226+
227+
```{r pvacseq_tier_criteria, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
228+
tabl <- "
229+
| Criteria | Description | Evaluation |
230+
|----------|-------------|------------|
231+
| Binding Criteria | Pass if Best Peptide is a strong binder | IC50 MT < `--binding-threshold` and %ile MT < `--percentile-threshold` (if parameter is set). `--allele-specific-binding-thresholds` flag is respected. |
232+
| Expression Criteria | Pass if Best Transcript is expressed | Allele Expr > `--trna-vaf` * `--expn-val` |
233+
| Low Expression Criteria | Peptide has low expression or no expression but RNA VAF and coverage | (0 < Allele Expr < `--trna-vaf` * `--expn-val`) OR (RNA Expr == 0 AND RNA Depth > `--trna-cov` AND RNA VAF > `--trna-vaf`) |
234+
| TSL Criteria | Pass if Best Transcript has good transcript support level | TSL <= `--maximum-transcript-support-level` |
235+
| Clonal Criteria | Best Peptide is likely in the founding clone of the tumor | DNA VAF > `--tumor-purity` / 4 |
236+
| Anchor Criteria | Fail if all mutated amino acids of the Best Peptide (Pos) are at an anchor position and the WT peptide has good binding | All mutated positions are anchor positions and IC50 WT < `--binding-threshold`. `--allele-specific-binding-thresholds` flag is respected. |
237+
"
238+
cat(tabl)
239+
```
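
The pVACseq tier table and criteria can be read as a sequence of checks. The sketch below follows the two tables literally and evaluates tiers in table order, which is an assumption. The dictionary keys are hypothetical, Allele Expr is assumed to be RNA Expr multiplied by RNA VAF, and the default values for the TSL cutoff and tumor purity are placeholders.

```python
def pvacseq_tier(p, binding_threshold=500, percentile_threshold=None,
                 trna_vaf=0.25, trna_cov=10, expn_val=1.0,
                 maximum_tsl=1, tumor_purity=1.0):
    """Assign a tier to one Best Peptide record `p` (a dict)."""
    allele_expr = p["rna_expr"] * p["rna_vaf"]  # assumed definition
    expr_cutoff = trna_vaf * expn_val

    binding = p["ic50_mt"] < binding_threshold and (
        percentile_threshold is None
        or p["percentile_mt"] < percentile_threshold)
    expression = allele_expr > expr_cutoff
    low_expression = (0 < allele_expr < expr_cutoff) or (
        p["rna_expr"] == 0 and p["rna_depth"] > trna_cov
        and p["rna_vaf"] > trna_vaf)
    tsl = p["tsl"] <= maximum_tsl
    clonal = p["dna_vaf"] > tumor_purity / 4
    anchor = not (p["all_mutations_at_anchor"]
                  and p["ic50_wt"] < binding_threshold)

    if binding and expression and tsl and clonal and anchor:
        return "Pass"
    if binding and expression and tsl and clonal and not anchor:
        return "Anchor"
    if binding and tsl and anchor and not clonal:
        return "Subclonal"
    if binding and tsl and clonal and anchor and low_expression:
        return "LowExpr"
    if p["rna_expr"] == 0 or p["rna_vaf"] == 0:
        return "NoExpr"
    return "Poor"


# Made-up record that satisfies every criterion.
base = {"ic50_mt": 100.0, "ic50_wt": 2000.0, "percentile_mt": 0.5,
        "rna_expr": 10.0, "rna_vaf": 0.5, "rna_depth": 50, "tsl": 1,
        "dna_vaf": 0.4, "all_mutations_at_anchor": False}
print(pvacseq_tier(base))
print(pvacseq_tier(dict(base, dna_vaf=0.1)))
```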
240+
241+
#### Tiering in pVACfuse
242+
243+
The Tiers available in pVACfuse are:
244+
245+
```{r pvacfuse_tiers, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
246+
tabl <- "
247+
| Tier | Criteria |
248+
|------|---------|
249+
| Pass | Best Peptide passes the binding, read support, and expression criteria |
250+
| LowReadSupport | Best Peptide fails the read support criteria but passes the binding and expression criteria |
251+
| LowExpr | Best Peptide fails the expression criteria but passes the binding and read support criteria |
252+
| Poor | Best Peptide doesn’t fit any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria |
253+
"
254+
cat(tabl)
255+
```
256+
257+
**Criteria Details**
258+
259+
```{r pvacfuse_tier_criteria, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
260+
tabl <- "
261+
| Criteria | Description | Evaluation |
262+
|----------|-------------|------------|
263+
| Binding Criteria | Pass if Best Peptide is a strong binder | IC50 MT < `--binding-threshold` and %ile MT < `--percentile-threshold` (if parameter is set). `--allele-specific-binding-thresholds` flag is respected. |
264+
| Read Support Criteria | Pass if the variant has read support | Read Support > `--read-support` |
265+
| Expression Criteria | Pass if Best Transcript is expressed | Expr > `--expn-val` |
266+
"
267+
cat(tabl)
268+
```
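
A sketch of the pVACfuse tier assignment. It assumes that read support and expression pass when they exceed their cutoffs, in line with the Coverage Filter description ("Sites above this cutoff will be considered"). The function and argument names are hypothetical, and the optional percentile check is omitted for brevity.

```python
def pvacfuse_tier(ic50_mt, read_support, expr, binding_threshold=500,
                  read_support_cutoff=5, expn_val=0.1):
    """Assign a tier to one pVACfuse Best Peptide."""
    binding = ic50_mt < binding_threshold
    reads = read_support > read_support_cutoff   # assumed direction
    expression = expr > expn_val                 # assumed direction

    if binding and reads and expression:
        return "Pass"
    if binding and expression and not reads:
        return "LowReadSupport"
    if binding and reads and not expression:
        return "LowExpr"
    return "Poor"


print(pvacfuse_tier(ic50_mt=120.0, read_support=12, expr=0.8))
print(pvacfuse_tier(ic50_mt=120.0, read_support=3, expr=0.8))
```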

resources/dictionary.txt

Lines changed: 12 additions & 0 deletions
@@ -1,7 +1,9 @@
11
AGFusion
22
Arriba
33
AnVIL
4+
agretopicity
45
BIPOC
6+
Biotype
57
BLASTp
68
Bloomberg
79
Bookdown
@@ -27,6 +29,7 @@ epitope
2729
epitopes
2830
Ensembl
2931
FASTA
32+
FFPM
3033
favicon
3134
frameshift
3235
fyi
@@ -35,12 +38,14 @@ GenBank
3538
GH
3639
GitHub
3740
Github
41+
GRCh
3842
germline
3943
gnomAD
4044
griffithlab
4145
HCC
4246
HLA
4347
histocompatibility
48+
homozygous
4449
https
4550
http
4651
IC
@@ -59,11 +64,13 @@ itcrtraining
5964
json
6065
junctional
6166
Leanpub
67+
lymphoblastoid
6268
MHCnuggets
6369
Markua
6470
mRNA
6571
manufacturability
6672
mentorship
73+
mer
6774
mers
6875
missense
6976
MHC
@@ -78,6 +85,7 @@ nd
7885
neoantigen
7986
Neoantigen
8087
neoantigens
88+
neoantigen's
8189
nmol
8290
OptiType
8391
ottrpal
@@ -99,15 +107,19 @@ RefSeq
99107
reproducibility
100108
somatically
101109
subclonal
110+
summarization
102111
STARFusion
103112
tbi
104113
tiering
114+
TSL
105115
tsv
106116
UE
107117
UE5
108118
underserved
119+
VAF
109120
VCF
110121
vaxrank
111122
VEP
112123
www
113124
Wildtype
125+
wildtype
