Commit e0deed3

Merge pull request #34 from griffithlab/updates
Start filling in the pVACview chapter
2 parents df98f80 + 1dd47a5 commit e0deed3

7 files changed: 317 additions & 32 deletions


01-intro.Rmd

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ This course has been developed recently (Summer 2023). We welcome any feedback a
 ## Motivation

 Identification of neoantigens is a critical step in predicting response to checkpoint blockade therapy and design of personalized cancer vaccines.
-This is a cross-disciplinary challenge, involving genomics, proteomics, immunology, and computational approaches. We have built a computational
+This is a cross-disciplinary challenge, which involves genomics, proteomics, immunology, and computational approaches. We have built a computational
 framework called pVACtools that, when paired with a well-established genomics pipeline, produces an end-to-end solution for neoantigen characterization.
 pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions,
 and gene fusions. Prediction of peptide:MHC binding is accomplished by supporting an ensemble of MHC Class I and II binding algorithms within a framework

02-prerequisites.Rmd

Lines changed: 9 additions & 9 deletions
@@ -71,36 +71,36 @@ For this course, we have put together a set of input data generated from the bre
 cancer cell line HCC1395 and a matched normal lymphoblastoid cell line HCC1395BL.
 Data from this cell line is commonly used as test data in bioinformatics applications.
 For more information on these lines and the generation of test data, please refer to
-the data section of our precision medicine bioinformatics course:
-[here](https://pmbio.org/module-02-inputs/0002/05/01/Data/).
+the [data section of our precision medicine bioinformatics course](https://pmbio.org/module-02-inputs/0002/05/01/Data/).

 The input data consists of the following files:

 For pVACseq:

 - `annotated.expression.vcf.gz`: A somatic (tumor-normal) VCF and its tbi index file. The VCF has been
 annotated with VEP and has coverage and expression information added. It has also been annotated with
-custom VEP plugins that provide wild type and mutant version of the full length protein sequences
+custom VEP plugins that provide wild type and mutant versions of the full length protein sequences
 predicted to arise from each transcript annotated with each variant.
 - `phased.vcf.gz`: A phased tumor-germline VCF and its tbi index file to provide information about
 in-phase proximal variants that might alter the predicted peptide sequence around a somatic
-mutation of interest
-- `optitype_normal_result.tsv`: A OptiType file with HLA allele typing predictions
+mutation of interest.
+- `optitype_normal_result.tsv`: An OptiType file with HLA allele typing predictions.

 For more detailed information on how the variant input file is created, please refer to the
 [input file preparation](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep.html)
-section of the pVACtools docs
+section of the pVACtools docs.

 For pVACfuse:

-- `agfusion_results`: A AGFusion output directory with annotated fusion calls
+- `agfusion_results`: An AGFusion output directory with annotated fusion
+calls.
 - `star-fusion.fusion_predictions.tsv`: A STARFusion prediction file with fusion read support
-and expression information
+and expression information.

 General:

 - `Homo_sapiens.GRCh38.pep.all.fa.gz`: A reference proteome peptide FASTA to use
-for determining whether there are any reference matches of neoantigen candidates
+for determining whether there are any reference matches of neoantigen candidates.

 To download this data, please run the following commands:

04-outputs.Rmd

Lines changed: 11 additions & 11 deletions
@@ -95,9 +95,9 @@ patient's RNA.

 For pVACseq, this generally relies on your VCF being annotated with coverage
 and expression data. In our example, the VCF has already been annotated with
-this data. For more information about how to add coverage and expression data
-to your own VCFs, please see [here](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html)
-and [here](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html).
+this data. For more information about how to add [coverage](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html)
+and [expression data](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html)
+to your own VCFs, please see our docs.
 Additionally, filtering on the normal DNA depth and variant allele frequency
 (VAF) requires your VCF to be a tumor-normal sample VCF and the normal sample
 to be identified in your pVACseq run using the `--normal-sample-name`
@@ -130,7 +130,7 @@ The following thresholds are applied in pVACfuse by this filter:

 ### Transcript Support Level Filter

-The Transcript Support Level (TSL) Filter, removes neoantigen candidates for
+The Transcript Support Level (TSL) Filter removes neoantigen candidates for
 transcripts with a high TSL, as defined [by Ensembl](https://grch37.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl).
 The cutoff for this filter is set by the `--maximum-transcript-support-level`
 parameter. Transcripts with a TSL of NA will always be filtered out.
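The TSL rule described in this hunk can be sketched in Python. This is a minimal illustration under stated assumptions, not the pVACtools implementation: the function names and the dictionary keys (`transcript`, `tsl`) are hypothetical, a TSL of NA is represented as `None`, and the cutoff is treated as inclusive (a TSL equal to `--maximum-transcript-support-level` is kept), following the `<=` wording used for the aggregate report criteria in the same file.

```python
from typing import Optional


def passes_tsl_filter(tsl: Optional[int], max_tsl: int) -> bool:
    """Return True if a transcript's TSL passes the cutoff.

    Assumptions: `tsl` is an Ensembl TSL (1-5) or None for NA, and the
    cutoff is inclusive. NA is always filtered out, as the text states.
    """
    if tsl is None:
        return False
    return tsl <= max_tsl


def tsl_filter(candidates: list[dict], max_tsl: int) -> list[dict]:
    """Keep only candidates whose transcript passes the TSL cutoff.

    Hypothetical row format: {"transcript": str, "tsl": int | None, ...}.
    """
    return [c for c in candidates if passes_tsl_filter(c["tsl"], max_tsl)]


# Toy example rows (made-up transcript IDs, not real Ensembl data).
rows = [
    {"transcript": "TX1", "tsl": 1},
    {"transcript": "TX2", "tsl": 3},
    {"transcript": "TX3", "tsl": None},
]
print([r["transcript"] for r in tsl_filter(rows, max_tsl=1)])  # ['TX1']
```

With `max_tsl=1`, only TX1 survives: TX2 exceeds the cutoff, and TX3 (NA) is dropped regardless of the cutoff value.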
@@ -147,16 +147,16 @@ The Top Score Filter will attempt to determine the best neoantigen candidate
 for each variant.

 For pVACseq it works as follows. Given a set of neoantigen candidates for a
-variant we first group the transcripts into set where all transcripts in a set
+variant we first group the transcripts into sets where all transcripts in a set
 code for the same set of neoantigen candidates. For each transcript set we then
 determine the best neoantigen candidate as follows:

 - Pick all neoantigens with a variant transcript that have a protein_coding Biotype.
 - Of the remaining candidates, pick the ones with a variant transcript having a
 TSL less than or equal to the `--maximum-transcript-support-level`.
-- Of the remaining candidates, pick the entries with no Problematic Positions
+- Of the remaining candidates, pick the entries with no Problematic Positions.
 - Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in
-more detail further below)
+more detail further below).
 - Of the remaining candidates, pick the one with the lowest MT IC50 Score (Median or Best
 depending on the `--top-score-metric`), lowest TSL, and longest transcript.
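The transcript grouping and the cascade of criteria in this hunk can be sketched in Python. This is an illustrative sketch, not pVACtools' code: the `Candidate` fields, the function names, and the toy data are hypothetical, and `mt_ic50` stands in for whichever score `--top-score-metric` selects. The text does not say what happens if a step removes every remaining candidate; the sketch assumes that step is skipped and the previous pool is kept.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical fields mirroring the criteria in the text.
    peptide: str
    transcript: str
    biotype: str
    tsl: Optional[int]           # None represents a TSL of NA
    has_problematic_pos: bool
    passes_anchor: bool
    mt_ic50: float               # Median or Best MT IC50 (nM)
    transcript_length: int


def group_by_transcript_set(cands: list[Candidate]) -> list[list[Candidate]]:
    """Group candidates from transcripts that code for the same peptide set."""
    peptides_by_tx: dict[str, set[str]] = defaultdict(set)
    for c in cands:
        peptides_by_tx[c.transcript].add(c.peptide)
    groups: dict[frozenset, list[Candidate]] = defaultdict(list)
    for c in cands:
        groups[frozenset(peptides_by_tx[c.transcript])].append(c)
    return list(groups.values())


def pick_best(cands: list[Candidate], max_tsl: int) -> Candidate:
    """Narrow the pool step by step, then apply the final tie-breakers."""
    steps = [
        lambda c: c.biotype == "protein_coding",
        lambda c: c.tsl is not None and c.tsl <= max_tsl,
        lambda c: not c.has_problematic_pos,
        lambda c: c.passes_anchor,
    ]
    pool = list(cands)
    for keep in steps:
        narrowed = [c for c in pool if keep(c)]
        if narrowed:  # assumption: a step that removes everything is skipped
            pool = narrowed
    # Lowest MT IC50, then lowest TSL (NA sorts last), then longest transcript.
    return min(
        pool,
        key=lambda c: (
            c.mt_ic50,
            c.tsl if c.tsl is not None else float("inf"),
            -c.transcript_length,
        ),
    )


def top_score_filter(cands: list[Candidate], max_tsl: int) -> list[Candidate]:
    """Return one best candidate per transcript set."""
    return [pick_best(g, max_tsl) for g in group_by_transcript_set(cands)]


# Toy data: TX1 and TX2 code for the same peptides; TX3 codes a different one.
example = [
    Candidate("PEP1", "TX1", "protein_coding", 1, False, True, 50.0, 1000),
    Candidate("PEP2", "TX1", "protein_coding", 1, False, True, 20.0, 1000),
    Candidate("PEP1", "TX2", "protein_coding", 1, False, True, 50.0, 2000),
    Candidate("PEP2", "TX2", "protein_coding", 1, False, True, 20.0, 2000),
    Candidate("PEP3", "TX3", "nonsense_mediated_decay", 1, False, True, 5.0, 500),
]
for best in top_score_filter(example, max_tsl=1):
    print(best.transcript, best.peptide)  # TX2 PEP2, then TX3 PEP3
```

In the toy data, TX1 and TX2 form one transcript set, and the tie on MT IC50 and TSL is broken by transcript length. TX3 fails the biotype step but, under the skip-if-empty assumption, is still reported for its own transcript set.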

@@ -185,10 +185,10 @@ are included in creating this report.

 In pVACseq, for each variant, all neoantigen candidates meeting the `--aggregate-inclusion-threshold` are evaluated as follows:

-- Pick all entries with a variant transcript that have a protein_coding Biotype
-- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= `--maximum-transcript-support-level`
-- Of the remaining entries, pick the entries with no Problematic Positions
-- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below)
+- Pick all entries with a variant transcript that have a protein_coding Biotype.
+- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= `--maximum-transcript-support-level`.
+- Of the remaining entries, pick the entries with no Problematic Positions.
+- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below).
 - Of the remaining entries, pick the one with the lowest MT IC50 score (Median or Best
 depending on the `--top-score-metric`), lowest Transcript Support Level, and longest transcript.
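The per-variant selection for the aggregate report can likewise be sketched in Python. This is a hypothetical illustration, not pVACtools' code. Rows are plain dictionaries with made-up keys, and `--aggregate-inclusion-threshold` is assumed to be an MT IC50 cutoff in nM that a candidate meets when its score is at or below the threshold. The sketch also assumes that a criterion which would remove every remaining entry is skipped, and that variants with no candidate meeting the threshold are simply left out.

```python
from collections import defaultdict

# Hypothetical row keys: variant, peptide, biotype, tsl (None = NA),
# problematic (bool), anchor_ok (bool), mt_ic50 (nM), transcript_length.


def best_per_variant(rows: list[dict], inclusion_threshold: float, max_tsl: int) -> dict:
    """Return the single best entry per variant, keyed by variant."""
    by_variant: dict[str, list[dict]] = defaultdict(list)
    for r in rows:
        # Assumption: "meeting" the threshold means MT IC50 <= threshold.
        if r["mt_ic50"] <= inclusion_threshold:
            by_variant[r["variant"]].append(r)

    criteria = [
        lambda r: r["biotype"] == "protein_coding",
        lambda r: r["tsl"] is not None and r["tsl"] <= max_tsl,
        lambda r: not r["problematic"],
        lambda r: r["anchor_ok"],
    ]
    best = {}
    for variant, pool in by_variant.items():
        for keep in criteria:
            narrowed = [r for r in pool if keep(r)]
            if narrowed:  # assumption: skip a criterion that empties the pool
                pool = narrowed
        # Lowest MT IC50, then lowest TSL (NA last), then longest transcript.
        best[variant] = min(
            pool,
            key=lambda r: (
                r["mt_ic50"],
                r["tsl"] if r["tsl"] is not None else float("inf"),
                -r["transcript_length"],
            ),
        )
    return best


# Toy data with made-up variant labels.
rows = [
    {"variant": "var1", "peptide": "PEP1", "biotype": "protein_coding", "tsl": 1,
     "problematic": False, "anchor_ok": True, "mt_ic50": 120.0, "transcript_length": 1500},
    {"variant": "var1", "peptide": "PEP2", "biotype": "protein_coding", "tsl": 1,
     "problematic": True, "anchor_ok": True, "mt_ic50": 35.0, "transcript_length": 1500},
    {"variant": "var2", "peptide": "PEP3", "biotype": "protein_coding", "tsl": 1,
     "problematic": False, "anchor_ok": True, "mt_ic50": 9000.0, "transcript_length": 800},
]
picked = best_per_variant(rows, inclusion_threshold=5000, max_tsl=1)
print({v: r["peptide"] for v, r in picked.items()})  # {'var1': 'PEP1'}
```

PEP2 has the lower MT IC50 but is not chosen, because the Problematic Positions criterion is applied before the score tie-breakers. The variant var2 is absent because its only candidate exceeds the inclusion threshold.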
