Commit 2263144

Revise handling of COSMIC mutational signatures (#17)
1 parent 847dee1 commit 2263144

37 files changed, with 129 additions and 327 deletions

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-FROM vanallenlab/almanac:base
+FROM vanallenlab/miniconda:3.11

 WORKDIR /

README.md

Lines changed: 1 addition & 10 deletions
@@ -9,7 +9,6 @@ Molecular Oncology Almanac is a clinical interpretation algorithm for cancer gen
 - Identify overlap between somatic variants observed from both DNA and RNA, or any other source of validation sequencing.
 - Identify somatic and germline variants that may be related to microsatellite stability.
 - Calculate coding mutational burden and compare your patient to TCGA.
-- Calculate contribution of known [COSMIC mutational signatures](https://cancer.sanger.ac.uk/signatures/signatures_v2/) with [deconstructsigs](https://github.com/raerose01/deconstructSigs).
 - Identify genomic features that may be related to one another.
 - Create portable web-based actionability reports, summarizing clinically relevant findings.

@@ -19,7 +18,7 @@ You can view additional documentation, including [descriptions of inputs](docs/d
 The codebase is available for download through this GitHub repository, [Dockerhub](https://hub.docker.com/r/vanallenlab/moalmanac/), and [Terra](https://portal.firecloud.org/#methods/vanallenlab/moalmanac/2). The method can also be run on Terra, without having to use Terra, by using [our portal](https://portal.moalmanac.org/). **Accessing Molecular Oncology Almanac through GitHub will require building some of the [datasources](moalmanac/datasources/) but they are also contained in the Docker container**.

 ### Installation
-Molecular Oncology Almanac is a Python application using Python 3.11 but also utilizes R to run [deconstructSigs](https://github.com/raerose01/deconstructSigs) as a subprocess. This application, datasources, and all dependencies are packaged on Docker and can be downloaded with the command
+Molecular Oncology Almanac is a Python application using Python 3.11. This application, datasources, and all dependencies are packaged on Docker and can be downloaded with the command
 ```bash
 docker pull vanallenlab/moalmanac
 ```
@@ -36,14 +35,6 @@ source activate moalmanac
 pip install -r requirements.txt
 ```

-You can install [deconstructSigs](https://github.com/raerose01/deconstructSigs) after [installing R](https://www.r-project.org/) with the following commands
-```bash
-Rscript -e 'install.packages("RCurl", repos = "http://cran.rstudio.com/")' \
-&& Rscript -e 'source("http://bioconductor.org/biocLite.R"); biocLite("BSgenome"); biocLite("BSgenome.Hsapiens.UCSC.hg19"); biocLite("GenomeInfoDb")' \
-&& Rscript -e 'install.packages("reshape2", repos = "http://cran.rstudio.com/")' \
-&& Rscript -e 'install.packages("deconstructSigs", repos = "http://cran.rstudio.com/")'
-```
-
 ## Usage
 Usage documentation can be found within the [moalmanac/](moalmanac) directory of this repository.

base-image/Dockerfile

Lines changed: 0 additions & 23 deletions
This file was deleted.

base-image/README.md

Lines changed: 0 additions & 10 deletions
This file was deleted.

docs/description-of-inputs.md

Lines changed: 19 additions & 1 deletion
@@ -16,6 +16,7 @@ Example inputs can be found in the [`example_data/`](/example_data/) folder, fou
 - [Germline variants](#germline-variants)
 - [Somatic variants from validation sequencing](#somatic-variants-from-validation-sequencing)
 - [Microsatellite status](#microsatellite-status)
+- [Mutational signatures](#mutational-signatures)
 - [Purity](#purity)
 - [Ploidy](#ploidy)
 - [Whole genome doubling](#whole-genome-doubling)
@@ -124,7 +125,7 @@ This input is looking for an integer value.

 The rows associated with _TP53_, _CDKN2A_, and _EGFR_ will be interpreted and scored by Molecular Oncology Almanac while _BRAF_ will be filtered.

-### Required files
+### Required fields
 Required fields can be changed from their default expectations by editing the appropriate section of [colnames.ini](https://github.com/vanallenlab/moalmanac/blob/main/moalmanac/colnames.ini). Column names are **not** case-sensitive.
 - `gene`, gene symbol associated with the copy number alteration
 - `call`, copy number event of the gene. `Amplification` and `Deletion` are accepted and all other values will be filtered.
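
The filtering rule described in the hunk above (keep rows whose `call` is `Amplification` or `Deletion`, drop everything else, and match column names without regard to case) can be illustrated with a minimal Python sketch. This is not the repository's implementation: the function name `filter_copy_number_rows`, the tab-delimited layout, the exact-match comparison of call values, and the example rows (including the `Baseline` call for _BRAF_) are all assumptions made for illustration.

```python
import csv
import io

# Call values the documentation says are accepted; exact-match comparison
# is an assumption of this sketch.
ACCEPTED_CALLS = {"Amplification", "Deletion"}


def filter_copy_number_rows(handle, gene_col="gene", call_col="call"):
    """Return rows whose call is accepted, as dicts with 'gene' and 'call'.

    Hypothetical helper: assumes a tab-delimited file with a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        # Column names are not case-sensitive, so normalize them to lowercase.
        normalized = {key.strip().lower(): value for key, value in row.items()}
        call = normalized[call_col].strip()
        if call in ACCEPTED_CALLS:
            kept.append({"gene": normalized[gene_col].strip(), "call": call})
    return kept


# Hypothetical example input; the header deliberately uses mixed case.
example = io.StringIO(
    "Gene\tCALL\n"
    "TP53\tDeletion\n"
    "CDKN2A\tDeletion\n"
    "EGFR\tAmplification\n"
    "BRAF\tBaseline\n"
)
kept_rows = filter_copy_number_rows(example)
print([row["gene"] for row in kept_rows])
```

With this hypothetical input, the _BRAF_ row is dropped because its call is neither `Amplification` nor `Deletion`, mirroring the behavior the documentation describes.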
@@ -238,6 +239,23 @@ At least one of the following also must be included:

 Microsatellite status is reported in the clinical actionability report.

+## Mutational signatures
+`--mutational_signatures` anticipates a tab delimited file which contains contributions to Single Base Substitution (SBS) Mutational Signatures from [COSMIC version 3.4](https://cancer.sanger.ac.uk/signatures/sbs/). The file should only contain signature contributions for the tumor sample being studied. We recommend generating SBS mutational signatures with [SigProfilerAssignment](https://github.com/AlexandrovLab/SigProfilerAssignment), and have prepared [a wrapper GitHub repository](https://github.com/vanallenlab/SigProfilerAssignment-wrapper) to run SigProfilerAssignment and format signature contributions as expected.
+
+### Example
+| signature | contribution |
+|---|--------------|
+| SBS1 | 0.03846154 |
+| SBS2 | 0 |
+| SBS3 | 0.8525641 |
+| ... | ... |
+| SBS95 | 0 |
+
+### Required fields
+The required fields for this file can be changed from their default expectations by editing the appropriate section of `colnames.ini`. Column names are **not** case sensitive.
+- `signature`, labels for each of the 79 SBS mutational signatures included in COSMIC mutational signatures [version 3.4](https://cancer.sanger.ac.uk/signatures/sbs/)
+- `contribution`, a float value between 0 and 1 for the row's associated signature weight. This column's values should sum to 1.
+
 ## Purity
 `--purity` anticipates a float value between 0.0 and 1.0 for the reported tumor purity. This is just used for reporting in the clinical actionability report.

docs/description-of-outputs.md

Lines changed: 3 additions & 31 deletions
@@ -28,11 +28,6 @@ All outputs will be produced by Molecular Oncology Almanac, though some may not
 * [Integrated summary](#integrated-summary)
 * [Microsatellite Instability variants](#microsatellite-instability-variants)
 * [Mutational burden](#mutational-burden)
-* [Mutational signatures](#mutational-signatures)
-* [Trinucleotide context counts](#trinucleotide-context-counts)
-* [COSMIC signature (v2) weights](#cosmic-signature-v2-weights)
-* [Trinucleotide context counts image](#trinucleotide-context-counts-image)
-* [Trinucleotide context normalized counts image](#trinucleotide-context-normalized-counts-image)
 * [Preclinical efficacy](#preclinical-efficacy)
 * [Profile-to-cell line matchmaking](#profile-to-cell-line-matchmaking)
 * [Report](#report)
@@ -63,7 +58,7 @@ Molecular Oncology Almanac standardizes primary descriptors for molecular featur
 * Rearrangements: gene name, Molecular Oncology Almanac will process each partner in the fusion separately
 * Microsatellite stability: microsatellite stability status (MSI-High or MSI-Low)
 * Mutational burden: High Mutational Burden, if the mutational burden is deemed to be high
-* Mutational signatures: the specific COSMIC (v2) mutational signature, formatted as "COSMIC Signature (number)"
+* Mutational signatures: the specific COSMIC (v3.4) mutational signature, formatted as "COSMIC Signature (number)"
 * Aneuploidy: Whole-genome doubling, this will only be populated if the `--wgd` value is passed to Molecular Oncology Almanac.
 * `alteration_type` is a descriptor to provide more granular detail on the molecular event.
 * Somatic variants: variant classification of the variant (Missense, Nonsense, etc.)
@@ -319,31 +314,8 @@ Molecular Oncology Almanac designates high mutational burden under two circumsta
 - Mutations per Mb > 10
 - At least a mutational burden of 80th percentile of TCGA tumor type, if matched, or TCGA generally, if not matched.

-## Mutational signatures
-Molecular Oncology Almanac runs [deconstructSigs](https://github.com/raerose01/deconstructSigs) as a subprocess based on the MAF file passed with the input argument `--snv_handle`, performing NMF against the 30 COSMIC v2 signatures.
-
-### Trinucleotide context counts
-Filename suffix: `.sigs.context.txt`
-
-Trinucleotide context counts of observed somatic variants for all 96 bins are listed in this tab delimited file.
-
-### COSMIC signature (v2) weights
-Filename suffix: `.sigs.cosmic.txt`
-
-Weights for the 30 COSMIC (v2) mutational signatures are listed in this tab delimited file. Thresholds for a signature to be considered present or not present by Molecular Oncology Almanac are specified in [config.ini](/moalmanac/config.ini) under the `[signatures]` heading.
-
-### Trinucleotide context counts image
-Filename suffix: `.sigs.tricontext.counts.png`
-
-Trinucleotide context raw counts of observed somatic variants for all 96 bins are visualized in this png file.
-
-### Trinucleotide context normalized counts image
-Filename suffix: `.sigs.tricontext.normalized.png`
-
-Trinucleotide context normalized counts of observed somatic variants for all 96 bins are visualized in this png file.
-
 ## Preclinical efficacy
-Filename suffix: `.preclinical.efficacy.txt`
+Filename suffix: `.preclinical_efficacy.txt`

 Therapies listed in [actionable](#actionable) that have been evaluated on cancer cell lines through the Sanger Institute's GDSC are evaluated for efficacy in the presence and absence of the associated molecular feature. This is performed for relationships associated with therapeutic sensitivity. Columns include:
 - `patient_id` (str) - the string associated with the given molecular profile (`--patient_id`)
@@ -396,7 +368,7 @@ Additional equivalent within a provided ontology or stronger matches from anothe

 For molecular features associated with therapeutic sensitivity that have a therapy evaluated on cancer cell lines, a button `[Preclinical evidence]` will appear below the therapy and rationale which will open a modal to compare the sensitivity to the therapy of interest between mutant and wild type cell lines.

-Molecular features which are biologically relevant are listed without clinical association. Molecular features will appear here if the associated gene is catalogued in the Molecular Oncology Almanac but under a different feature type, variants are associated with microsatellite stability, and all present COSMIC version 2 mutational signatures not associated with a clinical assertion are reported.
+Molecular features which are biologically relevant are listed without clinical association. Molecular features will appear here if the associated gene is catalogued in the Molecular Oncology Almanac but under a different feature type, variants are associated with microsatellite stability, and all present COSMIC v3.4 mutational signatures not associated with a clinical assertion are reported.

 The last section of the report, comparison of molecular profile to cancer cell lines, displays results from Molecular Oncology Almanac's patient-to-cell line matchmaking module. **This will not appear in the report if `--disable_matchmaking` is passed as an argument**. The 5 most similar cancer cell lines to the provided profile are listed each listing the cell line name, sensitive therapies from GDSC, and clinically relevant features present. Users can click `[More details]` under each cell line's name for more details about a given cell line: aliases, sensitive therapies, clinically relevant molecular features, all somatic variants, copy number alterations, and fusions occurring in cancer gene census genes, and the 10 most sensitive therapies to the cancer cell line.
