Commit d65b867

Adding logging (#22)
1 parent f8a911d commit d65b867

8 files changed, +916 -103 lines changed
docs/description-of-inputs.md

Lines changed: 63 additions & 2 deletions
@@ -3,8 +3,11 @@
33
Example inputs can be found in the [`example_data/`](/example_data/) folder in the root directory of this repository.
44

55
# Table of Contents
6+
The following describes required and optional arguments to run `moalmanac/moalmanac.py`.
67
* [Required arguments](#required-arguments)
78
- [Patient id](#patient-id)
9+
- [Config](#config)
10+
- [Databases](#databases)
811
* [Optional arguments](#optional-arguments)
912
- [Tumor type](#tumor-type)
1013
- [Stage](#stage)
@@ -23,14 +26,53 @@ Example inputs can be found in the [`example_data/`](/example_data/) folder, fou
2326
- [Disable matchmaking](#disable-matchmaking)
2427
- [Description](#description)
2528
- [Output directory](#output-directory)
26-
- [Simplified input](#simplified-input)
29+
- [Preclinical databases](#preclinical-databases)
30+
31+
Alternatively, a simplified version of the interpretation algorithm can be run by using `moalmanac/simplified_input.py`.
32+
- [Simplified input arguments](#simplified-input)
2733

2834
# Required arguments
2935
The following arguments are required to run Molecular Oncology Almanac.
3036

3137
## Patient id
3238
`--patient_id` expects a single string value which is used for labeling outputs.
3339

40+
## Config
41+
`--config` expects a file path to the [config.ini](https://github.com/vanallenlab/moalmanac/blob/main/moalmanac/config.ini) file.
42+
43+
This config file contains the following sections:
44+
- `function_toggle` - allows several features of the MOAlmanac algorithm to be enabled or disabled
45+
- `logging` - specifies the [level](https://docs.python.org/3/library/logging.html#levels) that the logger should be configured to use
46+
- `versions` - specifies the versions of the [MOAlmanac algorithm (interpreter)](https://github.com/vanallenlab/moalmanac/releases) and [database](https://github.com/vanallenlab/moalmanac-db/releases).
47+
- `exac` - specifies the allele frequency threshold used with [ExAC](https://github.com/vanallenlab/moalmanac/tree/main/datasources/exac) to determine whether a variant is common
48+
- `fusion` - specifies minimum spanning fragments required for review by MOAlmanac, column names expected from inputs, and how "Fusion" should be written from input
49+
- `mutations` - specifies the minimum coverage and allelic fraction that a variant needs for review by MOAlmanac
50+
- `seg` - specifies the percentile to evaluate copy gain and loss variants from segmented copy number input files, as well as how amplification and deletion should be written as strings
51+
- `signatures` - specifies the minimum contribution required to review COSMIC mutational signatures by MOAlmanac
52+
- `validation_sequencing` - Thresholds for minimum power to detect variants and minimum allelic fraction for annotation from validation sequencing. This is further described in the [Methods section](https://www.nature.com/articles/s43018-021-00243-3#Sec8) of our paper.
53+
- `feature_types` - String labels for each biomarker type passed to the algorithm. These values will be included in `feature_type` column of outputs.
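
As a rough illustration of how sections like these can be read, the sketch below parses an ini-formatted string with Python's standard `configparser`. The section names come from the list above; the key names (`level`, `min_coverage`, `min_af`, `calculate_model_similarity`) and their values are assumptions made for the example and may not match the real `config.ini`.

```python
import configparser

# Hypothetical config contents: section names follow the list above,
# but every key name and value here is an illustrative assumption.
EXAMPLE_CONFIG = """
[function_toggle]
calculate_model_similarity = on

[logging]
level = INFO

[mutations]
min_coverage = 15
min_af = 0.05
"""


def load_config(text):
    """Parse ini-formatted text and return a ConfigParser instance."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config


config = load_config(EXAMPLE_CONFIG)

# Typed getters convert the raw strings stored in the ini file.
log_level = config.get("logging", "level")
min_coverage = config.getint("mutations", "min_coverage")
min_af = config.getfloat("mutations", "min_af")
similarity_enabled = config.getboolean("function_toggle", "calculate_model_similarity")
```

To read from disk instead of a string, `config.read(path)` can replace `read_string`.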
54+
55+
## Databases
56+
`--dbs` expects a file path to the [annotation-databases.ini](../moalmanac/annotation-databases.ini) file.
57+
58+
This config file contains a single section `databases` that lists the following:
59+
- `root` - path to `datasources/` directory
60+
- `almanac_handle` - path within `root` that points to the `molecular-oncology-almanac.json` datasource file
61+
- `cancerhotspots_handle` - path within `root` that points to the Cancer Hotspots datasource file
62+
- `3dcancerhotspots_handle` - path within `root` that points to the Cancer Hotspots 3D datasource file
63+
- `cgc_handle` - path within `root` that points to the Cancer Gene Census file
64+
- `cosmic_handle` - path within `root` that points to the COSMIC datasource file
65+
- `gsea_pathways_handle` - path within `root` that points to the GSEA pathways datasource file
66+
- `gsea_modules_handle` - path within `root` that points to the GSEA modules datasource file
67+
- `exac_handle` - path within `root` that points to the ExAC datasource file
68+
- `acmg_handle` - path within `root` that points to the ACMG datasource file
69+
- `clinvar_handle` - path within `root` that points to the ClinVar datasource file
70+
- `hereditary_handle` - path within `root` that points to the genes related to hereditary cancers datasource file
71+
- `oncotree_handle` - path within `root` that points to the Oncotree datasource file
72+
- `lawrence_handle` - path within `root` that points to the Lawrence et al. TCGA mutational burden datasource file
73+
74+
For more information about each datasource, view the [datasources directory](../datasources/README.md).
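
Because each `*_handle` value is described as a path within `root`, one plausible way to obtain usable paths is to join the two. The sketch below does this for a shortened `databases` section. The key names are taken from the list above, while the file names on the right-hand side and the `resolve_handles` helper are placeholders invented for the example, not the repository's actual values or code.

```python
import configparser
import os

# Key names come from the list above; the values are placeholders.
EXAMPLE_DBS = """
[databases]
root = datasources
almanac_handle = example/molecular-oncology-almanac.json
exac_handle = example/exac.example.txt
"""


def resolve_handles(text, section="databases"):
    """Return {key: root-joined path} for every key in the section except root."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    root = parser.get(section, "root")
    return {
        key: os.path.join(root, value)
        for key, value in parser.items(section)
        if key != "root"
    }


paths = resolve_handles(EXAMPLE_DBS)
```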
75+
3476
# Optional arguments
3577
Molecular Oncology Almanac will run successfully given any combination of the following arguments:
3678

@@ -274,7 +316,26 @@ The required fields for this file can be changed from their default expectations
274316
## Output directory
275317
`--output-directory` allows users to specify a directory to write outputs to. If unspecified, the current working directory is used.
276318

277-
## Simplified input
319+
## Preclinical databases
320+
`--preclinical-dbs` expects a file path to the [preclinical-databases.ini](../moalmanac/preclinical-databases.ini) file. This argument and ini file are required to run either of the modules that:
321+
- Looks at the efficacy of relationships in cancer cell lines
322+
- Performs genomic similarity to cancer cell lines
323+
324+
This config file contains a single section `preclinical` that lists the following:
325+
- `root` - path to `datasources/preclinical/` directory
326+
- `almanac_gdsc_mappings` - path within `root` that points to the `formatted/almanac-gdsc-mappings.json` datasource file
327+
- `summary` - path within `root` that points to the `formatted/cell-lines.summary.txt` datasource file
328+
- `variants` - path within `root` that points to the `annotated/cell-lines.somatic-variants.annotated.txt` datasource file
329+
- `copynumbers` - path within `root` that points to the `annotated/cell-lines.copy-numbers.annotated.txt` file
330+
- `fusions` - path within `root` that points to the `annotated/cell-lines.fusions.annotated.txt` datasource file
331+
- `fusions1` - path within `root` that points to the `annotated/cell-lines.fusions.annotated.gene1.txt` datasource file
332+
- `fusions2` - path within `root` that points to the `annotated/cell-lines.fusions.annotated.gene2.txt` datasource file
333+
- `gdsc` - path within `root` that points to the `formatted/sanger.gdsc.txt` datasource file
334+
- `dictionary` - path within `root` that points to the `cell-lines.pkl` datasource file
335+
336+
For more information about each datasource, view the [datasources/preclinical/ directory](../datasources/preclinical/README.md).
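
Since the preclinical modules depend on this file, it can be useful to check that the expected keys are present before running them. The sketch below is one way to do that with `configparser`. It checks only a subset of the keys listed above, and the `missing_keys` helper is a hypothetical name used for illustration, not part of MOAlmanac.

```python
import configparser

# A subset of the keys listed above, used here for illustration only.
REQUIRED_KEYS = ("root", "summary", "variants", "copynumbers", "fusions", "gdsc")

EXAMPLE_PRECLINICAL = """
[preclinical]
root = datasources/preclinical
summary = formatted/cell-lines.summary.txt
variants = annotated/cell-lines.somatic-variants.annotated.txt
gdsc = formatted/sanger.gdsc.txt
"""


def missing_keys(text, section="preclinical", required=REQUIRED_KEYS):
    """Return the required keys that are absent from the given section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        return list(required)
    return [key for key in required if not parser.has_option(section, key)]


# The example above deliberately omits two keys.
missing = missing_keys(EXAMPLE_PRECLINICAL)
```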
337+
338+
# Simplified input
278339
`--input` is an argument only used with `simplified_input.py`. It accepts a tab-delimited file with one genomic alteration per row, based on MOAlmanac's [standardized feature columns](../docs/description-of-outputs.md#standardized-feature-columns). In short, the following columns are expected:
279340
1. `feature_type`, the data type of the molecular feature; accepts `Somatic Variant`, `Germline Variant`, `Copy Number`, or `Rearrangement`. These strings can be customized in the `feature_types` section of [config.ini](config.ini).
280341
2. `gene` or `feature`, the gene name of the genomic alteration.

docs/description-of-outputs.md

Lines changed: 6 additions & 0 deletions
@@ -20,6 +20,7 @@ All outputs will be produced by Molecular Oncology Almanac, though some may not
2020
* [Therapeutic resistance](#therapeutic-resistance)
2121
* [Disease prognosis](#disease-prognosis)
2222
* [Produced outputs](#produced-outputs)
23+
* [Log](#log)
2324
* [Actionable](#actionable)
2425
* [Germline](#germline)
2526
* [American College of Medical Genetics](#american-college-of-medical-genetics)
@@ -225,6 +226,11 @@ Based on the score of a molecular feature in `almanac_bin`, Molecular Oncology
225226
# Produced outputs
226227
The following outputs are produced by the Molecular Oncology Almanac. Each section lists the filename suffix and then details the contents of the output.
227228

229+
## Log
230+
Filename suffix: `.log`
231+
232+
A timestamped log of inputs provided, configuration variables set, and what happens step-by-step as moalmanac.py is running.
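
For readers unfamiliar with Python's `logging` module, the sketch below shows one way a timestamped `.log` file at a configurable level could be produced. It is not MOAlmanac's actual logger setup: the `configure_logger` helper, the `<patient_id>.log` file name, and the record format are all assumptions made for the example.

```python
import logging
import os
import tempfile


def configure_logger(patient_id, output_directory, level="INFO"):
    """Create a logger that writes timestamped records to <patient_id>.log."""
    log_path = os.path.join(output_directory, f"{patient_id}.log")
    logger = logging.getLogger(f"example.{patient_id}")
    logger.setLevel(getattr(logging, level.upper()))
    logger.handlers.clear()
    handler = logging.FileHandler(log_path, mode="w")
    # %(asctime)s adds the timestamp to every record.
    handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger, log_path


with tempfile.TemporaryDirectory() as tmp:
    logger, log_path = configure_logger("example", tmp, level="INFO")
    logger.info("patient_id: example")
    logger.debug("this record is skipped at INFO level")
    # Close the file handler before reading and before the directory is removed.
    for handler in logger.handlers:
        handler.close()
    with open(log_path) as log_file:
        log_lines = log_file.read().splitlines()
```

Setting the level to `DEBUG` instead would also write the second record, which is the kind of control the `logging` section of `config.ini` is described as providing.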
233+
228234
## Actionable
229235
Filename suffix: `.actionable.txt`
230236
