docs/description-of-inputs.md
Lines changed: 63 additions & 2 deletions
@@ -3,8 +3,11 @@
3
3
Example inputs can be found in the [`example_data/`](/example_data/) folder, found in the root directory of this repository.
4
4
5
5
# Table of Contents
6
+
The following describes required and optional arguments to run `moalmanac/moalmanac.py`.
6
7
*[Required arguments](#required-arguments)
7
8
-[Patient id](#patient-id)
9
+
-[Config](#config)
10
+
-[Databases](#databases)
8
11
*[Optional arguments](#optional-arguments)
9
12
-[Tumor type](#tumor-type)
10
13
-[Stage](#stage)
@@ -23,14 +26,53 @@ Example inputs can be found in the [`example_data/`](/example_data/) folder, fou
23
26
-[Disable matchmaking](#disable-matchmaking)
24
27
-[Description](#description)
25
28
-[Output directory](#output-directory)
26
-
-[Simplified input](#simplified-input)
29
+
-[Preclinical databases](#preclinical-databases)
30
+
31
+
Alternatively, a simplified version of the interpretation algorithm can be run by using `moalmanac/simplified_input.py`.
32
+
-[Simplified input arguments](#simplified-input)
27
33
28
34
# Required arguments
29
35
The following arguments are required to run Molecular Oncology Almanac.
30
36
31
37
## Patient id
32
38
`--patient_id` expects a single string value which is used for labeling outputs.
33
39
40
+
## Config
41
+
`--config` expects a file path to the [config.ini](https://github.com/vanallenlab/moalmanac/blob/main/moalmanac/config.ini) file.
42
+
43
+
This config file contains the following sections:
44
+
-`function_toggle` - allows several features of the MOAlmanac algorithm to be enabled or disabled
45
+
-`logging` - specifies the [level](https://docs.python.org/3/library/logging.html#levels) that the logger should be configured to use
46
+
-`versions` - specifies the versions of the [MOAlmanac algorithm (interpreter)](https://github.com/vanallenlab/moalmanac/releases) and [database](https://github.com/vanallenlab/moalmanac-db/releases).
47
+
-`exac` - specifies the allele frequency threshold used with [ExAC](https://github.com/vanallenlab/moalmanac/tree/main/datasources/exac) to determine whether a variant is a common variant
48
+
-`fusion` - specifies minimum spanning fragments required for review by MOAlmanac, column names expected from inputs, and how "Fusion" should be written from input
49
+
-`mutations` - specifies the minimum coverage and allelic fraction that a variant needs for review by MOAlmanac
50
+
-`seg` - specifies the percentile to evaluate copy gain and loss variants from segmented copy number input files, as well as how amplification and deletion should be written as strings
51
+
-`signatures` - specifies the minimum contribution required to review COSMIC mutational signatures by MOAlmanac
52
+
-`validation_sequencing` - Thresholds for minimum power to detect variants and minimum allelic fraction for annotation from validation sequencing. This is further described in the [Methods section](https://www.nature.com/articles/s43018-021-00243-3#Sec8) of our paper.
53
+
-`feature_types` - String labels for each biomarker type passed to the algorithm. These values will be included in `feature_type` column of outputs.
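
As a minimal sketch of how the sections listed above could be checked before a run, the following uses Python's standard `configparser`. The section names come from the list above; the helper `missing_sections`, the `SAMPLE` text, and the keys inside it are hypothetical and are not taken from the actual `config.ini`.

```python
import configparser

# Section names as listed in this document.
EXPECTED_SECTIONS = [
    "function_toggle", "logging", "versions", "exac", "fusion",
    "mutations", "seg", "signatures", "validation_sequencing", "feature_types",
]


def missing_sections(config, expected=EXPECTED_SECTIONS):
    """Return the expected section names that are absent from a parsed config."""
    return [name for name in expected if not config.has_section(name)]


# Hypothetical, abbreviated stand-in for config.ini; key names are illustrative only.
SAMPLE = """
[logging]
level = INFO

[versions]
interpreter = 0.0.0
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)  # for a real file: config.read("path/to/config.ini")
print(missing_sections(config))
```

With the abbreviated sample, every section other than `logging` and `versions` is reported as missing; a complete config file should yield an empty list.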
54
+
55
+
## Databases
56
+
`--dbs` expects a file path to the [annotation-databases.ini](../moalmanac/annotation-databases.ini) file.
57
+
58
+
This config file contains a single section `databases` that lists the following:
59
+
-`root` - path to `datasources/` directory
60
+
-`almanac_handle` - path within `root` that points to the `molecular-oncology-almanac.json` datasource file
61
+
-`cancerhotspots_handle` - path within `root` that points to the Cancer Hotspots datasource file
62
+
-`3dcancerhotspots_handle` - path within `root` that points to the Cancer Hotspots 3D datasource file
63
+
-`cgc_handle` - path within `root` that points to the Cancer Gene Census file
64
+
-`cosmic_handle` - path within `root` that points to the COSMIC datasource file
65
+
-`gsea_pathways_handle` - path within `root` that points to the GSEA pathways datasource file
66
+
-`gsea_modules_handle` - path within `root` that points to the GSEA modules datasource file
67
+
-`exac_handle` - path within `root` that points to the ExAC datasource file
68
+
-`acmg_handle` - path within `root` that points to the ACMG datasource file
69
+
-`clinvar_handle` - path within `root` that points to the ClinVar datasource file
70
+
-`hereditary_handle` - path within `root` that points to the genes related to hereditary cancers datasource file
71
+
-`oncotree_handle` - path within `root` that points to the Oncotree datasource file
72
+
-`lawrence_handle` - path within `root` that points to the Lawrence et al. TCGA mutational burden datasource file
73
+
74
+
For more information about each datasource, view the [datasources directory](../datasources/README.md).
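
The relationship between `root` and the `*_handle` keys can be illustrated with a short sketch. It assumes that each handle is joined onto `root` to form the final path, which is one reading of "path within `root`"; the helper `resolve_handles` and the sample values are hypothetical and do not reflect the contents of the real `annotation-databases.ini`.

```python
import configparser
from pathlib import PurePosixPath

# Hypothetical, abbreviated stand-in for annotation-databases.ini.
SAMPLE = """
[databases]
root = ../datasources
almanac_handle = moalmanac/molecular-oncology-almanac.json
exac_handle = exac/exac.example.txt
"""


def resolve_handles(config, section="databases"):
    """Map each non-root key in the section to root joined with its relative path."""
    items = dict(config[section])
    root = PurePosixPath(items.pop("root"))
    return {key: str(root / value) for key, value in items.items()}


config = configparser.ConfigParser()
config.read_string(SAMPLE)  # for a real file: config.read("path/to/annotation-databases.ini")
paths = resolve_handles(config)
for key, path in paths.items():
    print(f"{key}: {path}")
```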
75
+
34
76
# Optional arguments
35
77
Molecular Oncology Almanac will run successfully given any combination of the following arguments:
36
78
@@ -274,7 +316,26 @@ The required fields for this file can be changed from their default expectations
274
316
## Output directory
275
317
`--output-directory` allows users to specify a directory to write outputs to. The current working directory will be used if unspecified.
276
318
277
-
## Simplified input
319
+
## Preclinical databases
320
+
`--preclinical-dbs` expects a file path to the [preclinical-databases.ini](../moalmanac/preclinical-databases.ini) file. This argument and ini file are required to run either of the modules that:
321
+
- Looks at the efficacy of relationships in cancer cell lines
322
+
- Evaluates genomic similarity to cancer cell lines
323
+
324
+
This config file contains a single section `preclinical` that lists the following:
325
+
-`root` - path to `datasources/preclinical/` directory
326
+
-`almanac_gdsc_mappings` - path within `root` that points to the `formatted/almanac-gdsc-mappings.json` datasource file
327
+
-`summary` - path within `root` that points to the `formatted/cell-lines.summary.txt` datasource file
328
+
-`variants` - path within `root` that points to the `annotated/cell-lines.somatic-variants.annotated.txt` datasource file
329
+
-`copynumbers` - path within `root` that points to the `annotated/cell-lines.copy-numbers.annotated.txt` file
330
+
-`fusions` - path within `root` that points to the `annotated/cell-lines.fusions.annotated.txt` datasource file
331
+
-`fusions1` - path within `root` that points to the `annotated/cell-lines.fusions.annotated.gene1.txt` datasource file
332
+
-`fusions2` - path within `root` that points to the `annotated/cell-lines.fusions.annotated.gene2.txt` datasource file
333
+
-`gdsc` - path within `root` that points to the `formatted/sanger.gdsc.txt` datasource file
334
+
-`dictionary` - path within `root` that points to the `cell-lines.pkl` datasource file
335
+
336
+
For more information about each datasource, view the [datasources/preclinical/ directory](../datasources/preclinical/README.md).
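
Because the preclinical modules depend on several files under `root`, it can help to confirm that the configured files exist before running. The sketch below is one hedged way to do so: the helper `missing_files` is hypothetical, the sample lists only two of the keys described above, and a temporary directory stands in for `datasources/preclinical/`.

```python
import configparser
import tempfile
from pathlib import Path

# Abbreviated stand-in for preclinical-databases.ini, using two of the keys above.
SAMPLE = """
[preclinical]
root = datasources/preclinical
summary = formatted/cell-lines.summary.txt
gdsc = formatted/sanger.gdsc.txt
"""


def missing_files(config, section="preclinical", root=None):
    """Return the keys in the section whose file does not exist under root."""
    items = dict(config[section])
    configured_root = items.pop("root")
    base = Path(root if root is not None else configured_root)
    return sorted(key for key, rel in items.items() if not (base / rel).is_file())


config = configparser.ConfigParser()
config.read_string(SAMPLE)

with tempfile.TemporaryDirectory() as tmp:
    # Create only one of the two files to show what the check reports.
    summary = Path(tmp) / "formatted" / "cell-lines.summary.txt"
    summary.parent.mkdir(parents=True)
    summary.write_text("placeholder\n")
    result = missing_files(config, root=tmp)

print(result)
```

Here only `gdsc` is reported, since the placeholder summary file was created and the GDSC file was not.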
337
+
338
+
# Simplified input
278
339
`--input` is an argument only used with `simplified_input.py`. It accepts a tab delimited file with one genomic alteration per row based on MOAlmanac's [standardized feature columns](../docs/description-of-outputs.md#standardized-feature-columns). In short, the following columns are expected:
279
340
1.`feature_type`, the data type of the molecular feature. Accepted values are `Somatic Variant`, `Germline Variant`, `Copy Number`, or `Rearrangement`. These strings can be customized in the `feature_types` section of [config.ini](config.ini).
280
341
2.`gene` or `feature`, the gene name of the genomic alteration.
*[American College of Medical Genetics](#american-college-of-medical-genetics)
@@ -225,6 +226,11 @@ Based on the score of a molecular feature in `almanac_bin`, Molecular Oncology
225
226
# Produced outputs
226
227
The following outputs are produced by the Molecular Oncology Almanac. Each section lists the filename suffix and then details the contents of the output.
227
228
229
+
## Log
230
+
Filename suffix: `.log`
231
+
232
+
A timestamped log of the inputs provided, the configuration variables set, and each step performed while `moalmanac.py` runs.
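
For readers unfamiliar with timestamped log files, the following is a minimal sketch using Python's standard `logging` module. It is not MOAlmanac's implementation: the helper `build_logger`, the message format, and the choice to name the file `<patient_id>.log` are all assumptions made for illustration.

```python
import logging
import os
import tempfile


def build_logger(patient_id, output_directory, level="INFO"):
    """Create a logger that writes timestamped records to <patient_id>.log."""
    path = os.path.join(output_directory, f"{patient_id}.log")
    logger = logging.getLogger(f"sketch.{patient_id}")
    logger.setLevel(getattr(logging, level))
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    return logger, handler, path


with tempfile.TemporaryDirectory() as tmp:
    logger, handler, path = build_logger("example", tmp)
    logger.info("patient_id: example")
    # Close the handler so the file is flushed before it is read back.
    handler.close()
    logger.removeHandler(handler)
    with open(path) as f:
        contents = f.read()

print(contents)
```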