Commit ae96ef7

Updating ReadMe
1 parent f2b1cf5 commit ae96ef7

4 files changed: +46 -31 lines changed

README.md

Lines changed: 39 additions & 25 deletions
@@ -55,32 +55,46 @@ See below for a detailed list of available parameters
 
 4. The partitioned vcf files are placed under [project_path]/ouput/clustered/ and [project_path]/ouput/nonClustered/. You can visualize the results by looking at the IMD plots available under [project_path]/ouput/plots/.
 
+
 **AVAILABLE PARAMETERS**
 
-Required:
-project: [string] Unique name for the given project.
-genome: [string] Reference genome to use. Must be installed using SigProfilerMatrixGenerator.
-contexts: [string] Contexts needs to be one of the following {“96”, “ID”}.
-simContext: [list of strings] Mutations context that was used for generating the background model (e.g ["6144"] or ["96"]).
-input_path: [string] Path to the given project. Please add a backslash(/) at the end of the input path. For example: "path/to/the/input_file/".
-
-Optional:
-analysis: [string] Desired analysis pipeline. By default output_type='all'. Other options include "subClassify" and "hotspot".
-sortSims: [boolean] Option to sort the simulated files if they have already been sorted. By default sortSims=True to ensure accurate results. The files must be sorted for accurate results.
-interdistance: [string] The mutation types to calculate IMDs between - Use only when performing analysis of indels (default='ID').
-calculateIMD: [boolean] Parameter to calculate the IMDs. This will save time if you need to rerun the subclassification step only (default=True).
-max_cpu: [integer] Change the number of allocated CPUs. By default all CPUs are used.
-subClassify: [boolean] Subclassify the clustered mutations. Requires that VAF scores are available in TCGA or Sanger format. By default subClassify=False. See VAF Format below for more details.
-plotIMDfigure: [boolean] Parameter that generates IMD and mutational spectra plots for each sample (default=True).
-plotRainfall [boolean] Parameter that generates rainfall plots for each sample using the subclassification of clustered events (default=True).
-
-The following parameters are used if the subClassify argument is True:
-includedVAFs: [boolean] Parameter that informs the tool of the inclusion of VAFs in the dataset (default=True).
-includedCCFs: [boolean] Parameter that informs the tool of the inclusion of CCFs in the dataset (default=True). If CCFs are used, set includedVAFs=False.
-variant_caller: [string] Parameter that informs the tool of what format the VAF scores are provided (default='standard').
-windowSize: [integer] Window size for calculating mutation density in the rainfall plots. By default windowSize=10000000.
-correction [boolean] Optional parameter to perform a genome-wide mutational density correction (boolean; default=False).
-probability [boolean] Optional parameter to calculate the probability of observing each clustered event within the localized region of the genome. These values are saved into the [project_path]/output/clustered/ directories. See OSF wiki page for more details.
+### Required Parameters
+
+| Parameter | Variable Type | Description |
+|--------------|--------------------|-------------|
+| `project` | String | Unique name for the given project. |
+| `genome` | String | Reference genome to use. Must be installed using SigProfilerMatrixGenerator. |
+| `contexts` | String | Contexts needs to be one of the following: `"96"`, `"ID"`. |
+| `simContext` | List of Strings | Mutation context used for generating the background model (e.g., `["6144"]` or `["96"]`). |
+| `input_path` | String | Path to the input files. Must end with a `/`, e.g., `"path/to/the/input_file/"`. |
+
+---
+
+### Optional Parameters
+
+| Parameter | Variable Type | Description |
+|----------------|----------------|-------------|
+| `analysis` | String | Desired analysis pipeline. Options include `"all"` (default), `"subClassify"`, and `"hotspot"`. |
+| `sortSims` | Boolean | Option to sort simulated files. Ensures accuracy. Default: `True`. |
+| `interdistance` | String | Mutation types to calculate IMDs between. Use only for indel analysis. Default: `"ID"`. |
+| `calculateIMD` | Boolean | Whether to calculate IMDs. Useful for rerunning subclassification only. Default: `True`. |
+| `max_cpu` | Integer | Number of CPUs to use. Default: all available CPUs. |
+| `subClassify` | Boolean | Subclassify clustered mutations (requires VAF scores in TCGA/Sanger format). Default: `False`. |
+| `plotIMDfigure` | Boolean | Generate IMD and mutational spectra plots for each sample. Default: `True`. |
+| `plotRainfall` | Boolean | Generate rainfall plots using subclassified clustered events. Default: `True`. |
+
+---
+
+### Parameters Used if `subClassify=True`
+
+| Parameter | Variable Type | Description |
+|------------------|---------------|-------------|
+| `includedVAFs` | Boolean | Indicates VAFs are included in the dataset. Default: `True`. |
+| `includedCCFs` | Boolean | Indicates CCFs are included. If `True`, set `includedVAFs=False`. Default: `True`. |
+| `variant_caller` | String | Format of VAF scores (e.g., `"standard"`). Default: `"standard"`. |
+| `windowSize` | Integer | Window size for calculating mutation density in rainfall plots. Default: `10000000`. |
+| `correction` | Boolean | Perform genome-wide mutational density correction. Default: `False`. |
+| `probability` | Boolean | Calculate the probability of observing each clustered event in its local region. Output saved in `[project_path]/output/clustered/`. Default: `False`. |
 
 
 **VAF Format**
@@ -93,7 +107,7 @@ If your VAF is recorded in the 11th column of your VCF as the last number of the
 
 If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
 
-If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
 
 If your VCFs have no recorded VAFs set includedVAFs=False. This will run SigProfilerClusters, subclassify clusters based on just the calculated IMD (provided that you set subclassify=True).
 
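
To make the VAF Format rules in this README change concrete, the following is a minimal sketch of how a VAF might be read from a single tab-delimited VCF record under each `variant_caller` setting. The helper names `extract_vaf` and `_to_float`, the `KEYED_LAYOUTS` table, and the regular expression are assumptions made for illustration; they are not taken from SigProfilerClusters, whose own parsing may differ.

```python
import re

# Column numbers are 1-based, as in the VAF Format notes. The lookup table
# and the regular expression below are assumptions made for this sketch.
KEYED_LAYOUTS = {
    "standard": {"columns": (8, 10), "keys": ("VAF", "AF")},
    "mutect2": {"columns": (10, 11), "keys": ("AF",)},
}


def _to_float(text):
    """Convert text to float, returning None when it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def extract_vaf(vcf_line, variant_caller):
    """Return the VAF found in one tab-delimited VCF record, or None."""
    fields = vcf_line.rstrip("\n").split("\t")

    if variant_caller == "caveman":
        # 11th column, last of the colon-delimited values.
        if len(fields) < 11:
            return None
        return _to_float(fields[10].split(":")[-1])

    layout = KEYED_LAYOUTS[variant_caller]
    for column in layout["columns"]:
        if len(fields) < column:
            continue
        for key in layout["keys"]:
            # The lookbehind stops "AF=" from matching inside "VAF=".
            pattern = r"(?<![A-Za-z0-9_])" + key + r"=([^;:,\s]+)"
            match = re.search(pattern, fields[column - 1])
            if match:
                value = _to_float(match.group(1))
                if value is not None:
                    return value
    return None
```

A record whose VAF sits in a column the chosen setting does not cover yields `None`, which is one way to notice that `variant_caller` does not match the file.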

SigProfilerClusters/SigProfilerClusters.py

Lines changed: 5 additions & 4 deletions
@@ -671,10 +671,11 @@ def analysis(
 max_cpu -> optional parameter to specify the number of maximum cpu's to use for parallelizing the code (integer; default=None: uses all available cpu's)
 subClassify -> optional parameter to subclassify the clustered mutations into refinded classes including DBSs, extended MBSs, kataegis, etc. (boolean; default=False)
 variant_caller -> optional parameter that informs the tool of what format the VAF scores are provided (boolean; default=None). This is required when subClassify=True. Currently, there are four supported formats:
--> sanger: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="sanger".
--> TCGA: If your VAF is recorded in the 8th column of your VCF as VAF=xx, set variant_caller="TCGA".
--> standardVC: If your VAF is recorded in the 10th column of your VCF as AF=xx, set variant_caller="standardVC".
--> mutect2: If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+-> caveman: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="caveman".
+-> standard: If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
+-> mutect2: If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+
+
 includedVAFs -> optional parameter that informs the tool of the inclusion of VAFs in the dataset (boolean; default=True)
 includedCCFs -> optional parameter that informs the tool of the inclusion of cancer cell fractions in the dataset (boolean; default=True)
 windowSize -> the size of the window used for correcting the IMDs based upon mutational density within a given genomic range (integer; default=10000000)
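
The constraints stated in this docstring and in the README tables can be checked before a run. The sketch below assumes a hypothetical helper, `build_analysis_kwargs`, that is not part of SigProfilerClusters; it only enforces rules the documentation in this commit states (allowed `contexts`, trailing `/` on `input_path`, the three `variant_caller` values, and not combining CCFs with VAFs). The resulting dictionary could then be passed to `analysis(**kwargs)`, assuming the function's keyword names match the documented parameter names.

```python
SUPPORTED_CALLERS = {"caveman", "standard", "mutect2"}


def build_analysis_kwargs(project, genome, contexts, simContext, input_path, **optional):
    """Validate a few documented constraints and return keyword arguments.

    Hypothetical helper for illustration only; not part of the tool.
    """
    if contexts not in {"96", "ID"}:
        raise ValueError('contexts must be "96" or "ID"')
    if not isinstance(simContext, list):
        raise TypeError('simContext must be a list of strings, e.g. ["6144"]')
    if not input_path.endswith("/"):
        input_path += "/"  # the README asks for a trailing slash

    kwargs = {
        "project": project,
        "genome": genome,
        "contexts": contexts,
        "simContext": simContext,
        "input_path": input_path,
    }
    kwargs.update(optional)

    if kwargs.get("subClassify", False):
        caller = kwargs.get("variant_caller")
        if caller is not None and caller not in SUPPORTED_CALLERS:
            raise ValueError("unsupported variant_caller: %r" % (caller,))
        # README: if CCFs are used, set includedVAFs=False. Only flag the
        # case where the caller explicitly asked for both.
        if kwargs.get("includedCCFs") and kwargs.get("includedVAFs"):
            raise ValueError("set includedVAFs=False when includedCCFs=True")
    return kwargs
```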

SigProfilerClusters/SigProfilerHotSpots.py

Lines changed: 1 addition & 1 deletion
@@ -510,7 +510,7 @@ def analysis(
 variant_caller -> optional parameter that informs the tool of what format the VAF scores are provided (boolean; default=None). This is required when subClassify=True. Currently, there are four supported formats:
 -> caveman: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="caveman".
 -> standard: If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
--> mutect2: If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+-> mutect2: If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
 includedVAFs -> optional parameter that informs the tool of the inclusion of VAFs in the dataset (boolean; default=True)
 windowSize -> the size of the window used for correcting the IMDs based upon mutational density within a given genomic range (integer; default=10000000)
 plotIMDfigure -> optional parameter that generates IMD and mutational spectra plots for each sample (boolean; default=True).
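
For readers unfamiliar with the quantities that `windowSize` and the IMD plots refer to, here is a rough sketch of an inter-mutational distance (IMD) calculation and a per-window mutation count for one chromosome. Both functions are illustrative assumptions: the commit does not show how SigProfilerClusters defines IMDs or bins mutations, so the adjacent-distance definition and fixed, non-overlapping windows used here may not match the tool.

```python
from collections import Counter


def inter_mutational_distances(positions):
    """Distances between adjacent mutations on one chromosome.

    Assumes IMD means the gap between consecutive sorted positions.
    """
    ordered = sorted(positions)
    return [right - left for left, right in zip(ordered, ordered[1:])]


def mutations_per_window(positions, window_size=10_000_000):
    """Count mutations in fixed, non-overlapping windows.

    Keys are window indices (position // window_size); the default size
    mirrors the documented windowSize default of 10000000.
    """
    return Counter(position // window_size for position in positions)


positions = [100, 350, 400, 15_000_000]
imds = inter_mutational_distances(positions)
density = mutations_per_window(positions)
```

Short IMDs within a densely mutated window are the kind of signal a density correction would need to account for, which is why the two quantities are shown together.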

SigProfilerClusters/classifyFunctions.py

Lines changed: 1 addition & 1 deletion
@@ -310,7 +310,7 @@ def pullVaf(project, project_path, variant_caller="standard", correction=True):
 variant_caller -> optional parameter that informs the tool of what format the VAF scores are provided (boolean; default=None). This is required when subClassify=True. Currently, there are four supported formats:
 -> caveman: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="caveman".
 -> standard: If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
--> mutect2: If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+-> mutect2: If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
 correction -> optional parameter to perform a genome-wide mutational density correction (boolean; default=False)
 
 Returns:
