.travis.yml: 12 additions & 1 deletion
@@ -4,8 +4,19 @@ language: python
 python:
 - '3.9'
 - '3.12'
+before_install:
+- if ! [ -f ./src/GRCh37.tar.gz ]; then wget --connect-timeout=10 --tries=20 ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/SigProfilerMatrixGenerator/GRCh37.tar.gz -P ./src/; fi
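The `before_install` step above downloads the reference archive only when it is not already present under `./src/`. A minimal Python sketch of the same check-then-fetch pattern follows. The function name `ensure_cached` and the `fetch` callable are hypothetical and are not part of this repository; the actual transfer (done by `wget` in the CI config) is left to the caller.

```python
from pathlib import Path
from typing import Callable, Union


def ensure_cached(dest_dir: Union[str, Path], filename: str,
                  fetch: Callable[[Path], None]) -> bool:
    """Fetch `filename` into `dest_dir` only if it is not already there.

    `fetch` is a hypothetical callable that writes the file to the given
    path. Returns True when a fetch was performed, False on a cache hit.
    """
    target = Path(dest_dir) / filename
    if target.is_file():
        # Same role as `[ -f ./src/GRCh37.tar.gz ]` in the shell one-liner.
        return False
    # `wget -P` creates the destination directory; do the same here.
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(target)
    return True
```

Passing the download step in as a callable keeps the cache check testable without network access.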
README.md: 41 additions & 29 deletions
@@ -55,32 +55,46 @@ See below for a detailed list of available parameters
 
 4. The partitioned vcf files are placed under [project_path]/ouput/clustered/ and [project_path]/ouput/nonClustered/. You can visualize the results by looking at the IMD plots available under [project_path]/ouput/plots/.
 
+
 **AVAILABLE PARAMETERS**
 
-Required:
-project: [string] Unique name for the given project.
-genome: [string] Reference genome to use. Must be installed using SigProfilerMatrixGenerator.
-contexts: [string] Contexts needs to be one of the following {“96”, “ID”}.
-simContext: [list of strings] Mutations context that was used for generating the background model (e.g ["6144"] or ["96"]).
-input_path: [string] Path to the given project. Please add a backslash(/) at the end of the input path. For example: "path/to/the/input_file/".
-
-Optional:
-analysis: [string] Desired analysis pipeline. By default output_type='all'. Other options include "subClassify" and "hotspot".
-sortSims: [boolean] Option to sort the simulated files if they have already been sorted. By default sortSims=True to ensure accurate results. The files must be sorted for accurate results.
-interdistance: [string] The mutation types to calculate IMDs between - Use only when performing analysis of indels (default='ID').
-calculateIMD: [boolean] Parameter to calculate the IMDs. This will save time if you need to rerun the subclassification step only (default=True).
-max_cpu: [integer] Change the number of allocated CPUs. By default all CPUs are used.
-subClassify: [boolean] Subclassify the clustered mutations. Requires that VAF scores are available in TCGA or Sanger format. By default subClassify=False. See VAF Format below for more details.
-plotIMDfigure: [boolean] Parameter that generates IMD and mutational spectra plots for each sample (default=True).
-plotRainfall [boolean] Parameter that generates rainfall plots for each sample using the subclassification of clustered events (default=True).
-
-The following parameters are used if the subClassify argument is True:
-includedVAFs: [boolean] Parameter that informs the tool of the inclusion of VAFs in the dataset (default=True).
-includedCCFs: [boolean] Parameter that informs the tool of the inclusion of CCFs in the dataset (default=True). If CCFs are used, set includedVAFs=False.
-variant_caller: [string] Parameter that informs the tool of what format the VAF scores are provided (default=None).Currently, there are four supported formats: sanger, TCGA, standardVC and mutect2.
-windowSize: [integer] Window size for calculating mutation density in the rainfall plots. By default windowSize=10000000.
-correction [boolean] Optional parameter to perform a genome-wide mutational density correction (boolean; default=False).
-probability [boolean] Optional parameter to calculate the probability of observing each clustered event within the localized region of the genome. These values are saved into the [project_path]/output/clustered/ directories. See OSF wiki page for more details.
+|`probability`| Boolean | Calculate the probability of observing each clustered event in its local region. Output saved in `[project_path]/output/clustered/`. Default: `False`. |
 
 
 **VAF Format**
@@ -89,13 +103,11 @@ SigProfilerClusters uses the VAF recorded in the input files to subclassify clus
 
 If you are not using VCFs as input files, VAFs cannot be used in the subclassification step. Therefore, to subclassify clusters using other input file types set subclassify=True and includedVAFs=False.
 
-If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="sanger".
-
-If your VAF is recorded in the 8th column of your VCF as VCF=xx, set variant_caller="TCGA".
+If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="caveman".
 
-If your VAF is recorded in the 8th column of your VCF as AF=xx, set variant_caller="standardVC".
+If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
 
-If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
 
 If your VCFs have no recorded VAFs set includedVAFs=False. This will run SigProfilerClusters, subclassify clusters based on just the calculated IMD (provided that you set subclassify=True).
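To make the three updated VAF layouts concrete, here is a minimal sketch that reads a VAF from one tab-separated VCF record by following the README wording literally. It is an illustration only and not SigProfilerClusters' own parser: the function `extract_vaf` and its regular expressions are hypothetical, and real caller output may need more careful handling (for example, multi-allelic values).

```python
import re
from typing import Optional

# Key=value lookups; the key must start the field or follow ';' or ':'.
_VAF_OR_AF = re.compile(r"(?:^|[;:])V?AF=([^;:\s]+)")
_AF_ONLY = re.compile(r"(?:^|[;:])AF=([^;:\s]+)")


def _to_float(text: str) -> Optional[float]:
    try:
        return float(text)
    except ValueError:
        return None


def extract_vaf(vcf_line: str, variant_caller: str = "standard") -> Optional[float]:
    """Return the VAF from one VCF record, or None if it cannot be found."""
    cols = vcf_line.rstrip("\n").split("\t")

    def column(n: int) -> Optional[str]:
        # VCF columns are numbered from 1 in the README text.
        return cols[n - 1] if len(cols) >= n else None

    if variant_caller == "caveman":
        # 11th column, last of the colon-delimited values.
        field = column(11)
        return _to_float(field.split(":")[-1]) if field else None
    if variant_caller == "standard":
        candidates, pattern = (8, 10), _VAF_OR_AF   # VAF=xx or AF=xx
    elif variant_caller == "mutect2":
        candidates, pattern = (10, 11), _AF_ONLY    # AF=xx
    else:
        raise ValueError(f"unsupported variant_caller: {variant_caller!r}")

    for n in candidates:
        field = column(n)
        match = pattern.search(field) if field else None
        if match:
            return _to_float(match.group(1))
    return None
```

For a record whose 8th column is `DP=50;AF=0.25`, calling `extract_vaf(line)` with the default `"standard"` setting returns `0.25`; a record with no matching field returns `None`.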
SigProfilerClusters/SigProfilerClusters.py: 6 additions & 5 deletions
@@ -638,7 +638,7 @@ def analysis(
     chrom_based=False,
     max_cpu=None,
     subClassify=False,
-    variant_caller=None,
+    variant_caller="standard",
     includedVAFs=True,
     includedCCFs=False,
     windowSize=1000000,
@@ -671,10 +671,11 @@ def analysis(
 max_cpu -> optional parameter to specify the number of maximum cpu's to use for parallelizing the code (integer; default=None: uses all available cpu's)
 subClassify -> optional parameter to subclassify the clustered mutations into refinded classes including DBSs, extended MBSs, kataegis, etc. (boolean; default=False)
 variant_caller -> optional parameter that informs the tool of what format the VAF scores are provided (boolean; default=None). This is required when subClassify=True. Currently, there are four supported formats:
--> sanger: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="sanger".
--> TCGA: If your VAF is recorded in the 8th column of your VCF as VCF=xx, set variant_caller="TCGA".
--> standardVC: If your VAF is recorded in the 10th column of your VCF as AF=xx, set variant_caller="standardVC".
--> mutect2: If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+-> caveman: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="caveman".
+-> standard: If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
+-> mutect2: If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+
+
 includedVAFs -> optional parameter that informs the tool of the inclusion of VAFs in the dataset (boolean; default=True)
 includedCCFs -> optional parameter that informs the tool of the inclusion of cancer cell fractions in the dataset (boolean; default=True)
 windowSize -> the size of the window used for correcting the IMDs based upon mutational density within a given genomic range (integer; default=10000000)
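Scripts written against the old option names may need updating after this change. The sketch below maps the removed `variant_caller` values onto the new ones as implied by the diff: `sanger` shares its description with `caveman`, while `TCGA` and `standardVC` appear to be covered by `standard`. The helper `normalize_variant_caller` is hypothetical and not part of the package, and treating `None` as the new default `"standard"` is an assumption; whether the library itself still accepts the old names is not shown here.

```python
from typing import Optional

# Values documented after this change.
CURRENT_VARIANT_CALLERS = {"caveman", "standard", "mutect2"}

# Assumed mapping, inferred from the old and new descriptions in the diff.
LEGACY_VARIANT_CALLERS = {
    "sanger": "caveman",       # same 11th-column, colon-delimited layout
    "TCGA": "standard",        # key=value in the 8th column
    "standardVC": "standard",  # AF=xx in the 8th or 10th column
}


def normalize_variant_caller(name: Optional[str] = None) -> str:
    """Translate an old or new variant_caller value to the new vocabulary."""
    if name is None:
        # Old default was None; the new signature defaults to "standard".
        return "standard"
    if name in CURRENT_VARIANT_CALLERS:
        return name
    if name in LEGACY_VARIANT_CALLERS:
        return LEGACY_VARIANT_CALLERS[name]
    raise ValueError(f"unknown variant_caller: {name!r}")
```

A caller could pass `normalize_variant_caller(old_value)` as the `variant_caller` argument while migrating existing pipelines.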
SigProfilerClusters/SigProfilerHotSpots.py: 4 additions & 5 deletions
@@ -477,7 +477,7 @@ def analysis(
     chrom_based=False,
     max_cpu=None,
     subClassify=False,
-    variant_caller=None,
+    variant_caller="standard",
     includedVAFs=True,
     windowSize=1000000,
     bedRanges=None,
@@ -508,10 +508,9 @@ def analysis(
 max_cpu -> optional parameter to specify the number of maximum cpu's to use for parallelizing the code (integer; default=None: uses all available cpu's)
 subClassify -> optional parameter to subclassify the clustered mutations into refinded classes including DBSs, extended MBSs, kataegis, etc. (boolean; default=False)
 variant_caller -> optional parameter that informs the tool of what format the VAF scores are provided (boolean; default=None). This is required when subClassify=True. Currently, there are four supported formats:
--> sanger: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="sanger".
--> TCGA: If your VAF is recorded in the 8th column of your VCF as VCF=xx, set variant_caller="TCGA".
--> standardVC: If your VAF is recorded in the 10th column of your VCF as AF=xx, set variant_caller="standardVC".
--> mutect2: If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+-> caveman: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="caveman".
+-> standard: If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
+-> mutect2: If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
 includedVAFs -> optional parameter that informs the tool of the inclusion of VAFs in the dataset (boolean; default=True)
 windowSize -> the size of the window used for correcting the IMDs based upon mutational density within a given genomic range (integer; default=10000000)
 plotIMDfigure -> optional parameter that generates IMD and mutational spectra plots for each sample (boolean; default=True).