Commit 9521f79

Merge pull request #31 from AlexandrovLab/u69
U69
2 parents 8c4d28a + 878aaca commit 9521f79

16 files changed: +484 -221 lines changed

.travis.yml

Lines changed: 12 additions & 1 deletion
@@ -4,8 +4,19 @@ language: python
 python:
   - '3.9'
   - '3.12'
+before_install:
+  - if ! [ -f ./src/GRCh37.tar.gz ]; then wget --connect-timeout=10 --tries=20 ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/SigProfilerMatrixGenerator/GRCh37.tar.gz -P ./src/; fi
 
 install:
   - pip install .
 
-script: python3 test.py
+cache:
+  directories:
+    - $TRAVIS_BUILD_DIR/src/
+
+before_script:
+  - SigProfilerMatrixGenerator install GRCh37 --local_genome $TRAVIS_BUILD_DIR/src/
+
+script:
+  - python3 test.py
+  - pytest -s -rw tests

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
@@ -6,6 +6,21 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 
 ## [Unreleased]
 
+## [1.2.2] - 2025-07-21
+
+### Added
+- Integrated `pytest` to ensure correct handling of the `variant_caller` parameter.
+
+### Changed
+- Update the standard names for the `variant_caller` parameter
+- Changed `sanger` to `caveman`
+- `TCGA` and `standardVC` have been merged into a more flexible `standard` option.
+- `standard` is now default and parses VAF from 8th and 10th columns of VCF files (`VAF=` or `AF=`).
+
+### Fixed
+- Plotting Stability: Added error handling to skip samples that are invalid and proceed with valid ones.
+- Resolved a potential index out-of-bounds error in the `plot_hist` function.
+
 ## [1.2.1] - 2025-04-02
 
 ### Changed
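
The renames listed in the 1.2.2 changelog entry can be read as a lookup table from old `variant_caller` values to new ones. Below is a minimal sketch for updating older scripts, assuming a hypothetical helper named `normalize_variant_caller`. Neither the helper nor the two constants are part of SigProfilerClusters, and the package itself may treat legacy names differently (for example, by rejecting them).

```python
# Hypothetical migration helper; not part of the SigProfilerClusters API.
# The mapping follows the renames described in the 1.2.2 changelog entry.
LEGACY_VARIANT_CALLERS = {
    "sanger": "caveman",
    "TCGA": "standard",
    "standardVC": "standard",
}

# Option names documented for variant_caller after this commit.
SUPPORTED_VARIANT_CALLERS = {"caveman", "standard", "mutect2"}


def normalize_variant_caller(name="standard"):
    """Translate a pre-1.2.2 variant_caller value to its current name."""
    name = LEGACY_VARIANT_CALLERS.get(name, name)
    if name not in SUPPORTED_VARIANT_CALLERS:
        raise ValueError(
            f"Unsupported variant_caller {name!r}; "
            f"expected one of {sorted(SUPPORTED_VARIANT_CALLERS)}"
        )
    return name
```

A script that previously passed `variant_caller="TCGA"` could pass `normalize_variant_caller("TCGA")` instead, which evaluates to `"standard"`.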

README.md

Lines changed: 41 additions & 29 deletions
@@ -55,32 +55,46 @@ See below for a detailed list of available parameters
 
 4. The partitioned vcf files are placed under [project_path]/ouput/clustered/ and [project_path]/ouput/nonClustered/. You can visualize the results by looking at the IMD plots available under [project_path]/ouput/plots/.
 
+
 **AVAILABLE PARAMETERS**
 
-Required:
-project: [string] Unique name for the given project.
-genome: [string] Reference genome to use. Must be installed using SigProfilerMatrixGenerator.
-contexts: [string] Contexts needs to be one of the following {“96”, “ID”}.
-simContext: [list of strings] Mutations context that was used for generating the background model (e.g ["6144"] or ["96"]).
-input_path: [string] Path to the given project. Please add a backslash(/) at the end of the input path. For example: "path/to/the/input_file/".
-
-Optional:
-analysis: [string] Desired analysis pipeline. By default output_type='all'. Other options include "subClassify" and "hotspot".
-sortSims: [boolean] Option to sort the simulated files if they have already been sorted. By default sortSims=True to ensure accurate results. The files must be sorted for accurate results.
-interdistance: [string] The mutation types to calculate IMDs between - Use only when performing analysis of indels (default='ID').
-calculateIMD: [boolean] Parameter to calculate the IMDs. This will save time if you need to rerun the subclassification step only (default=True).
-max_cpu: [integer] Change the number of allocated CPUs. By default all CPUs are used.
-subClassify: [boolean] Subclassify the clustered mutations. Requires that VAF scores are available in TCGA or Sanger format. By default subClassify=False. See VAF Format below for more details.
-plotIMDfigure: [boolean] Parameter that generates IMD and mutational spectra plots for each sample (default=True).
-plotRainfall [boolean] Parameter that generates rainfall plots for each sample using the subclassification of clustered events (default=True).
-
-The following parameters are used if the subClassify argument is True:
-includedVAFs: [boolean] Parameter that informs the tool of the inclusion of VAFs in the dataset (default=True).
-includedCCFs: [boolean] Parameter that informs the tool of the inclusion of CCFs in the dataset (default=True). If CCFs are used, set includedVAFs=False.
-variant_caller: [string] Parameter that informs the tool of what format the VAF scores are provided (default=None).Currently, there are four supported formats: sanger, TCGA, standardVC and mutect2.
-windowSize: [integer] Window size for calculating mutation density in the rainfall plots. By default windowSize=10000000.
-correction [boolean] Optional parameter to perform a genome-wide mutational density correction (boolean; default=False).
-probability [boolean] Optional parameter to calculate the probability of observing each clustered event within the localized region of the genome. These values are saved into the [project_path]/output/clustered/ directories. See OSF wiki page for more details.
+### Required Parameters
+
+| Parameter | Variable Type | Description |
+|--------------|--------------------|-------------|
+| `project` | String | Unique name for the given project. |
+| `genome` | String | Reference genome to use. Must be installed using SigProfilerMatrixGenerator. |
+| `contexts` | String | Contexts needs to be one of the following: `"96"`, `"ID"`. |
+| `simContext` | List of Strings | Mutation context used for generating the background model (e.g., `["6144"]` or `["96"]`). |
+| `input_path` | String | Path to the input files. Must end with a `/`, e.g., `"path/to/the/input_file/"`. |
+
+---
+
+### Optional Parameters
+
+| Parameter | Variable Type | Description |
+|----------------|----------------|-------------|
+| `analysis` | String | Desired analysis pipeline. Options include `"all"` (default), `"subClassify"`, and `"hotspot"`. |
+| `sortSims` | Boolean | Option to sort simulated files. Ensures accuracy. Default: `True`. |
+| `interdistance` | String | Mutation types to calculate IMDs between. Use only for indel analysis. Default: `"ID"`. |
+| `calculateIMD` | Boolean | Whether to calculate IMDs. Useful for rerunning subclassification only. Default: `True`. |
+| `max_cpu` | Integer | Number of CPUs to use. Default: all available CPUs. |
+| `subClassify` | Boolean | Subclassify clustered mutations (requires VAF scores in TCGA/Sanger format). Default: `False`. |
+| `plotIMDfigure` | Boolean | Generate IMD and mutational spectra plots for each sample. Default: `True`. |
+| `plotRainfall` | Boolean | Generate rainfall plots using subclassified clustered events. Default: `True`. |
+
+---
+
+### Parameters Used if `subClassify=True`
+
+| Parameter | Variable Type | Description |
+|------------------|---------------|-------------|
+| `includedVAFs` | Boolean | Indicates VAFs are included in the dataset. Default: `True`. |
+| `includedCCFs` | Boolean | Indicates CCFs are included. If `True`, set `includedVAFs=False`. Default: `True`. |
+| `variant_caller` | String | Format of VAF scores (e.g., `"standard"`). Default: `"standard"`. |
+| `windowSize` | Integer | Window size for calculating mutation density in rainfall plots. Default: `10000000`. |
+| `correction` | Boolean | Perform genome-wide mutational density correction. Default: `False`. |
+| `probability` | Boolean | Calculate the probability of observing each clustered event in its local region. Output saved in `[project_path]/output/clustered/`. Default: `False`. |
 
 
 **VAF Format**
@@ -89,13 +103,11 @@ SigProfilerClusters uses the VAF recorded in the input files to subclassify clus
 
 If you are not using VCFs as input files, VAFs cannot be used in the subclassification step. Therefore, to subclassify clusters using other input file types set subclassify=True and includedVAFs=False.
 
-If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="sanger".
-
-If your VAF is recorded in the 8th column of your VCF as VCF=xx, set variant_caller="TCGA".
+If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="caveman".
 
-If your VAF is recorded in the 8th column of your VCF as AF=xx, set variant_caller="standardVC".
+If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
 
-If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
 
 If your VCFs have no recorded VAFs set includedVAFs=False. This will run SigProfilerClusters, subclassify clusters based on just the calculated IMD (provided that you set subclassify=True).
 
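
To illustrate what the `standard` option describes (a VAF written as `VAF=xx` or `AF=xx` in the 8th or 10th tab-separated column), here is a minimal sketch of such a lookup. The function name `extract_standard_vaf`, the regular expression, and the choice to check the 8th column before the 10th are all assumptions made for illustration. The parsing code changed by this commit is not shown on this page and may behave differently.

```python
import re

# Assumed pattern: "VAF=" or "AF=" at the start of the field or right after a
# delimiter, followed by a number. Requiring a delimiter avoids matching keys
# that merely end in "AF", such as "CAF=".
_VAF_PATTERN = re.compile(
    r"(?:^|[;:\t ])(?:VAF|AF)=([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"
)


def extract_standard_vaf(vcf_line):
    """Return the first VAF found in the 8th or 10th column, else None."""
    fields = vcf_line.rstrip("\n").split("\t")
    for index in (7, 9):  # zero-based positions of the 8th and 10th columns
        if index < len(fields):
            match = _VAF_PATTERN.search(fields[index])
            if match:
                return float(match.group(1))
    return None
```

For a record whose 8th column is `DP=40;VAF=0.25`, this sketch returns `0.25`. For a record with no `VAF=` or `AF=` entry in either column it returns `None`, which corresponds to the case where `includedVAFs=False` would be appropriate.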

SigProfilerClusters/SigProfilerClusters.py

Lines changed: 6 additions & 5 deletions
@@ -638,7 +638,7 @@ def analysis(
     chrom_based=False,
     max_cpu=None,
     subClassify=False,
-    variant_caller=None,
+    variant_caller="standard",
     includedVAFs=True,
     includedCCFs=False,
     windowSize=1000000,
@@ -671,10 +671,11 @@ def analysis(
     max_cpu -> optional parameter to specify the number of maximum cpu's to use for parallelizing the code (integer; default=None: uses all available cpu's)
     subClassify -> optional parameter to subclassify the clustered mutations into refinded classes including DBSs, extended MBSs, kataegis, etc. (boolean; default=False)
     variant_caller -> optional parameter that informs the tool of what format the VAF scores are provided (boolean; default=None). This is required when subClassify=True. Currently, there are four supported formats:
-        -> sanger: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="sanger".
-        -> TCGA: If your VAF is recorded in the 8th column of your VCF as VCF=xx, set variant_caller="TCGA".
-        -> standardVC: If your VAF is recorded in the 10th column of your VCF as AF=xx, set variant_caller="standardVC".
-        -> mutect2: If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+        -> caveman: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="caveman".
+        -> standard: If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
+        -> mutect2: If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+
+
     includedVAFs -> optional parameter that informs the tool of the inclusion of VAFs in the dataset (boolean; default=True)
     includedCCFs -> optional parameter that informs the tool of the inclusion of cancer cell fractions in the dataset (boolean; default=True)
     windowSize -> the size of the window used for correcting the IMDs based upon mutational density within a given genomic range (integer; default=10000000)

SigProfilerClusters/SigProfilerHotSpots.py

Lines changed: 4 additions & 5 deletions
@@ -477,7 +477,7 @@ def analysis(
     chrom_based=False,
     max_cpu=None,
     subClassify=False,
-    variant_caller=None,
+    variant_caller="standard",
     includedVAFs=True,
     windowSize=1000000,
     bedRanges=None,
@@ -508,10 +508,9 @@ def analysis(
     max_cpu -> optional parameter to specify the number of maximum cpu's to use for parallelizing the code (integer; default=None: uses all available cpu's)
     subClassify -> optional parameter to subclassify the clustered mutations into refinded classes including DBSs, extended MBSs, kataegis, etc. (boolean; default=False)
     variant_caller -> optional parameter that informs the tool of what format the VAF scores are provided (boolean; default=None). This is required when subClassify=True. Currently, there are four supported formats:
-        -> sanger: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="sanger".
-        -> TCGA: If your VAF is recorded in the 8th column of your VCF as VCF=xx, set variant_caller="TCGA".
-        -> standardVC: If your VAF is recorded in the 10th column of your VCF as AF=xx, set variant_caller="standardVC".
-        -> mutect2: If your VAF is recorded in the 11th column of your VCF as AF=xx, set variant_caller="mutect2".
+        -> caveman: If your VAF is recorded in the 11th column of your VCF as the last number of the colon delimited values, set variant_caller="caveman".
+        -> standard: If your VAF is recorded in the 8th or 10th column of your VCF as VAF=xx or AF=xx, set variant_caller="standard".
+        -> mutect2: If your VAF is recorded in the 10th or 11th column of your VCF as AF=xx, set variant_caller="mutect2".
     includedVAFs -> optional parameter that informs the tool of the inclusion of VAFs in the dataset (boolean; default=True)
     windowSize -> the size of the window used for correcting the IMDs based upon mutational density within a given genomic range (integer; default=10000000)
     plotIMDfigure -> optional parameter that generates IMD and mutational spectra plots for each sample (boolean; default=True).
