Commit e409639

Merge pull request #377 from bbglab/dev
New release: v1.0.0 Ter
2 parents 14640cd + 1a61fc9 commit e409639

158 files changed: +18878 -3283 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -11,3 +11,5 @@ nf-*-reports.tsv
 *.sif
 ste_notes.txt
 assets/HDP_files*
+scratch/
+scratchhhh/

CITATIONS.md

Lines changed: 29 additions & 3 deletions
@@ -8,16 +8,42 @@

 > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

-## Pipeline tools
+## Sources of data and tools

-- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
+- Nanoseq masks

-> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
+> Abascal, F., Harvey, L.M.R., Mitchell, E. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021). https://doi.org/10.1038/s41586-021-03477-4
+
+- **CADD scores**
+
+> Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: Using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res. 2024 Jan 5. doi: 10.1093/nar/gkad989. PubMed PMID: 38183205.
+
+- COSMIC signatures
+
+> https://cancer.sanger.ac.uk/signatures/sbs
+
+- **dNdScv covariates**
+
+> Martincorena I, et al. (2017) Universal Patterns of Selection in Cancer and Somatic Tissues. Cell. http://www.cell.com/cell/fulltext/S0092-8674(17)31136-4
+
+- Pfam
+
+> Jaina Mistry, Sara Chuguransky, Lowri Williams, Matloob Qureshi, Gustavo A Salazar, Erik L L Sonnhammer, Silvio C E Tosatto, Lisanna Paladin, Shriya Raj, Lorna J Richardson, Robert D Finn, Alex Bateman, Pfam: The protein families database in 2021, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D412–D419, https://doi.org/10.1093/nar/gkaa913
+
+- **Oncodrive3D & Oncodrive3D datasets.**
+
+> Stefano Pellegrini, Olivia Dove-Estrella, Ferran Muiños, Nuria Lopez-Bigas, Abel Gonzalez-Perez, Oncodrive3D: fast and accurate detection of structural clusters of somatic mutations under positive selection, Nucleic Acids Research, Volume 53, Issue 15, 28 August 2025, gkaf776, https://doi.org/10.1093/nar/gkaf776

 - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

 > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

+- Python
+- SigProfilerAssignment, MatrixGenerator
+- HDP
+- OncodriveFML
+- OncodriveCLUSTL
+
 ## Software packaging/containerisation tools

 - [Anaconda](https://anaconda.com)

LICENSE

Lines changed: 14 additions & 4 deletions
@@ -1,7 +1,17 @@
-Placeholder License - Temporary Notice
+Copyright (C) 2025 Institute for Research in Biomedicine (IRB Barcelona)

-This repository is temporarily published without a definitive open-source license.
+deepCSA is the property of the Institute for Research in Biomedicine
+(IRB Barcelona), which hold the copyright thereto.

-The final license (GPLv3, AGPLv3, or other) will be confirmed and updated as soon as possible, and no later than the manuscript's publication.
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as
+published by the Free Software Foundation, either version 3 of the
+License, or (at your option) any later version.

-Until the final license is in place, this code is provided for **review purposes only**, it can be run, but it must not be used, modified, or redistributed without explicit permission from the authors.
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <http://www.gnu.org/licenses/>

README.md

Lines changed: 10 additions & 24 deletions
@@ -6,15 +6,10 @@

 ![deepCSA workflow overview](docs/images/deepCSA.png)

-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
-
-<!-- 1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/)) -->
-
 ## Usage

+You can find a detailed documentation in the [docs section](docs/README.md), but here there is a minimal summary on how to prepare the inputs. Still for your first runs if you need to make the complete set up you have to check the deeper documentation.
+
 First, prepare a samplesheet with your input data that looks as follows:

 `samplesheet.csv`:
@@ -43,12 +38,6 @@ nextflow run main.nf --outdir <OUTDIR> -profile singularity,<DESIRED PROFILE> -p

 The input can be provided by the `--input` option but it is more recommended to define this and all the other parameters in a parameter file (i.e. `pipeline_params.yml`), that can be provided to the pipeline for running the analysis with the specified configuration. This will also allow the definition of the remaining required parameters.

-### Warning
-
-Please provide pipeline parameters via the Nextflow `-params-file` option or CLI. Custom config files including those
-provided by the `-c` Nextflow option can be used to provide any configuration **except for parameters**_;
-see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
-
 ## Credits

 bbglab/deepCSA was originally written by Ferriol Calvet.
@@ -62,20 +51,11 @@ We thank the following people for their extensive assistance in the development
 * @AxelRosendahlHuber
 * @andrianovam
 * @migrau
-
-<!-- TODO
-## Contributions and Support
-
-If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
--->
+* @rochamorro1
+* @m-huertasp

 ## Citations

-<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
-<!-- If you use bbglab/deepCSA for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
-
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

 This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).
@@ -89,3 +69,9 @@ This pipeline uses code and infrastructure developed and maintained by the [nf-c
 ## Documentation

 Find the documentation ([link to docs](https://github.com/bbglab/deepCSA/tree/main/docs)).
+
+We are working to provide the biggest possible detail on the [usage](docs/usage.md) and explanation of the rationale and [tools](docs/tools.md), but this is still in progress.
+
+## Publications
+
+> [Sex and smoking bias in the selection of somatic mutations in human bladder](https://www.nature.com/articles/s41586-025-09521-x)

assets/useful_scripts/deepcsa_maf2samplevcfs.py

File mode changed: 100644 → 100755.
Lines changed: 1 addition & 3 deletions
@@ -674,9 +674,7 @@ def generate_plot(gene,
         gene_pos)

     # Domain
-    domain_gene = o3d_annot_df[(o3d_annot_df["Gene"] == gene) &
-                               (o3d_annot_df["Type"] == "DOMAIN") &
-                               (o3d_annot_df["Evidence"] == "Pfam")].reset_index(drop=True)
+    domain_gene = o3d_annot_df[(o3d_annot_df["Gene"] == gene)].reset_index(drop=True)

     # Transcripts and Uniprot ID
     canonical_tr, o3d_tr = get_transcript_ids(gene, maf_df_f, o3d_seq_df)
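
The `generate_plot` hunk above drops two of the three row filters applied to `o3d_annot_df`: the selection no longer requires `Type == "DOMAIN"` and `Evidence == "Pfam"`, only a matching `Gene`. The following is a minimal sketch of that behavioural difference. It uses plain Python dicts rather than the pandas DataFrame from the script, and the rows, gene names, and the helpers `filter_before` and `filter_after` are hypothetical, introduced only for illustration.

```python
# Hypothetical annotation rows; the real o3d_annot_df is a pandas DataFrame
# whose contents are not shown in this commit.
rows = [
    {"Gene": "TP53", "Type": "DOMAIN", "Evidence": "Pfam"},
    {"Gene": "TP53", "Type": "MOTIF", "Evidence": "Other"},
    {"Gene": "KRAS", "Type": "DOMAIN", "Evidence": "Pfam"},
]


def filter_before(rows, gene):
    """Old selection: gene match AND Type == DOMAIN AND Evidence == Pfam."""
    return [
        r for r in rows
        if r["Gene"] == gene and r["Type"] == "DOMAIN" and r["Evidence"] == "Pfam"
    ]


def filter_after(rows, gene):
    """New selection: gene match only."""
    return [r for r in rows if r["Gene"] == gene]


old_rows = filter_before(rows, "TP53")
new_rows = filter_after(rows, "TP53")

# The new selection is a superset of the old one for the same gene.
print(len(old_rows), len(new_rows))
```

Under these assumed rows the new selection returns every annotation for the gene, so any code downstream of `domain_gene` would also receive non-Pfam or non-domain rows if the annotation table contains them. Whether it does is not visible from this diff.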
