|
1 |
| -# GMSC_manuscript_analysis |
| 1 | +# A catalogue of small proteins from the global microbiome |
| 2 | +This repository contains the files and scripts used to generate the analyses and figures in the manuscript "A catalogue of small proteins from the global microbiome": |
| 3 | +> Yiqian Duan, Celio Dias Santos-Junior, Thomas Sebastian Schmidt, Anthony Fullam, Breno L. S. de Almeida, Chengkai Zhu, Michael Kuhn, Xing-Ming Zhao, Peer Bork, Luis Pedro Coelho |
| 4 | +bioRxiv 2023.12.27.573469; doi: https://doi.org/10.1101/2023.12.27.573469 |
| 5 | + |
| 6 | +The global microbial smORFs catalogue (GMSC) is available at https://gmsc.big-data-biology.org |
| 7 | + |
| 8 | +## Introduction |
| 9 | + |
| 10 | +The folder **General_Scripts** contains scripts to generate the GMSC resource from the raw data. |
| 11 | + |
| 12 | +The folder **Manuscript_Analysis** contains pre-computed files and scripts to run the analysis and generate figures included in the GMSC manuscript. |
| 13 | + |
| 14 | +## Dependencies |
| 15 | + |
| 16 | +The following software is required to run the scripts. |
| 17 | + |
| 18 | +| **Software** | **Availability** | |
| 19 | +| :---: | :---: | |
| 20 | +| NGLess (v.1.3.0) | https://github.com/ngless-toolkit/ngless | |
| 21 | +| Prodigal (v.2.6.3) | https://github.com/hyattpd/Prodigal | |
| 22 | +| Macrel (v.0.5) | https://github.com/BigDataBiology/macrel | |
| 23 | +| MMseqs2 | https://github.com/soedinglab/MMseqs2 | |
| 24 | +| Swipe (v.2.1.1) | https://github.com/torognes/swipe | |
| 25 | +| DIAMOND (v.2.0.4) | https://github.com/bbuchfink/diamond | |
| 26 | +| HMMer (v.3.3.2) | http://hmmer.org/ | |
| 27 | +| MAFFT (v.7.475) | https://mafft.cbrc.jp/alignment/software/ | |
| 28 | +| RNAcode (v.0.3) | https://github.com/ViennaRNA/RNAcode | |
| 29 | +| BWA (v.0.7.17) | https://github.com/lh3/bwa | |
| 30 | +| BLAST (v.2.13.0) | https://blast.ncbi.nlm.nih.gov/Blast.cgi | |
| 31 | +| TMHMM (v.2.0) | https://services.healthtech.dtu.dk/services/TMHMM-2.0/ | |
| 32 | +| SignalP (v.5.0) | https://services.healthtech.dtu.dk/services/SignalP-5.0/ | |
| 33 | + |
| 34 | +## Data Availability |
| 35 | + |
| 36 | +### Database |
| 37 | + |
| 38 | +These databases are used in the construction and analysis of the catalogue. |
| 39 | + |
| 40 | +| **Database** | **Availability** | |
| 41 | +| :---: | :---: | |
| 42 | +| SPIRE | http://spire.embl.de | |
| 43 | +| ProGenomes v2 | http://progenomes2.embl.de/ | |
| 44 | +| AntiFam (v.7.0) | ftp://ftp.ebi.ac.uk/pub/databases/Pfam/AntiFam/ | |
| 45 | +| PRIDE | https://www.ebi.ac.uk/pride/ | |
| 46 | +| RefSeq | https://ftp.ncbi.nlm.nih.gov/refseq/release/ | |
| 47 | +| The Conserved Domain Database | https://ftp.ncbi.nih.gov/pub/mmdb/cdd | |
| 48 | +| GTDB R95 | https://gtdb.ecogenomic.org/ | |
| 49 | + |
| 50 | +### Preprocessed data |
| 51 | + |
| 52 | +smORF catalogue & annotations: The smORF catalogue and its annotations are available at https://doi.org/10.5281/zenodo.7944370. |
| 53 | + |
| 54 | +Preprocessed data: For convenience, the preprocessed files are available under the Preprocessed_Files folder. |