Commit c7dd206

ENH add general README
1 parent 72b071a commit c7dd206

1 file changed: +54 -1 lines changed

README.md

Lines changed: 54 additions & 1 deletion
@@ -1 +1,54 @@
-# GMSC_manuscript_analysis
+# A catalogue of small proteins from the global microbiome
+This repository contains the files and scripts used to generate the analyses and figures in the manuscript "A catalogue of small proteins from the global microbiome":
+> Yiqian Duan, Celio Dias Santos-Junior, Thomas Sebastian Schmidt, Anthony Fullam, Breno L. S. de Almeida, Chengkai Zhu, Michael Kuhn, Xing-Ming Zhao, Peer Bork, Luis Pedro Coelho
+bioRxiv 2023.12.27.573469; doi: https://doi.org/10.1101/2023.12.27.573469
+
+The global microbial smORFs catalogue (GMSC) is available at https://gmsc.big-data-biology.org
+
+## Introduction
+
+The folder **General_Scripts** contains scripts to generate the GMSC resource from the raw data.
+
+The folder **Manuscript_Analysis** contains pre-computed files and scripts to run the analyses and generate the figures included in the GMSC manuscript.
+
+## Dependencies
+
+The following software is required to run the scripts.
+
+| **Software** | **Availability** |
+| :---: | :---: |
+| NGLess (v.1.3.0) | https://github.com/ngless-toolkit/ngless |
+| Prodigal (v.2.6.3) | https://github.com/hyattpd/Prodigal |
+| Macrel (v.0.5) | https://github.com/BigDataBiology/macrel |
+| MMseqs2 | https://github.com/soedinglab/MMseqs2 |
+| Swipe (v.2.1.1) | https://github.com/torognes/swipe |
+| DIAMOND (v.2.0.4) | https://github.com/bbuchfink/diamond |
+| HMMer (v.3.3.2) | http://hmmer.org/ |
+| MAFFT (v.7.475) | https://mafft.cbrc.jp/alignment/software/ |
+| RNAcode (v.0.3) | https://github.com/ViennaRNA/RNAcode |
+| BWA (v.0.7.17) | https://github.com/lh3/bwa |
+| BLAST (v.2.13.0) | https://blast.ncbi.nlm.nih.gov/Blast.cgi |
+| TMHMM (v.2.0) | https://services.healthtech.dtu.dk/services/TMHMM-2.0/ |
+| SignalP (v.5.0) | https://services.healthtech.dtu.dk/services/SignalP-5.0/ |
+
+## Data Availability
+
+### Database
+
+These databases are used in the construction and analysis of the catalogue.
+
+| **Database** | **Availability** |
+| :---: | :---: |
+| SPIRE | http://spire.embl.de |
+| ProGenomes v2 | http://progenomes2.embl.de/ |
+| AntiFam (v.7.0) | ftp://ftp.ebi.ac.uk/pub/databases/Pfam/AntiFam/ |
+| PRIDE | https://www.ebi.ac.uk/pride/ |
+| RefSeq | https://ftp.ncbi.nlm.nih.gov/refseq/release/ |
+| The Conserved Domain Database | https://ftp.ncbi.nih.gov/pub/mmdb/cdd |
+| GTDB R95 | https://gtdb.ecogenomic.org/ |
+
+### Preprocessed data
+
+smORF catalogue & annotations: The smORF catalogue and its annotations are available at https://doi.org/10.5281/zenodo.7944370.
+
+Preprocessed data: For convenience, the preprocessed files are available under the Preprocessed_Files folder.
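
The Dependencies section of this README lists the required tools but not how to confirm they are installed. As a minimal sketch (not part of the repository), the Python snippet below reports which of the listed tools cannot be found on `PATH`. The executable names (for example `mmseqs` for MMseqs2, `hmmsearch` for HMMer, `blastp` for BLAST) and the helper `find_missing_tools` are assumptions made for illustration; the README names the software only, and an installation may expose different commands. The check does not verify the versions given in the table.

```python
import shutil

# Mapping from the software names in the README's Dependencies table to an
# executable each one is assumed to install. These command names are
# assumptions; adjust them to match the local installation.
REQUIRED_TOOLS = {
    "NGLess": "ngless",
    "Prodigal": "prodigal",
    "Macrel": "macrel",
    "MMseqs2": "mmseqs",
    "Swipe": "swipe",
    "DIAMOND": "diamond",
    "HMMer": "hmmsearch",
    "MAFFT": "mafft",
    "RNAcode": "RNAcode",
    "BWA": "bwa",
    "BLAST": "blastp",
    "TMHMM": "tmhmm",
    "SignalP": "signalp",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the sorted software names whose executable is not on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return sorted(name for name, exe in tools.items() if which(exe) is None)


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Running the snippet before the scripts in **General_Scripts** or **Manuscript_Analysis** gives an early hint about missing tools; version checks would need to be added per tool, since each one reports its version differently.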
