
Commit 290ae26

Better README (#21)

* RFCT+DOC General_Scripts -> Resource_Generation

  Also, improve the text of the README file

* DOC Add basic README in Manuscript_analysis
1 parent f335dd3 commit 290ae26


653 files changed (+44 / -16 lines)


Manuscript_analysis/README.md

Lines changed: 24 additions & 0 deletions
```diff
@@ -0,0 +1,24 @@
+# Manuscript analysis
+
+This directory contains the scripts to run the analysis and generate figures included in the GMSC manuscript. Each figure is associated with a Jupyter notebook
+
+## Installation
+
+The following command will create a conda environment with the necessary dependencies:
+
+
+```bash
+conda create \
+-n gmsc_env \
+python \
+jupyter \
+numpy \
+pandas \
+matplotlib \
+geopandas \
+scipy \
+statsmodels \
+biopython \
+seaborn
+```
+
```
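After creating the environment with the `conda create` command above, one way to confirm that the listed libraries are importable is a small shell check. This is a sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and it assumes the environment has been activated (`conda activate gmsc_env`) so that `python` refers to the environment's interpreter. The `biopython` package is imported as `Bio`. Jupyter is a command-line entry point rather than a single library, so it is left out here; `jupyter --version` is one way to check it separately.

```shell
#!/bin/sh
# Hypothetical helper: print each module that the given interpreter cannot import.
# Assumption: run inside the activated gmsc_env environment.
missing_modules() {
    py=$1
    shift
    for mod in "$@"; do
        # Try the import; discard all output and keep only the exit status.
        if ! "$py" -c "import $mod" >/dev/null 2>&1; then
            printf '%s\n' "$mod"
        fi
    done
}

# Library names from the conda command above (biopython is imported as Bio).
missing=$(missing_modules python numpy pandas matplotlib geopandas scipy statsmodels Bio seaborn)
if [ -n "$missing" ]; then
    echo "Modules that could not be imported:"
    echo "$missing"
else
    echo "All listed modules were imported successfully."
fi
```

If `python` itself is not on `PATH` (for example, because the environment was not activated), every module is reported as missing, which is usually the first hint to check the activation step.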

README.md

Lines changed: 15 additions & 12 deletions
```diff
@@ -1,19 +1,21 @@
 # A catalogue of small proteins from the global microbiome
-This repository contains files and scripts to generate analysis and figures in the manuscript "A catalogue of small proteins from the global microbiome":
+
+This repository contains files and scripts to generate analysis and figures in the manuscript _A catalogue of small proteins from the global microbiome_:
+
 > Yiqian Duan, Celio Dias Santos-Junior, Thomas Sebastian Schmidt, Anthony Fullam, Breno L. S. de Almeida, Chengkai Zhu, Kuhn Michael, Xing-Ming Zhao, Peer Bork, Luis Pedro Coelho
-bioRxiv 2023.12.27.573469; doi: https://doi.org/10.1101/2023.12.27.573469
+> bioRxiv 2023.12.27.573469; doi: https://doi.org/10.1101/2023.12.27.573469
 
-The global microbial smORFs catalogue (GMSC) is available at https://gmsc.big-data-biology.org
+The results Global Microbial SMORFs Catalogue (GMSC) is available at https://gmsc.big-data-biology.org
 
-## Introduction
+## Structure
 
-The **General_Scripts** folder contains scripts to generate the GMSC resource.
+The folder `Resource_Generation` contains scripts to generate the GMSC resource from raw data. Note that this requires downloading all the contigs from [SPIRE](https://spire.embl.de) and [ProGenomes v2](https://progenomes2.embl.de/) as well as very large computational resources. The scripts are provided for transparency and reproducibility, but we recommend using the precomputed data (deposited at Zenodo or included here, see below) and the GMSC resource at https://gmsc.big-data-biology.org.
 
-The **Manuscript_Analysis** folder contains pre-computed files and scripts to run the analysis and generate figures included in the GMSC manuscript.
+The folder `Manuscript_Analysis` contains pre-computed files and scripts to run the analysis and generate figures included in the GMSC manuscript. The scripts are written in Python (depenencies listed below). Generally speaking, these do not require large computational resources and interested users can run them on their own machines and adapt them to perform follow-up analyses.
 
 ## Dependencies
 
-The softwares are required for the scripts.
+The following are required for the scripts (other versions may work, we list the ones that were used).
 
 | **Software** | **Availability** |
 | :---: | :---: |
@@ -24,7 +26,7 @@ The softwares are required for the scripts.
 | MMseqs2 | https://github.com/soedinglab/MMseqs2 |
 | Swipe (v.2.1.1) | https://github.com/torognes/swipe |
 | DIAMOND (v.2.0.4) | https://github.com/bbuchfink/diamond |
-| HMMer (v.3.3.2) | http://hmmer.org/ |
+| HMMer (v.3.3.2) | https://hmmer.org/ |
 | MAFFT (v.7.475) | https://mafft.cbrc.jp/alignment/software/ |
 | RNAcode (v.0.3) | https://github.com/ViennaRNA/RNAcode |
 | BWA (v.0.7.17) | https://github.com/lh3/bwa |
@@ -40,8 +42,8 @@ These databases are used in the construction and analysis of the catalogue.
 
 | **Database** | **Availability** |
 | :---: | :---: |
-| SPIRE | http://spire.embl.de |
-| ProGenomes v2 | http://progenomes2.embl.de/ |
+| SPIRE | https://spire.embl.de |
+| ProGenomes v2 | https://progenomes2.embl.de/ |
 | AntiFam (v.7.0) | ftp://ftp.ebi.ac.uk/pub/databases/Pfam/AntiFam/ |
 | PRIDE | https://www.ebi.ac.uk/pride/ |
 | RefSeq | https://ftp.ncbi.nlm.nih.gov/refseq/release/ |
@@ -50,6 +52,7 @@ These databases are used in the construction and analysis of the catalogue.
 
 ### Preprocessed data
 
-smORF catalogue & annotations: The smORF catalogue and its annotations are available at https://doi.org/10.5281/zenodo.7944370.
+smORF catalogue & annotations: The smORF catalogue and its annotations are available at https://doi.org/10.5281/zenodo.7944370
+
+Preprocessed data: For convenience, the preprocessed files are available under the `Manuscript_anlysis/data` folder.
 
-Preprocessed data: For convenience, the preprocessed files are available under the `Manuscript_anlysis/data` folder.
```
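The top-level README lists several external command-line tools under Dependencies. A minimal sketch for checking which of them are on `PATH` is shown below. The helper name `missing_tools` is hypothetical, and the executable names (for example `mmseqs` for MMseqs2 and `hmmsearch` for HMMer) are assumptions about typical installations rather than names taken from the repository. Only the tools visible in this diff excerpt are checked; table rows hidden between the hunks are not included.

```shell
#!/bin/sh
# Hypothetical helper: print each command that is not found on PATH.
# `command -v` is the POSIX way to test whether a command is available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed executable names for MMseqs2, Swipe, DIAMOND, HMMer, MAFFT, RNAcode and BWA.
missing=$(missing_tools mmseqs swipe diamond hmmsearch mafft RNAcode bwa)
if [ -n "$missing" ]; then
    echo "Tools not found on PATH:"
    echo "$missing"
else
    echo "All listed tools were found on PATH."
fi
```

The script only prints a report and exits successfully even when tools are missing; a stricter variant could `exit 1` whenever `$missing` is non-empty, which is convenient when running it as a preflight step in a pipeline.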
