This directory contains the scripts to run the analysis and generate figures included in the GMSC manuscript. Each figure is associated with a Jupyter notebook
+
+ ## Installation
+
+ The following command will create a conda environment with the necessary dependencies:
README.md: 15 additions & 12 deletions
@@ -1,19 +1,21 @@
# A catalogue of small proteins from the global microbiome
- This repository contains files and scripts to generate analysis and figures in the manuscript "A catalogue of small proteins from the global microbiome":
+
+ This repository contains files and scripts to generate analysis and figures in the manuscript _A catalogue of small proteins from the global microbiome_:
+
> Yiqian Duan, Celio Dias Santos-Junior, Thomas Sebastian Schmidt, Anthony Fullam, Breno L. S. de Almeida, Chengkai Zhu, Kuhn Michael, Xing-Ming Zhao, Peer Bork, Luis Pedro Coelho
- The global microbial smORFs catalogue (GMSC) is available at https://gmsc.big-data-biology.org
+ The resulting Global Microbial SMORFs Catalogue (GMSC) is available at https://gmsc.big-data-biology.org

- ## Introduction
+ ## Structure

- The **General_Scripts** folder contains scripts to generate the GMSC resource.
+ The folder `Resource_Generation` contains scripts to generate the GMSC resource from raw data. Note that this requires downloading all the contigs from [SPIRE](https://spire.embl.de) and [ProGenomes v2](https://progenomes2.embl.de/) as well as very large computational resources. The scripts are provided for transparency and reproducibility, but we recommend using the precomputed data (deposited at Zenodo or included here, see below) and the GMSC resource at https://gmsc.big-data-biology.org.

- The **Manuscript_Analysis** folder contains pre-computed files and scripts to run the analysis and generate figures included in the GMSC manuscript.
+ The folder `Manuscript_Analysis` contains pre-computed files and scripts to run the analysis and generate figures included in the GMSC manuscript. The scripts are written in Python (dependencies listed below). Generally speaking, these do not require large computational resources and interested users can run them on their own machines and adapt them to perform follow-up analyses.

## Dependencies

- The softwares are required for the scripts.
+ The following are required for the scripts (other versions may work; we list the ones that were used).

|**Software**|**Availability**|
| :---: | :---: |
@@ -24,7 +26,7 @@ The softwares are required for the scripts.