
Commit 2f5d6c8

feat: add pgap annotation; closes #2
1 parent 5125d55 commit 2f5d6c8

20 files changed: +23525 additions, -18 deletions

.github/workflows/main.yml

Lines changed: 2 additions & 2 deletions

@@ -47,11 +47,11 @@ jobs:
         with:
           directory: .test
           snakefile: workflow/Snakefile
-          args: "--sdm conda --show-failed-logs --cores 3 --conda-cleanup-pkgs cache -n"
+          args: "--sdm conda --show-failed-logs --cores 1 --conda-cleanup-pkgs cache -n"
 
       - name: Test report
         uses: snakemake/[email protected]
         with:
           directory: .test
           snakefile: workflow/Snakefile
-          args: "--cores 3 --report report.zip -n"
+          args: "--cores 1 --report report.zip -n"

.test/config/config.yml

Lines changed: 9 additions & 1 deletion

@@ -1 +1,9 @@
-samplesheet: "config/samples.tsv"
+samplesheet: "config/samples.csv"
+outdir: "results"
+
+pgap:
+  bin: "path/to/pgap.py"
+  use_yaml_config: True
+  prepare_yaml_files:
+    generic: "config/generic.yaml"
+    submol: "config/submol.yaml"
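The new `pgap` block groups the launcher path and the YAML-template switches under one key. Below is a minimal sketch of a pre-flight check on that block, assuming the config has already been loaded into a dict (as Snakemake does for a `configfile:`). The function name `validate_pgap_config` and the specific checks are hypothetical illustrations, not part of this commit.

```python
from pathlib import Path
from typing import Any, Dict, List


def validate_pgap_config(config: Dict[str, Any], base_dir: Path = Path(".")) -> List[str]:
    """Return human-readable problems found in the `pgap` config block (hypothetical checks)."""
    pgap = config.get("pgap")
    if not isinstance(pgap, dict):
        return ["missing 'pgap' section"]

    problems: List[str] = []
    # `bin` should point at the pgap.py launcher from a separate PGAP installation.
    bin_path = base_dir / str(pgap.get("bin", ""))
    if not bin_path.is_file():
        problems.append(f"pgap.bin not found: {bin_path}")

    # Unquoted True/False in YAML load as Python bools; a quoted "True" would be a str.
    use_yaml = pgap.get("use_yaml_config")
    if not isinstance(use_yaml, bool):
        problems.append("pgap.use_yaml_config must be true or false")
    elif use_yaml:
        # Only require the two templates when the YAML input mode is switched on.
        templates = pgap.get("prepare_yaml_files") or {}
        for key in ("generic", "submol"):
            path = templates.get(key)
            if path is None:
                problems.append(f"pgap.prepare_yaml_files.{key} is missing")
            elif not (base_dir / path).is_file():
                problems.append(f"template not found: {base_dir / path}")
    return problems
```

With the placeholder `bin: "path/to/pgap.py"` left unchanged, this check would report the missing launcher before any job is scheduled. A Snakefile could call it on its `config` object and stop on a non-empty list.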

.test/config/generic.yaml

Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
+fasta:
+  class: File
+  location: None
+submol:
+  class: File
+  location: None
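`generic.yaml` follows PGAP's YAML input format: two `File` entries whose `location` is left as a placeholder. The sketch below fills one per-sample copy. It assumes the template has already been parsed (for example with PyYAML's `yaml.safe_load`) and that the workflow replaces the placeholders per sample. `fill_generic_template` and the example paths are hypothetical. Note that PyYAML reads an unquoted `None` as the string `"None"`, not as a null value.

```python
import copy
from typing import Any, Dict

# generic.yaml as yaml.safe_load would return it; the unquoted None is the string "None".
GENERIC_TEMPLATE: Dict[str, Any] = {
    "fasta": {"class": "File", "location": "None"},
    "submol": {"class": "File", "location": "None"},
}


def fill_generic_template(template: Dict[str, Any], fasta: str, submol: str) -> Dict[str, Any]:
    """Return a per-sample copy of the template with both File locations set."""
    filled = copy.deepcopy(template)  # never mutate the shared template
    for key, location in (("fasta", fasta), ("submol", submol)):
        if filled.get(key, {}).get("class") != "File":
            raise ValueError(f"expected a File entry for {key!r}")
        filled[key]["location"] = location
    return filled


if __name__ == "__main__":
    # Hypothetical per-sample paths.
    print(fill_generic_template(GENERIC_TEMPLATE, "data/assembly.fasta", "results/EC2224/submol.yaml"))
```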

.test/config/samples.csv

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+sample,species,strain,id_prefix,file
+EC2224,"Streptococcus pyogenes",SF370,SPY,"data/assembly.fasta"
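The README's workflow overview lists parsing `samples.csv` in Python as the first step. The sketch below shows that step using only the standard library. The conda environment created in the README also installs `pandas`, and its `read_csv` could do the same job. `Sample` and `read_samplesheet` are hypothetical names. The column check and the duplicate-name check are assumptions about which validation is wanted.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, TextIO

REQUIRED_COLUMNS = ("sample", "species", "strain", "id_prefix", "file")


@dataclass(frozen=True)
class Sample:
    """One row of samples.csv (hypothetical container)."""
    sample: str
    species: str
    strain: str
    id_prefix: str
    file: str


def read_samplesheet(handle: TextIO) -> Dict[str, Sample]:
    """Parse a samples.csv stream into Sample records keyed by sample name."""
    reader = csv.DictReader(handle)  # unquotes fields such as "Streptococcus pyogenes"
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")

    samples: Dict[str, Sample] = {}
    for row in reader:
        values = [(row[c] or "").strip() for c in REQUIRED_COLUMNS]
        name = values[0]
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        samples[name] = Sample(*values)
    return samples


if __name__ == "__main__":
    # The row added to .test/config/samples.csv in this commit.
    demo = io.StringIO(
        "sample,species,strain,id_prefix,file\n"
        'EC2224,"Streptococcus pyogenes",SF370,SPY,"data/assembly.fasta"\n'
    )
    print(read_samplesheet(demo))
```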

.test/config/submol.yaml

Lines changed: 21 additions & 0 deletions

@@ -0,0 +1,21 @@
+topology: "circular"
+location: "chromosome"
+organism:
+  genus_species: ""
+  strain: ""
+contact_info:
+  last_name: ""
+  first_name: ""
+  email: ""
+  organization: ""
+  department: ""
+  street: ""
+  city: ""
+  state: ""
+  postal_code: ""
+  country: ""
+authors:
+  - author:
+      last_name: "last_name"
+      first_name: "first_name"
+locus_tag_prefix: ""
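`submol.yaml` is PGAP's submission-metadata template, committed here with empty values. The sketch below fills the per-sample fields from a samplesheet row. It assumes that `species`, `strain` and `id_prefix` map onto `organism.genus_species`, `organism.strain` and `locus_tag_prefix`; the commit does not show this mapping. It also adds a helper that lists fields still left blank. `fill_submol_template`, `empty_fields` and the trimmed template are hypothetical.

```python
import copy
from typing import Any, Dict, List

# Trimmed submol.yaml as yaml.safe_load would return it (contact_info shortened for brevity).
SUBMOL_TEMPLATE: Dict[str, Any] = {
    "topology": "circular",
    "location": "chromosome",
    "organism": {"genus_species": "", "strain": ""},
    "contact_info": {"last_name": "", "first_name": "", "email": ""},
    "authors": [{"author": {"last_name": "last_name", "first_name": "first_name"}}],
    "locus_tag_prefix": "",
}


def fill_submol_template(template: Dict[str, Any], species: str, strain: str,
                         id_prefix: str) -> Dict[str, Any]:
    """Return a per-sample copy with organism and locus tag prefix set (assumed mapping)."""
    filled = copy.deepcopy(template)  # keep the shared template pristine
    filled["organism"]["genus_species"] = species
    filled["organism"]["strain"] = strain
    filled["locus_tag_prefix"] = id_prefix
    return filled


def empty_fields(node: Any, path: str = "") -> List[str]:
    """List dotted paths of leaves that are still empty strings."""
    if isinstance(node, dict):
        return [p for k, v in node.items() for p in empty_fields(v, f"{path}.{k}" if path else k)]
    if isinstance(node, list):
        return [p for i, v in enumerate(node) for p in empty_fields(v, f"{path}[{i}]")]
    return [path] if node == "" else []
```

Running `empty_fields` on a filled copy would still report the `contact_info` entries, which this commit leaves empty.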

.test/data/assembly.fasta

Lines changed: 23164 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 38 additions & 0 deletions

@@ -13,6 +13,42 @@ The usage of this workflow is described in the [Snakemake Workflow Catalog](http
 
 If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this repository.
 
+## Workflow overview
+
+1. Parse `samples.csv` table containing the samples's meta data (`python`)
+2. Annotate assemblies using NCBI's Prokaryotic Genome Annotation Pipeline ([PGAP](https://github.com/ncbi/pgap))
+
+## Requirements
+
+- [PGAP](https://github.com/ncbi/pgap)
+
+## Installation
+
+**Step 1: Clone this repository**
+
+```bash
+git clone https://github.com/MPUSP/snakemake-assembly-postprocessing.git
+cd snakemake-assembly-postprocessing
+```
+
+**Step 2: Install dependencies**
+
+It is recommended to install snakemake and run the workflow with `conda` or `mamba`. [Miniforge](https://conda-forge.org/download/) is the preferred conda-forge installer and includes `conda`, `mamba` and their dependencies.
+
+**Step 3: Create snakemake environment**
+
+This step creates a new conda environment called `snakemake-assembly-postprocessing`.
+
+```bash
+mamba create -c conda-forge -c bioconda -n snakemake-assembly-postprocessing snakemake pandas
+conda activate snakemake-assembly-postprocessing
+```
+
+**Step 4: Install PGAP**
+
+- PGAP can be downloaded from https://github.com/ncbi/pgap. Please follow the installation instructions there.
+- Define the path to the `pgap.py` script (located in the `scripts` folder) in the `config` file (recommended: `./resources`)
+
 ## Authors
 
 - Dr. Rina Ahmed-Begrich
@@ -25,4 +61,6 @@ If you use this workflow in a paper, don't forget to give credits to the authors
 
 ## References
 
+> Li W, O'Neill KR, Haft DH, DiCuccio M, Chetvernin V, Badretdin A, Coulouris G, Chitsaz F, Derbyshire MK, Durkin AS, Gonzales NR, Gwadz M, Lanczycki CJ, Song JS, Thanki N, Wang J, Yamashita RA, Yang M, Zheng C, Marchler-Bauer A, Thibaud-Nissen F. _RefSeq: Expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation._ Nucleic Acids Res, **2021** Jan 8;49(D1):D1020-D1028. https://doi.org/10.1093/nar/gkaa1105
+
 > Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. _Sustainable data analysis with Snakemake_. F1000Research, 10:33, 10, 33, **2021**. https://doi.org/10.12688/f1000research.29032.2.

config/README.md

Lines changed: 4 additions & 4 deletions

@@ -5,8 +5,8 @@
 This workflow requires `fasta` input data.
 The samplesheet table has the following layout:
 
-| sample | species | strain | id_prefix | file |
-| ----------- | ------------ | ---------------------------------- | ------------- | ------------- |
+| sample | species | strain | id_prefix | file |
+| ----------- | ------------ | ------------- | ------------- | ------------- |
 | EC2224 | "Streptococcus pyogenes" | SF370 | Spy | assembly.fasta |
 
 ### Execution
@@ -22,11 +22,11 @@ Adjust options in the default config file `config/config.yml`.
 Before running the entire workflow, perform a dry run using:
 
 ```bash
-snakemake --cores 3 --sdm conda --directory .test --dry-run
+snakemake --cores 1 --sdm conda --directory .test --dry-run
 ```
 
 To run the workflow with test files using **conda**:
 
 ```bash
-snakemake --cores 3 --sdm conda --directory .test
+snakemake --cores 1 --sdm conda --directory .test
 ```

config/config.yml

Lines changed: 9 additions & 1 deletion

@@ -1 +1,9 @@
-samplesheet: "config/samples.tsv"
+samplesheet: "config/samples.csv"
+outdir: "results"
+
+pgap:
+  bin: "path/to/pgap.py"
+  use_yaml_config: True
+  prepare_yaml_files:
+    generic: "config/generic.yaml"
+    submol: "config/submol.yaml"

config/generic.yaml

Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
+fasta:
+  class: File
+  location: None
+submol:
+  class: File
+  location: None