Commit 037def4

ENH use --dbdir to run outside the main dir
1 parent bb9dd09 commit 037def4

14 files changed: 164 additions and 116 deletions

README.md

Lines changed: 58 additions & 21 deletions
@@ -51,6 +51,10 @@ If you want to check if the installation works well, you can test with mock data
 
 Please make `GMSC-mapper` as your work directory.
 
+```bash
+cd GMSC-mapper
+```
+
 - Create GMSC database index
 
 Default alignment tool is Diamond.
@@ -68,87 +72,108 @@ gmsc-mapper createdb -i ./examples/target.faa -o ./examples/ -m mmseqs
 - Input is genome contig sequences.
 
 ```bash
-gmsc-mapper -i ./examples/example.fa -o ./examples_output/ --db ./examples/targetdb.dmnd --habitat ./examples/ref_habitat.npy --habitat-index ./examples/ref_habitat_index.tsv --quality ./examples/ref_quality.txt --taxonomy ./examples/ref_taxonomy.npy --taxonomy-index ./examples/ref_taxonomy_index.tsv --domain ./examples/ref_domain.txt
+gmsc-mapper -i ./examples/example.fa -o ./examples_output/ --dbdir ./examples/
 ```
 
 - Input is amino acid sequences.
 
 ```bash
-gmsc-mapper --aa-genes ./examples/example.faa -o ./examples_output/ --db ./examples/targetdb.dmnd --habitat ./examples/ref_habitat.npy --habitat-index ./examples/ref_habitat_index.tsv --quality ./examples/ref_quality.txt --taxonomy ./examples/ref_taxonomy.npy --taxonomy-index ./examples/ref_taxonomy_index.tsv --domain ./examples/ref_domain.txt
+gmsc-mapper --aa-genes ./examples/example.faa -o ./examples_output/ --dbdir ./examples/
 ```
 
 - Input is nucleotide gene sequences.
 
 ```bash
-gmsc-mapper --nt-genes ./examples/example.fna -o ./examples_output/ --db ./examples/targetdb.dmnd --habitat ./examples/ref_habitat.npy --habitat-index ./examples/ref_habitat_index.tsv --quality ./examples/ref_quality.txt --taxonomy ./examples/ref_taxonomy.npy --taxonomy-index ./examples/ref_taxonomy_index.tsv --domain ./examples/ref_domain.txt
+gmsc-mapper --nt-genes ./examples/example.fna -o ./examples_output/ --dbdir ./examples/
 ```
 
 - Check another alignment tool: MMseqs2
 
 ```bash
-gmsc-mapper -i ./examples/example.fa -o ./examples_output/ --db ./examples/targetdb --habitat ./examples/ref_habitat.npy --habitat-index ./examples/ref_habitat_index.tsv --quality ./examples/ref_quality.txt --taxonomy ./examples/ref_taxonomy.npy --taxonomy-index ./examples/ref_taxonomy_index.tsv --domain ./examples/ref_domain.txt --tool mmseqs
+gmsc-mapper -i ./examples/example.fa -o ./examples_output/ --dbdir ./examples/ --tool mmseqs
 ```
 
 ## Usage
-Please make `GMSC-mapper` as your work directory.
 
 ### Download GMSC database
-`--dbdir`: Path to database output directory.(default: `GMSC-mapper/db`)
 
-`--all`: Download all database
+We recommend using `GMSC-mapper` as your current work directory. You can then directly follow the commands below.
 
-`-f`: Force download even if the files exist
+`--dbdir`: Path to GMSC database annotation index files. (default: `./db`. If `GMSC-mapper` is your current work directory, the database files will be downloaded to `GMSC-mapper/db`)
 
+```bash
+cd GMSC-mapper
 ```
-gmsc-mapper downloaddb
+
+```bash
+gmsc-mapper downloaddb --dbdir ./db
 ```
 
+Otherwise, if you want to use a custom `--dbdir` directory, it should be consistent with `-o` (path to the Diamond and MMseqs2 database index output) in the index creation step.
+
 ### Create GMSC database index of Diamond/MMseqs2
-`-o`: Path to database output directory.(default: `GMSC-mapper/db`)
+
+We also recommend using `GMSC-mapper` as your current work directory. You can then directly follow the commands below.
+
+The input (`-i`) is the FASTA file (`GMSC10.90AA.faa.gz`) downloaded to the dbdir (default: `./db`. If `GMSC-mapper` is your current work directory, the dbdir is `GMSC-mapper/db`) in the download step.
+
+`-o`: Path to database index output of Diamond and MMseqs2. (default: `./db`. If `GMSC-mapper` is your current work directory, the database files will be created at `GMSC-mapper/db`)
 
 `-m`: Alignment tool (Diamond / MMseqs2).
 
+```bash
+cd GMSC-mapper
 ```
-gmsc-mapper createdb -i ./db/GMSC10.90AA.faa.gz -m diamond
+
+```bash
+gmsc-mapper createdb -i ./db/GMSC10.90AA.faa.gz -o ./db -m diamond
 ```
 or
-```
-gmsc-mapper createdb -i ./db/GMSC10.90AA.faa.gz -m mmseqs
+```bash
+gmsc-mapper createdb -i ./db/GMSC10.90AA.faa.gz -o ./db -m mmseqs
 ```
 
+Otherwise, if you want to use a custom `-o` directory, it should be consistent with `--dbdir` (path to GMSC database annotation index files) in the download step.
+
 ### Default
-GMSC database / habitat / taxonomy / quality / domain file path and output directory path can be assigned on your own.Default is `GMSC-mapper/db` and `GMSC-mapper/output`.
+GMSC database directory (`--dbdir`) and output directory (`-o`) can be assigned on your own. Defaults are `./db` and `./output`. If `GMSC-mapper` is your current work directory, they will be `GMSC-mapper/db` and `GMSC-mapper/output`.
+
+If you use `GMSC-mapper` as your current work directory, you can directly follow the commands below. Otherwise, you need to assign your custom `--dbdir`, which contains the database files.
+
+```bash
+cd GMSC-mapper
+```
 
 1. Input is genome contig sequences.
 
 ```bash
-gmsc-mapper -i ./examples/example.fa
+gmsc-mapper -i ./examples/example.fa --dbdir ./db
 ```
 
 2. Input is amino acid sequences.
 
 ```bash
-gmsc-mapper --aa-genes ./examples/example.faa
+gmsc-mapper --aa-genes ./examples/example.faa --dbdir ./db
 ```
 
 3. Input is nucleotide gene sequences.
 
 ```bash
-gmsc-mapper --nt-genes ./examples/example.fna
+gmsc-mapper --nt-genes ./examples/example.fna --dbdir ./db
 ```
 
 ### Alignment tool: Diamond / MMseqs2 is optional
 If you want to change alignment tool (Diamond / MMseqs2), you can use `--tool`.
 
 ```bash
-gmsc-mapper -i ./examples/example.fa --tool mmseqs
+gmsc-mapper -i ./examples/example.fa --dbdir ./db --tool mmseqs
 ```
 
 ### Habitat / taxonomy / quality / domain annotation is optional
 If you don't want to annotate habitat / taxonomy / quality / domain you can use `--no-habitat`/`--no-taxonomy`/`--no-quality`/`--no-domain`.
 
 ```bash
-gmsc-mapper -i ./examples/example.fa --no-habitat --no-taxonomy --no-quality --no-domain
+gmsc-mapper -i ./examples/example.fa --dbdir ./db --no-habitat --no-taxonomy --no-quality --no-domain
 ```
 
 ## Output files
@@ -247,7 +272,9 @@ The output folder will contain
 
 * `--nt-genes`: Path to the input nucleotide gene sequence FASTA file (possibly .gz compressed).
 
-* `-o/--output`: Output directory (will be created if non-existent). (default: ../output)
+* `--dbdir`: Path to the GMSC database directory. (default: `./db`)
+
+* `-o/--output`: Output directory (will be created if non-existent). (default: `./output`)
 
 * `--tool`: Sequence alignment tool (Diamond / MMseqs). (default: diamond)
 
@@ -274,11 +301,21 @@ The output folder will contain
 * `--quiet`: Disable alignment console output. (default:False)
 
 ### Subcommands and Parameters
+#### Download GMSC database annotation index files
+Subcommands: `gmsc-mapper downloaddb`
+
+* `--dbdir`: Path to GMSC database annotation index files. (default: `./db`. If `GMSC-mapper` is your current work directory, the database files will be downloaded to `GMSC-mapper/db`)
+
+* `--all`: Download all database
+
+* `-f`: Force download even if the files exist
+
+#### Create database index of Diamond and MMseqs2
 Subcommands: `gmsc-mapper createdb`
 
 * `-i`: Path to the GMSC FASTA file.
 
-* `-o/--output`: Path to database output directory. (default: ../db)
+* `-o/--output`: Path to database index output of Diamond and MMseqs2. (default: `./db`. If `GMSC-mapper` is your current work directory, the database files will be created at `GMSC-mapper/db`)
 
 * `-m/--mode`: Alignment tool (Diamond / MMseqs2).

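The README changes above state that a custom `--dbdir` must match the `-o` given to `createdb`. The following is a minimal sketch of that workflow, using only the subcommands and flags that appear in the diff. The directory `./my_gmsc_db`, the `run` helper, and the `DRY_RUN` switch are illustrative assumptions and are not part of GMSC-mapper. By default the script only prints the commands it would run.

```bash
set -eu

# Hypothetical custom database directory (assumption, not a GMSC-mapper default).
DBDIR="${DBDIR:-./my_gmsc_db}"
# DRY_RUN=1 prints each command; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Illustrative helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# 1. Download the database files into the custom directory.
run gmsc-mapper downloaddb --dbdir "$DBDIR"
# 2. Build the Diamond index in the same directory (-o matches --dbdir).
run gmsc-mapper createdb -i "$DBDIR/GMSC10.90AA.faa.gz" -o "$DBDIR" -m diamond
# 3. Map contigs against that same directory.
run gmsc-mapper -i ./examples/example.fa -o ./output --dbdir "$DBDIR"
```

Keeping the path in one variable is a simple way to make sure the download, index, and mapping steps all point at the same location.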
examples/GMSC10.90AA.cdd.tsv.xz

176 Bytes
Binary file not shown.
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+0 air
+1 annelidae associated
+2 anthropogenic
+3 built environment
+4 built environment,human skin
+5 chicken gut
+6 coral associated,marine
+7 human gut
+8 human gut,isolate
+9 human skin
+10 isolate
+11 lake associated
+12 lake associated,river associated
+13 lake associated,water associated
+14 marine
+15 marine,isolate
+16 marine,wastewater,water associated
+17 marine,water associated
+18 plant associated
+19 river associated
+20 soil
+21 termite gut
+22 wastewater
+23 water associated

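The file added above is an index-to-label table. As a rough sketch of how such a table could be queried, the snippet below resolves an index to its habitat label with `awk`. The file name is not shown in this view, so the example writes a few rows copied from the listing into a temporary file. It also assumes the columns are tab-separated, which the listing does not confirm. The `lookup_habitat` helper is hypothetical.

```bash
# Assumption: columns are separated by a tab (index<TAB>habitat).
index_file="$(mktemp)"
trap 'rm -f "$index_file"' EXIT

# A few rows copied from the listing above, written as index/label pairs.
printf '%s\t%s\n' 0 'air' 3 'built environment' 7 'human gut' 14 'marine' > "$index_file"

# Hypothetical helper: print the habitat label for a given index.
lookup_habitat() {
    awk -F '\t' -v idx="$1" '$1 == idx { print $2 }' "$2"
}

lookup_habitat 7 "$index_file"   # prints "human gut"
```

Splitting on tabs rather than on any whitespace keeps multi-word labels such as `built environment` intact.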
examples/GMSC10.90AA.habitat.npy

484 Bytes
Binary file not shown.
164 Bytes
Binary file not shown.
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+0 Unknown
+1 d__Archaea;p__Thermoproteota;c__Nitrososphaeria;o__Nitrososphaerales;f__Nitrososphaeraceae;g__Nitrososphaera;s__Nitrososphaera sp002494895
+2 d__Archaea;p__Thermoproteota;c__Nitrososphaeria;o__Nitrososphaerales;f__Nitrososphaeraceae;g__UBA10452;s__UBA10452 sp003176995
+3 d__Bacteria
+4 d__Bacteria;p__Actinobacteriota;c__Actinomycetia
+5 d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Actinomycetales;f__Microbacteriaceae;g__Microbacterium;s__Microbacterium sp003476465
+6 d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Streptosporangiales;f__Streptosporangiaceae
+7 d__Bacteria;p__Actinobacteriota;c__Actinomycetia;o__Streptosporangiales;f__Streptosporangiaceae;g__UBA9676;s__UBA9676 sp003541285
+8 d__Bacteria;p__Actinobacteriota;c__Thermoleophilia;o__Solirubrobacterales;f__Solirubrobacteraceae;g__Solirubrobacter
+9 d__Bacteria;p__Cyanobacteria;c__Cyanobacteriia
+10 d__Bacteria;p__Deinococcota;c__Deinococci;o__Deinococcales
+11 d__Bacteria;p__Firmicutes_A
+12 d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Enterocloster;s__Enterocloster sp900551225
+13 d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales
+14 d__Bacteria;p__Firmicutes_A;c__Clostridia_A;o__Christensenellales;f__UBA1242;g__UBA6345;s__UBA6345 sp002437945
+15 d__Bacteria;p__Firmicutes_A;c__Clostridia_A;o__Christensenellales;f__UBA1242;g__UMGS687;s__UMGS687 sp900545735
+16 d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;g__SYFN01;s__SYFN01 sp005800405
+17 d__Bacteria;p__SAR324;c__SAR324;o__SAR324;f__NAC60-12;g__Arctic96AD-7;s__Arctic96AD-7 sp002685535
+18 d__Bacteria;p__Verrucomicrobiota;c__Verrucomicrobiae;o__Pedosphaerales;f__Pedosphaeraceae;g__UBA11358

examples/GMSC10.90AA.taxonomy.npy

484 Bytes
Binary file not shown.
