Commit 92f19e4

ENH simplify readme
1 parent 41fbf4a commit 92f19e4

1 file changed: +45 -46 lines changed
README.md

Lines changed: 45 additions & 46 deletions
@@ -3,12 +3,12 @@
 GMSC-mapper is a command line tool to query the Global Microbial smORFs Catalog (GMSC).
 
 GMSC-mapper can be used to
-- Find query smORFs (< 100aa) homologous to Global Microbial smORFs Catalog (GMSC) by alignment.
+- Find query smORFs (< 100aa) homologous to Global Microbial smORFs Catalogue (GMSC) by alignment.
 - Support 3 types of input:
 - contigs (GMSC-mapper will predict smORFs from contigs first)
 - amino acid sequences
 - nucleotide gene sequences
-- Annotate query/predicted smORFs with quality, habitat and taxonomy information constructed manually in detail.
+- Annotate query / predicted smORFs with quality, habitat and taxonomy information constructed manually in detail.
 
 ## Installation
 
@@ -44,10 +44,8 @@ cd GMSC-mapper
 python setup.py install
 ```
 
-### Example test
-Because the whole GMSC database is large, and takes some minutes to process.
-
-If you want to check if the installation works well, you can test with mock datasets easily and fast.
+#### Example test
+As the whole GMSC database is large and takes some minutes to process. To check if the installation works well, you can test with mock datasets easily and fast.
 
 Please make `GMSC-mapper` as your work directory.
 
@@ -57,86 +55,77 @@ cd GMSC-mapper
 
 - Create GMSC database index
 
-Default alignment tool is Diamond.
+Default alignment tool is DIAMOND.
 
 ```bash
 gmsc-mapper createdb -i ./examples/target.faa -o ./examples/ -m diamond
 ```
 
-If you want to use MMseqs2 as your alignment tool, you need to create GMSC database index in MMseqs2 format.
-
-```bash
-gmsc-mapper createdb -i ./examples/target.faa -o ./examples/ -m mmseqs
-```
-
-- Input is genome contig sequences.
+- When input is genome contig sequences:
 
 ```bash
 gmsc-mapper -i ./examples/example.fa -o ./examples_output/ --dbdir ./examples/
 ```
 
-- Input is amino acid sequences.
+- When input is amino acid sequences:
 
 ```bash
 gmsc-mapper --aa-genes ./examples/example.faa -o ./examples_output/ --dbdir ./examples/
 ```
 
-- Input is nucleotide gene sequences.
+- When input is nucleotide gene sequences:
 
 ```bash
 gmsc-mapper --nt-genes ./examples/example.fna -o ./examples_output/ --dbdir ./examples/
 ```
 
 - Check another alignment tool: MMseqs2
 
+The default alignment tool is DIAMOND, if you want to use MMseqs2 as your alignment tool, you need to create GMSC database index in MMseqs2 format.
+
+```bash
+gmsc-mapper createdb -i ./examples/target.faa -o ./examples/ -m mmseqs
+```
+
+After index creation, you can specify tool as mmseqs and other usage is the same as above.
+
 ```bash
 gmsc-mapper -i ./examples/example.fa -o ./examples_output/ --dbdir ./examples/ --tool mmseqs
 ```
 
 ## Usage
 
-### Download GMSC database
+### Default usage
 
-We recommend to use `GMSC-mapper` as your current work directory. You can derectly follow the commonds below.
+#### Download GMSC database and create index
 
-`--dbdir`: Path to GMSC database annotation index files. (default: `./db`. If `GMSC-mapper` is your current work directory, the database files will be downloaded at `GMSC-mapper/db`)
+We recommend to use `GMSC-mapper` as your current work directory. You can derectly follow the commonds below.
 
 ```bash
 cd GMSC-mapper
 ```
 
+Download GMSC database
+
 ```bash
 gmsc-mapper downloaddb --dbdir ./db
 ```
 
-Otherwise if you want to use custom `--dbdir` directory, it should be consistent with `-o` (Path to database index output of Diamond and MMseqs2) in the creating index step
-
-### Create GMSC database index of Diamond/MMseqs2
-
-We also recommend to use `GMSC-mapper` as your current work directory. You can derectly follow the commonds below.
-
-The input (`i`) is the fasta file (`GMSC10.90AA.faa.gz`) downloaded to the dbdir (default: `./db`. If `GMSC-mapper` is your current work directory, the dbdir is `GMSC-mapper/db`) in the downloading step.
-
-`-o`: Path to database index output of Diamond and MMseqs2. (default: `./db`. If `GMSC-mapper` is your current work directory, the database files will be created at `GMSC-mapper/db`)
-
-`-m`: Alignment tool (Diamond / MMseqs2).
+The default `--dbdir` is `./db`. If you want to use custom `--dbdir` directory, it should be consistent with `-o` in the next creating database index step.
 
-```bash
-cd GMSC-mapper
-```
+Create GMSC database index
 
 ```bash
 gmsc-mapper createdb -i ./db/GMSC10.90AA.faa.gz -o ./db -m diamond
 ```
-or
-```bash
-gmsc-mapper createdb -i ./db/GMSC10.90AA.faa.gz -o ./db -m mmseqs
-```
 
-Otherwise if you want to use custom `-o` directory, it should be consistent with `--dbdir` (Path to GMSC database annotation index files) in the download step.
+The input (`i`) is the fasta file (`GMSC10.90AA.faa.gz`) downloaded to the dbdir (default: `./db`) in the downloading step.
+
+The default `-o` is `./db`. If you want to use custom `-o` directory, it should be consistent with `--dbdir` in the last downloading database step.
+
+#### GMSC Annotation
 
-### Default
-GMSC Database directory (`--dbdir`) and output directory (`-o`) can be assigned on your own. Default is `./db` and `./output`. If `GMSC-mapper` is your current work directory, they will be `GMSC-mapper/db` and `GMSC-mapper/output`.
+GMSC Database directory (`--dbdir`) and output directory (`-o`) can be assigned on your own. Default is `./db` and `./output`.
 
 If you use `GMSC-mapper` as your current work directory. You can derectly follow the commonds below. Otherwise, you need to assign your custom `--dbdir` which contains database files.
 
@@ -162,18 +151,28 @@ gmsc-mapper --aa-genes ./examples/example.faa --dbdir ./db
 gmsc-mapper --nt-genes ./examples/example.fna --dbdir ./db
 ```
 
-### Alignment tool: Diamond / MMseqs2 is optional
-If you want to change alignment tool (Diamond / MMseqs2), you can use `--tool`.
+### Further usage
+
+#### Habitat / taxonomy / quality / domain annotation is optional
+
+If you don't want to annotate habitat / taxonomy / quality you can use `--no-habitat`/`--no-taxonomy`/`--no-quality` / `--no-domain`.
+
+```bash
+gmsc-mapper -i ./examples/example.fa --dbdir ./db --no-habitat --no-taxonomy --no-quality --no-domain
+```
+
+#### Alignment tool: DIAMOND / MMseqs2 is optional
+
+The default alignment tool is DIAMOND, if you want to use MMseqs2 as your alignment tool, you need to create GMSC database index in MMseqs2 format.
 
 ```bash
-gmsc-mapper -i ./examples/example.fa --dbdir ./db --tool mmseqs
+gmsc-mapper createdb -i ./db/GMSC10.90AA.faa.gz -o ./db -m mmseqs
 ```
 
-### Habitat / taxonomy / quality / domain annotation is optional
-If you don't want to annotate habitat / taxonomy / quality / domain you can use `--no-habitat`/`--no-taxonomy`/`--no-quality`/`--no-domain`.
+Then you can assign`--tool` as mmseqs.
 
 ```bash
-gmsc-mapper -i ./examples/example.fa --dbdir ./db --no-habitat --no-taxonomy --no-quality --no-domain
+gmsc-mapper -i ./examples/example.fa --dbdir ./db --tool mmseqs
 ```
 
 ## Output files
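
Note on the `--dbdir` / `-o` consistency rule stated in the updated README: the diff only shows commands for the default `./db` directory. The sketch below illustrates one way to keep the download, index, and annotation steps pointed at a single custom directory. It uses only the subcommands and flags that appear in the diff. The directory `./my_gmsc_db`, the variables `GMSC_DB`, `GMSC_TOOL`, `DRY_RUN`, and the helpers `run` and `gmsc_workflow` are hypothetical names introduced for illustration and are not part of GMSC-mapper. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: GMSC_DB, GMSC_TOOL, DRY_RUN, run and gmsc_workflow are
# hypothetical names, not part of GMSC-mapper.
set -euo pipefail

GMSC_DB="${GMSC_DB:-./my_gmsc_db}"   # one directory reused for --dbdir and -o
GMSC_TOOL="${GMSC_TOOL:-diamond}"     # "diamond" (default) or "mmseqs"
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands only, 0 = also run them

# Print the command line; execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

gmsc_workflow() {
  # Download step: database files go to $GMSC_DB.
  run gmsc-mapper downloaddb --dbdir "$GMSC_DB"
  # Index step: -o matches the --dbdir used in the download step.
  run gmsc-mapper createdb -i "$GMSC_DB/GMSC10.90AA.faa.gz" -o "$GMSC_DB" -m "$GMSC_TOOL"
  # Annotation step: same directory; --tool is passed only for MMseqs2,
  # since DIAMOND is the documented default.
  if [ "$GMSC_TOOL" = "mmseqs" ]; then
    run gmsc-mapper -i ./examples/example.fa --dbdir "$GMSC_DB" --tool mmseqs
  else
    run gmsc-mapper -i ./examples/example.fa --dbdir "$GMSC_DB"
  fi
}

gmsc_workflow
```

Running it with `GMSC_TOOL=mmseqs` switches both the index format and the annotation step together, which mirrors the README's requirement that the MMseqs2 index exists before `--tool mmseqs` is used.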