Commit 25b8487

Merge pull request #18 from jaebeom-kim/main
Improve discriptions for fields and visuals
2 parents a6204de + f7fbf6c commit 25b8487

File tree

13 files changed

+999
-203
lines changed

Makefile

Lines changed: 40 additions & 1 deletion
@@ -12,9 +12,10 @@ LIPO ?= lipo
1212
endif
1313

1414
win: resources/win/x64/${FRONTEND_APP}.bat
15-
mac: resources/mac/x64/${FRONTEND_APP} resources/mac/arm64/${FRONTEND_APP}
15+
mac: resources/mac/x64/${FRONTEND_APP} resources/mac/arm64/${FRONTEND_APP} resources/mac/x64/fastp resources/mac/arm64/fastp resources/mac/x64/fastplong resources/mac/arm64/fastplong
1616
linux: resources/linux/arm64/${FRONTEND_APP} resources/linux/x64/${FRONTEND_APP}
1717

18+
# macOS
1819
resources/mac/${FRONTEND_APP}:
1920
mkdir -p resources/mac
2021
wget -nv -q -O - https://mmseqs.com/metabuli/metabuli-osx-universal.tar.gz | tar -xOf - ${FRONTEND_APP}/bin/${FRONTEND_APP} > resources/mac/${FRONTEND_APP}
@@ -28,6 +29,34 @@ resources/mac/arm64/${FRONTEND_APP}: resources/mac/${FRONTEND_APP}
2829
mkdir -p resources/mac/arm64
2930
$(LIPO) resources/mac/${FRONTEND_APP} -thin arm64 -output resources/mac/arm64/${FRONTEND_APP} || cp -f -- resources/mac/${FRONTEND_APP} resources/mac/arm64/${FRONTEND_APP}
3031

32+
resources/mac/fastp:
33+
mkdir -p resources/mac
34+
wget -nv -q -O - https://github.com/jaebeom-kim/fastp/releases/download/v0.0.1/fastp-osx-universal.gz | gunzip > resources/mac/fastp
35+
chmod +x resources/mac/fastp
36+
37+
resources/mac/x64/fastp: resources/mac/fastp
38+
mkdir -p resources/mac/x64
39+
$(LIPO) resources/mac/fastp -remove arm64 -output resources/mac/x64/fastp || cp -f -- resources/mac/fastp resources/mac/x64/fastp
40+
41+
resources/mac/arm64/fastp: resources/mac/fastp
42+
mkdir -p resources/mac/arm64
43+
$(LIPO) resources/mac/fastp -thin arm64 -output resources/mac/arm64/fastp || cp -f -- resources/mac/fastp resources/mac/arm64/fastp
44+
45+
resources/mac/fastplong:
46+
mkdir -p resources/mac
47+
wget -nv -q -O - https://github.com/jaebeom-kim/fastplong/releases/download/v0.0.1/fastplong-osx-universal.gz | gunzip > resources/mac/fastplong
48+
chmod +x resources/mac/fastplong
49+
50+
resources/mac/x64/fastplong: resources/mac/fastplong
51+
mkdir -p resources/mac/x64
52+
$(LIPO) resources/mac/fastplong -remove arm64 -output resources/mac/x64/fastplong || cp -f -- resources/mac/fastplong resources/mac/x64/fastplong
53+
54+
resources/mac/arm64/fastplong: resources/mac/fastplong
55+
mkdir -p resources/mac/arm64
56+
$(LIPO) resources/mac/fastplong -thin arm64 -output resources/mac/arm64/fastplong || cp -f -- resources/mac/fastplong resources/mac/arm64/fastplong
57+
58+
59+
# Linux
3160
resources/linux/x64/${FRONTEND_APP}-sse2:
3261
mkdir -p resources/linux/x64
3362
wget -nv -q -O - https://mmseqs.com/metabuli/metabuli-linux-sse2.tar.gz | tar -xOf - ${FRONTEND_APP}/bin/${FRONTEND_APP} > resources/linux/x64/${FRONTEND_APP}-sse2
@@ -47,11 +76,21 @@ resources/linux/arm64/${FRONTEND_APP}:
4776
wget -nv -q -O - https://mmseqs.com/metabuli/metabuli-linux-arm64.tar.gz | tar -xOf - ${FRONTEND_APP}/bin/${FRONTEND_APP} > resources/linux/arm64/${FRONTEND_APP}
4877
chmod +x resources/linux/arm64/${FRONTEND_APP}
4978

79+
resources/linux/x64/fastp:
80+
mkdir -p resources/linux/x64
81+
wget http://opengene.org/fastp/fastp && mv fastp resources/linux/x64/fastp && chmod a+x resources/linux/x64/fastp
82+
83+
84+
5085
resources/win/x64/${FRONTEND_APP}.bat:
5186
mkdir -p resources/win/x64
5287
cd resources/win/x64 && wget -nv -O ${FRONTEND_APP}-win64.zip https://mmseqs.com/metabuli/metabuli-win64.zip \
5388
&& unzip ${FRONTEND_APP}-win64.zip && mv ${FRONTEND_APP}/* . && rmdir ${FRONTEND_APP} && rm ${FRONTEND_APP}-win64.zip
5489
chmod -R +x resources/win/x64/${FRONTEND_APP}.bat resources/win/x64/bin/*
5590

91+
resources/win/x64/fastp:
92+
mkdir -p resources/win/x64
93+
wget https://github.com/jaebeom-kim/fastp/releases/download/v0.0.1/fastp-windows.exe && mv fastp-windows.exe resources/win/x64/fastp && chmod a+x resources/win/x64/fastp
94+
5695
clean:
5796
@rm -rf resources/mac/* resources/linux/* resources/win/*

README.md

Lines changed: 147 additions & 4 deletions
@@ -72,19 +72,136 @@ We will make a button for GTDB soon.
7272
7373
---
7474

75+
# Raw Read Quality Control
76+
> You can preprocess raw reads either in the separate `Quality Control` tab or in the `Search Settings` tab as part of the classification process.
77+
78+
Metabuli App supports `fastp` and `fastplong` for raw read quality control of short and long reads, respectively.
79+
You can upload one or more (gzipped) FASTQ files for quality control.
80+
81+
For each sample, `fastp`/`fastplong` will generate the following files:
82+
- Pre-processed FASTQ files
83+
- Quality control and filtering report files in HTML format
84+
- JSON format report files for further analysis
85+
86+
## Parameter settings for short read QC using fastp
87+
Default settings are generally suitable for most datasets, but you can adjust them as needed.
88+
Below are the parameters adjustable in the GUI. Other parameters can be provided as a text file (Please see "Advanced Settings" below).
89+
For more details, please refer to the [fastp GitHub repository](https://github.com/OpenGene/fastp).
90+
91+
### Quality Filtering (Enabled by default)
92+
- `--disable_quality_filtering`: Disable quality filtering.
93+
- `--qualified_quality_phred`: Phred quality score at or above which a base counts as qualified (default 15).
94+
- `--unqualified_percent_limit`: Maximum percentage of unqualified (low-quality) bases allowed in a read (default 40%).
95+
- `--average_qual`: Minimum average quality score for the read (default none).
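
As a rough illustration of how the three thresholds above interact, here is a minimal Python sketch. It is not fastp's actual implementation: the function name `passes_quality_filter` is hypothetical, and the boundary handling (a base is low-quality when its score is below the threshold, and a read fails when the low-quality percentage exceeds the limit) is an assumption made for illustration.

```python
def passes_quality_filter(quals, qualified_quality_phred=15,
                          unqualified_percent_limit=40, average_qual=0):
    """Return True if a read with per-base Phred scores `quals` is kept.

    Simplified sketch of the quality filter, not fastp's real code.
    average_qual=0 means the average-quality check is off.
    """
    if not quals:
        return False  # assumption: an empty read is never kept

    # Count bases whose score is below the "qualified" threshold.
    low_quality = sum(1 for q in quals if q < qualified_quality_phred)

    # Fail when the percentage of low-quality bases exceeds the limit.
    if low_quality * 100 > unqualified_percent_limit * len(quals):
        return False

    # Optional check on the mean quality of the whole read.
    if average_qual > 0 and sum(quals) / len(quals) < average_qual:
        return False

    return True
```

With the defaults, a 10-base read with four bases below Q15 is kept (40% does not exceed the limit), while one with five such bases is discarded.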
96+
97+
### Length Filtering (Enabled by default)
98+
- `--disable_length_filtering`: Disable length filtering.
99+
- `--length_required`: Minimum read length required (default 50). Reads shorter than this are discarded.
100+
- `--length_limit`: Maximum read length allowed (default none). Reads longer than this are discarded.
101+
102+
### Adapter trimming (Enabled by default)
103+
- Adapter sequences are automatically detected if not specified.
104+
- `--disable_adapter_trimming`: Disable adapter trimming.
105+
- `--adapter_sequence`: Adapter for read 1. Specifying it disables auto-detection for single-end (SE) reads.
106+
- `--adapter_sequence_r2`: Adapter for read 2 (for PE data). For PE data, the specified adapter sequences are used only when auto-detection fails.
107+
- `--adapter_fasta`: FASTA file of adapter sequences. These sequences are trimmed after the adapters that were auto-detected or specified with `--adapter_sequence` or `--adapter_sequence_r2`.
108+
109+
### Low complexity filtering (*Disabled* by default)
110+
- `--low_complexity_filter`: Enable low complexity filtering.
111+
- `--complexity_threshold`: Reads with complexity below this value are discarded. Range: 0-100 (default 30).
112+
113+
### Per read cutting by quality (*Disabled* by default)
114+
- `--cut_front`: Enable cutting reads from the front (5') based on quality.
115+
- `--cut_front_window_size`: Size of the window for cutting from the front (default 4).
116+
- `--cut_front_mean_quality`: Minimum mean quality for the front window (default 20).
117+
- `--cut_tail`: Enable cutting reads from the tail (3') based on quality.
118+
- `--cut_tail_window_size`: Size of the window for cutting from the tail (default 4).
119+
- `--cut_tail_mean_quality`: Minimum mean quality for the tail window (default 20).
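
The sliding-window idea behind `--cut_front` can be sketched as follows. This is a simplified illustration rather than fastp's exact algorithm (which handles more edge cases, such as `N` bases); the function name `cut_front_index` is hypothetical. `--cut_tail` works the same way, starting from the 3' end.

```python
def cut_front_index(quals, window_size=4, mean_quality=20):
    """Return the index of the first base kept after 5' quality cutting.

    The window slides from the 5' end one base at a time and stops at the
    first window whose mean quality reaches `mean_quality`. Simplified
    sketch, not fastp's real implementation.
    """
    n = len(quals)
    if n < window_size:
        return 0  # assumption: reads shorter than one window are left as-is

    for start in range(n - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size >= mean_quality:
            return start  # keep the read from this position onward

    return n  # no window passed, so the whole read is cut


# Usage: trim a read given its sequence and per-base Phred scores.
seq = "ACGTACGT"
quals = [5, 5, 5, 30, 30, 30, 30, 30]
kept = seq[cut_front_index(quals):]
```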
120+
121+
### Other Parameters
122+
- `--thread`: Number of threads to use (default max(all, 16)).
123+
- `--compression`: Output compression level (default 4).
124+
125+
126+
## Parameter settings for long read QC using fastplong
127+
Default settings are generally suitable for most datasets, but you can adjust them as needed.
128+
Below are the parameters adjustable in the GUI. Other parameters can be provided as a text file (Please see "Advanced Settings" below).
129+
For more details, please refer to the [fastplong GitHub repository](https://github.com/OpenGene/fastplong).
130+
131+
### Quality Filtering (Enabled by default)
132+
- `--disable_quality_filtering`: Disable quality filtering.
133+
- `--qualified_quality_phred`: Phred quality score at or above which a base counts as qualified (default 15).
134+
- `--unqualified_percent_limit`: Maximum percentage of unqualified (low-quality) bases allowed in a read (default 40%).
135+
- `--mean_qual`: Minimum average quality score for the read (default none).
136+
137+
### Length Filtering (Enabled by default)
138+
- `--disable_length_filtering`: Disable length filtering.
139+
- `--length_required`: Minimum read length required (default 1000). Reads shorter than this are discarded.
140+
- `--length_limit`: Maximum read length allowed (default none). Reads longer than this are discarded.
141+
142+
### Adapter trimming (Enabled by default)
143+
- Adapter sequences are automatically detected if not specified.
144+
- If the adapters are known, specifying them with `--start_adapter` and `--end_adapter` is recommended.
145+
- `--disable_adapter_trimming`: Disable adapter trimming.
146+
- `--start_adapter`: Read start adapter sequence.
147+
- `--end_adapter`: Read end adapter sequence.
148+
- `--adapter_fasta`: FASTA file of adapter sequences.
149+
150+
### Low complexity filtering (*Disabled* by default)
151+
- `--low_complexity_filter`: Enable low complexity filtering.
152+
- `--complexity_threshold`: Reads with complexity below this value are discarded. Range: 0-100 (default 30).
153+
154+
### Per read cutting by quality (*Disabled* by default)
155+
- `--cut_front`: Enable cutting reads from the front (5') based on quality.
156+
- `--cut_front_window_size`: Size of the window for cutting from the front (default 4).
157+
- `--cut_front_mean_quality`: Minimum mean quality for the front window (default 20).
158+
- `--cut_tail`: Enable cutting reads from the tail (3') based on quality.
159+
- `--cut_tail_window_size`: Size of the window for cutting from the tail (default 4).
160+
- `--cut_tail_mean_quality`: Minimum mean quality for the tail window (default 20).
161+
162+
### Other Parameters
163+
- `--thread`: Number of threads to use (default max(all, 16)).
164+
- `--compression`: Output compression level (default 4).
165+
166+
## Advanced Settings
167+
You can provide additional parameters in a text file. The file should contain one parameter per line, and each line should start with the parameter name followed by its value. Parameters here will override the GUI settings.
168+
Check the [fastp](https://github.com/OpenGene/fastp) and [fastplong](https://github.com/OpenGene/fastplong) GitHub repositories for the full parameter lists.
169+
Please use long options (e.g., `--disable_quality_filtering`) instead of short options (e.g., `-Q`).
170+
For example:
171+
```
172+
--disable_quality_filtering
173+
--qualified_quality_phred 20
174+
--unqualified_percent_limit 30
175+
```
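
To make the file format and the override rule concrete, here is a minimal Python sketch of how such a file could be parsed and merged with GUI settings. It does not show the app's actual code: the function names `parse_param_file` and `merge_settings` are hypothetical, and representing a valueless flag as `True` is an assumption.

```python
def parse_param_file(text):
    """Parse one-parameter-per-line text into a dict.

    Flags without a value (e.g. --disable_quality_filtering) map to True;
    other parameters map to their value as a string.
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split(None, 1)  # parameter name, then optional value
        name = parts[0]
        if not name.startswith("--"):
            raise ValueError(f"expected a long option, got: {name!r}")
        params[name] = parts[1].strip() if len(parts) > 1 else True
    return params


def merge_settings(gui_settings, file_params):
    """Return GUI settings with parameters from the file taking precedence."""
    merged = dict(gui_settings)
    merged.update(file_params)
    return merged


example = """--disable_quality_filtering
--qualified_quality_phred 20
--unqualified_percent_limit 30
"""
file_params = parse_param_file(example)
final = merge_settings({"--qualified_quality_phred": "15", "--thread": "8"},
                       file_params)
```

Here `final` keeps `--thread` from the GUI, while `--qualified_quality_phred` takes the value from the file.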
176+
177+
178+
179+
180+
181+
182+
183+
75184
# Classification
76185
Metabuli App provides two taxonomic profiling modes in the **Search Settings** panel: **New Search** and **Upload Report**.
77-
<img alt="SearchPage_Demo_Image" src="https://github.com/user-attachments/assets/9ab5a86c-5603-4dc7-be3b-baf2ed490ef0" style="max-height: 600px; width: auto;">
186+
<!-- <img alt="SearchPage_Demo_Image" src="https://github.com/user-attachments/assets/9ab5a86c-5603-4dc7-be3b-baf2ed490ef0" style="max-height: 600px; width: auto;"> -->
78187

79188
## New Classification
189+
#### You can perform taxonomic classification on one or more samples using a specified database.
80190
### Required Fields:
81191
1. **Mode:** Select the analysis mode among single-end, paired-end, or long-read.
82-
2. **Job ID:** Enter a unique identifier for the job.
83-
3. **Select Files:** Upload the necessary files and directories.
192+
2. **Enable Quality Control:** Check this box to run quality control on the input reads before classification.
193+
- `fastp` and `fastplong` are used for short and long reads, respectively.
194+
- See the "Raw Read Quality Control" section for details.
195+
3. **Job ID:** Enter a unique identifier for the job.
196+
4. **Select Files:** Upload the necessary files and directories.
84197
- Read 1 File (and Read 2 File if Paired-end is selected)
198+
- FASTA/FASTQ and their gzipped versions are supported.
199+
- Click `ADD ENTRY` to add **multiple samples** that will be processed with the same settings.
85200
- Database Directory
86201
- Output Directory
87-
4. **Max RAM:** Specify the maximum RAM (in GiB) to allocate for the job.
202+
- Result files are saved in a `Job ID` directory under the specified output directory.
203+
- When **multiple samples** are processed, results are saved in `Job ID/sample_name` directories.
204+
5. **Max RAM:** Specify the maximum RAM (in GiB) to allocate for the job.
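
The output layout described under **Select Files** can be sketched in Python as below. The helper name `result_dir` and the example paths are hypothetical; only the `Job ID` and `Job ID/sample_name` directory structure comes from the description above.

```python
from pathlib import Path


def result_dir(output_dir, job_id, sample_name=None):
    """Return the directory where results for a job (or one sample) are saved.

    Single sample:    <output_dir>/<job_id>
    Multiple samples: <output_dir>/<job_id>/<sample_name>
    """
    base = Path(output_dir) / job_id
    return base / sample_name if sample_name else base


# Usage with made-up names:
single = result_dir("/data/out", "job42")
multi = result_dir("/data/out", "job42", "sampleA")
```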
88205

89206
### Advanced Settings (Optional):
90207
- **Threads:** Specify thread count for the job.
@@ -111,6 +228,27 @@ Metabuli App provides two taxonomic profiling modes in **Search Settings** panel
111228
- **Sankey Diagram**: A flow diagram representing the lineage information of the displayed taxa.
112229
- **Krona Chart**: A hierarchical interactive chart that visualizes classification results.
113230

231+
### Generated Result Files:
232+
#### 1. JobID_classifications.tsv: It contains the classification results for each read. The columns are as follows.
233+
1. `is_classified`: Classified or not
234+
2. `name`: Read ID
235+
3. `taxID`: Taxonomy ID from the taxonomy dump files used in database creation
236+
4. `query_length`: Effective read length
237+
5. `score`: DNA level identity score
238+
6. `rank`: Taxonomic rank of the taxon
239+
7. `taxID:match_count`: List of "taxID : k-mer match count"
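
For downstream analysis, a line of this file could be parsed as in the sketch below. It assumes the columns are tab-separated, that `is_classified` is written as `1` or `0`, and that the last column holds space-separated `taxID:count` pairs; these are assumptions to check against your own output. The sample line is made up, and the names `Classification` and `parse_classification_line` are hypothetical.

```python
from typing import Dict, NamedTuple


class Classification(NamedTuple):
    is_classified: bool
    name: str
    tax_id: int
    query_length: int
    score: float
    rank: str
    match_counts: Dict[int, int]  # taxID -> k-mer match count


def parse_classification_line(line):
    """Parse one tab-separated line of JobID_classifications.tsv."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")

    match_counts = {}
    for pair in fields[6].split():
        if ":" not in pair:
            continue  # skip placeholders that are not taxID:count pairs
        tax_id, count = pair.split(":", 1)
        match_counts[int(tax_id)] = int(count)

    return Classification(
        is_classified=fields[0] == "1",
        name=fields[1],
        tax_id=int(fields[2]),
        query_length=int(fields[3]),
        score=float(fields[4]),
        rank=fields[5],
        match_counts=match_counts,
    )


# Made-up example line for illustration only.
record = parse_classification_line(
    "1\tread_1\t562\t150\t0.95\tspecies\t562:12 561:3"
)
```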
240+
241+
#### 2. JobID_report.tsv: It follows Kraken2's report format. The first line is a header, and the rest of the lines are tab-separated values. The columns are as follows.
242+
243+
1. `clade_proportion`: Percentage of reads classified to the clade rooted at this taxon
244+
2. `clade_count`: Number of reads classified to the clade rooted at this taxon
245+
3. `taxon_count`: Number of reads classified directly to this taxon
246+
4. `rank`: Taxonomic rank of the taxon
247+
5. `taxID`: Tax ID according to the taxonomy dump files used in the database creation
248+
6. `name`: Taxonomic name of the taxon
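
A report with these six columns can be loaded with Python's standard `csv` module, as in the sketch below. The function name `read_report` is hypothetical, the sample rows are made up, and stripping leading spaces from `name` assumes Kraken2-style indentation of taxon names.

```python
import csv
import io


def read_report(handle):
    """Read a JobID_report.tsv-style file into a list of dicts.

    The first line is treated as a header and skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    rows = []
    for fields in reader:
        if len(fields) < 6:
            continue  # ignore blank or malformed lines
        rows.append({
            "clade_proportion": float(fields[0]),
            "clade_count": int(fields[1]),
            "taxon_count": int(fields[2]),
            "rank": fields[3],
            "taxID": int(fields[4]),
            "name": fields[5].strip(),  # drop indentation before the name
        })
    return rows


# Made-up report content for illustration only.
sample = (
    "clade_proportion\tclade_count\ttaxon_count\trank\ttaxID\tname\n"
    "50.0\t10\t2\tgenus\t561\t  Escherichia\n"
    "40.0\t8\t8\tspecies\t562\t    Escherichia coli\n"
)
rows = read_report(io.StringIO(sample))
species = [r for r in rows if r["rank"] == "species"]
```

For a real file, pass an open file handle instead, for example `read_report(open(path, newline=""))`.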
249+
250+
#### 3. JobID_krona.html: It is for an interactive Krona plot. You can use any modern web browser to open `JobID_krona.html`.
251+
114252
## Upload Report
115253

116254
To visualize results from a previously completed job:
@@ -123,6 +261,11 @@ To visualize results from a previously completed job:
123261

124262
---
125263

264+
# Database Curation
265+
266+
## Download Database
267+
You can download pre-built databases [here](https://metabuli.steineggerlab.workers.dev/).
268+
126269
## Create New Database
127270
You can create a new database in the "NEW DATABASE" tab by providing these three files:
128271
1. **FASTA files** : Each sequence must have a unique `>accession.version` or `>accession` header (e.g., `>CP001849.1` or `>CP001849`).
