Commit 2c2edd3

switch to doctave for docs
add analyse/process to docs
1 parent 3c98337 commit 2c2edd3

15 files changed: +372 -345 lines changed

.github/workflows/deploy-docs.yml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+name: Build manual pages and deploy documentation
+
+on:
+  push:
+    branches:
+      - master
+    paths:
+      - 'docs/**'
+      - 'doctave.yml'
+      - '.github/workflows/deploy-docs.yml'
+
+jobs:
+  build:
+    name: Deploy
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: 'Checkout cargo and install doctave'
+        uses: actions-rs/toolchain@v1.0.6
+        with:
+          toolchain: stable
+      - run: cargo install --git https://github.com/Doctave/doctave --tag 0.4.2
+      - name: 'Build doctave site'
+        run: doctave build --release
+      - name: 'GitHub Pages'
+        uses: crazy-max/ghaction-github-pages@v3.0.0
+        with:
+          build_dir: site/
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
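
To check the documentation build before pushing, the two build commands from the workflow above can be run locally. This is a minimal sketch, not part of the commit: the `build_docs` function name is hypothetical, and it assumes a Rust toolchain (`cargo`) is already installed and that the function is called from the repository root, next to `doctave.yml`.

```bash
# Hypothetical helper that mirrors the build steps of the workflow above.
build_docs() {
  # Install the same pinned Doctave release that the workflow installs.
  cargo install --git https://github.com/Doctave/doctave --tag 0.4.2 || return 1
  # Build the site; the workflow publishes the resulting site/ directory.
  doctave build --release
}
```

Calling `build_docs` should leave the generated pages in `site/`, the directory the workflow hands to the GitHub Pages step.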

README.md

Lines changed: 55 additions & 91 deletions
@@ -1,19 +1,18 @@
-<img src="images/galah_logo.png" alt="Galah logo" width="600"/>
-
-- [Galah](#galah)
-- [Installation](#installation)
-- [Install through the bioconda package](#install-through-the-bioconda-package)
-- [Pre-compiled binary](#pre-compiled-binary)
-- [Compiling from source](#compiling-from-source)
-- [Development](#development)
-- [Dependencies](#dependencies)
-- [Usage](#usage)
-- [Precluster ANI](#precluster-ani)
-- [License](#license)
+<!-- NOTE: This intro should manually be kept in sync between the repo README and the docs README -->
+
+[![Current Build](https://github.com/wwood/galah/actions/workflows/test-galah.yml/badge.svg)](https://github.com/wwood/galah/actions)
+[![Conda version](https://img.shields.io/conda/v/bioconda/galah)](https://anaconda.org/bioconda/galah)
+[![Conda downloads](https://img.shields.io/conda/d/bioconda/galah)](https://anaconda.org/bioconda/galah)
+[![Crates.io version](https://img.shields.io/crates/v/galah)](https://crates.io/crates/galah)
+[![Crates.io downloads](https://img.shields.io/crates/d/galah)](https://crates.io/crates/galah)
 
 # Galah
 
-[![Anaconda-Server Badge](https://anaconda.org/bioconda/galah/badges/version.svg)](https://anaconda.org/bioconda/galah)
+[<img src="docs/_include/galah_logo.png" alt="Galah logo" width="600"/>](galah_logo.png)
+
+Galah - Scalable dereplication and MIMAG calculation for metagenome assembled genomes
+
+Documentation can be found at [https://wwood.github.io/galah/](https://wwood.github.io/galah/).
 
 Galah aims to be a more scalable metagenome assembled genome (MAG) dereplication
 method. That is, it clusters microbial genomes together based on their average
@@ -23,111 +22,76 @@ representative.
 Galah uses a greedy clustering approach to speed up genome dereplication,
 relative to e.g. [dRep](https://drep.readthedocs.io/), particularly when there
 are many closely related genomes (i.e. >95% ANI). Generated cluster
-representatives have 2 properties. If the ANI threshold was set to 99%, then:
+representatives have 2 properties. If the ANI threshold was set to 95%, then:
 
-1. Each representative is <99% ANI to each other representative.
-2. All members are >=99% ANI to the representative.
+1. Each representative is <95% ANI to each other representative.
+2. All members are >=95% ANI to the representative.
 
-If [CheckM](https://ecogenomics.github.io/CheckM/) genome qualities were
-specified, then the clusters have an additional property:
+If `--run-checkm2` was specified, or [CheckM2](https://github.com/chklovski/CheckM2) /
+[CheckM](https://ecogenomics.github.io/CheckM/) genome qualities were provided,
+then the clusters have an additional property:
 
 3. Each representative genome has a better quality score than other members of
 the cluster. Each genome is assigned a quality score based on the formula
-`completeness-5*contamination-5*num_contigs/100-5*num_ambiguous_bases/100000`, which is reduced from a quality formula described in
-Parks et. al. 2020 https://doi.org/10.1038/s41587-020-0501-8.
+`completeness-5*contamination-5*num_contigs/100-5*num_ambiguous_bases/100000`,
+which is reduced from a quality formula described in
+Parks et. al. 2020 https://doi.org/10.1038/s41587-020-0501-8.
+Other quality score formula are available via `--quality-formula`.
 
-If instead CheckM qualities were not provided, then the following holds instead:
+If instead CheckM1/2 qualities are not available, then the following holds instead:
 
-3. Each representative genome was specified to galah before other members of the
+3. Each representative genome was specified to Galah before other members of the
 cluster.
 
 The overall greedy clustering approach was largely inspired by the work of
-Donovan Parks, as described in [Parks et. al. 2020](https://doi.org/10.1038/s41587-020-0501-8). It
-operates in 3 steps. In the first step, genomes are assigned as representative
-if no genomes of higher quality are >99% ANI. In the second step, each
-non-representative genome is assigned to the representative genome it has the
-highest ANI with.
+Donovan Parks, as described in [Parks et. al. 2020](https://doi.org/10.1038/s41587-020-0501-8).
+It operates in 3 steps. In the first step, genomes are assigned as representative
+if no genomes of higher quality are >95% ANI. In the second step, each
+non-representative genome is assigned to the representative genome with which it
+has the highest ANI.
 
-## Installation
+## Example usage
 
-### Install through the bioconda package
+For clustering a set of genomes at 95% ANI:
 
-Galah can be installed through the [bioconda](https://bioconda.github.io/) conda channel. After initial setup of conda and the bioconda channel, it can be installed with mamba (or conda) with:
-
-```
-mamba install galah
+```bash
+galah cluster --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna \
+--output-cluster-definition clusters.tsv
 ```
 
-One can see [details of the galah recipe](https://bioconda.github.io/recipes/galah/README.html).
-
-Galah can also be used indirectly through
-[CoverM](https://github.com/wwood/CoverM) via its `cluster` subcommand, which is also available on bioconda.
-
-### Pre-compiled binary
-
-Galah can be installed by downloading statically compiled binaries, available on
-the [releases page](https://github.com/wwood/Galah/releases).
-
-Third party dependencies listed below are required for this method.
-
-### Compiling from source
+For clustering a set of contigs at 95% ANI:
 
-Galah can also be installed from source, using the cargo build system after
-installing [Rust](https://www.rust-lang.org/).
-
-```
-cargo install galah
+```bash
+galah cluster --cluster-contigs --small-genomes --genome-fasta-files /path/to/contigs.fna \
+--output-cluster-definition clusters.tsv
 ```
-Third party dependencies listed below are required for this method.
-
-### Development
 
-To run an unreleased version of Galah, after installing
-[Rust](https://www.rust-lang.org/):
+For determining MIMAG quality scores for a set of genomes with CheckM2:
 
+```bash
+galah analyse --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna \
+--output-mimag-summary mimag.tsv
 ```
-git clone https://github.com/wwood/galah
-cd galah
-pixi run cargo run -- cluster ...etc...
-```
-Third party dependencies listed below are required for this method.
-
-### Dependencies
 
-For some advanced usage of Galah, 3rd party tools are required, which must be installed separately:
+For clustering and determining MIMAG quality scores:
 
-* skani v0.2.2 https://github.com/bluenote-1577/skani
-* FastANI v1.34 https://github.com/ParBLiSS/FastANI
-
-## Usage
-For clustering a set of genomes at 99% ANI:
-```
-galah cluster --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna --output-cluster-definition clusters.tsv
-```
-There are several other options for specifying genomes, ANI cutoffs, etc.
-
-For clustering a set of contigs at 99% ANI:
-```
-galah cluster --cluster-contigs --small-genomes --genome-fasta-files /path/to/contigs.fna --output-cluster-definition clusters.tsv
+```bash
+galah process --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna \
+--output-cluster-definition clusters.tsv --output-mimag-summary mimag.tsv
 ```
 
-The full usage is described on the [manual page](https://wwood.github.io/galah/galah-cluster.html), which can be accessed on the command line running `galah cluster --full-help`.
+## Help
 
-### Precluster ANI
-Similar to dRep, galah operates in two stages. In the first, a fast
-pre-clustering distance ([finch](https://github.com/onecodex/finch-rs)
-or [skani](https://github.com/bluenote-1577/skani)) is
-calculated between each pair of genomes. Genome pairs are only considered as
-potentially in the same cluster with [skani](https://github.com/bluenote-1577/skani) or
-[FastANI](https://github.com/ParBLiSS/FastANI) if the prethreshold ANI is
-greater than the specified value. By default, the precluster ANI is set at 95%
-and the final ANI is set at 99%.
+If you have any questions or need help, please [open an issue](https://github.com/wwood/galah/issues).
 
 ## License
+Galah is developed by the [Woodcroft lab](https://research.qut.edu.au/cmr/team/ben-woodcroft/) at the [Centre for Microbiome Research](https://research.qut.edu.au/cmr), School of Biomedical Sciences, QUT, with contributions from [Samuel Aroney](https://github.com/AroneyS), [Antônio Camargo](https://github.com/apcamargo), and [Rhys Newell](https://github.com/rhysnewell). It is licensed under [GPL3 or later](https://gnu.org/licenses/gpl.html).
 
-Galah is made available under GPL3+. See LICENSE.txt for details. Copyright Ben
-Woodcroft.
+The source code is available at [https://github.com/wwood/galah](https://github.com/wwood/galah).
 
-Developed by Ben Woodcroft at the [Centre for Microbiome Research, Queensland University of Technology](https://www.qut.edu.au/health/schools/school-of-biomedical-sciences/centre-for-microbiome-research).
+## Citation
+<!-- NOTE: Citations should manually be kept in sync between the repo README and the docs README -->
 
-[galah]: Eolophus_roseicapilla_-Wamboin,_NSW,_Australia_-juvenile-8.smaller.jpg
+Aroney, S.T.N., Camargo, A.P., Tyson, G.W. and Woodcroft B.J.
+Galah: More scalable dereplication for metagenome assembled genomes.
+Zenodo (2024). https://doi.org/10.5281/zenodo.13637856
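
The default quality formula quoted in the README text above can be evaluated by hand to see how the penalties weigh against completeness. The following is a minimal sketch and not part of Galah or of this commit: the `quality_score` function name and the example values are hypothetical, and completeness and contamination are assumed to be percentages.

```bash
# Hypothetical helper: evaluates the default formula from the README,
# completeness-5*contamination-5*num_contigs/100-5*num_ambiguous_bases/100000
# Arguments: completeness contamination num_contigs num_ambiguous_bases
quality_score() {
  awk -v completeness="$1" -v contamination="$2" \
      -v num_contigs="$3" -v num_ambiguous_bases="$4" 'BEGIN {
    score = completeness - 5 * contamination
    score -= 5 * num_contigs / 100
    score -= 5 * num_ambiguous_bases / 100000
    printf "%.2f\n", score
  }'
}

# Example values (made up): 95% complete, 2% contaminated,
# 50 contigs, 1000 ambiguous bases.
quality_score 95 2 50 1000   # prints 82.45
```

With these weights, raising contamination from 2% to 5% lowers the score by 15 points, while 100 extra contigs cost 5 points.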

docs/README.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+<!-- NOTE: This intro should manually be kept in sync between the repo README and the docs README -->
+
+[![Current Build](https://github.com/wwood/galah/actions/workflows/test-galah.yml/badge.svg)](https://github.com/wwood/galah/actions)
+[![Conda version](https://img.shields.io/conda/v/bioconda/galah)](https://anaconda.org/bioconda/galah)
+[![Conda downloads](https://img.shields.io/conda/d/bioconda/galah)](https://anaconda.org/bioconda/galah)
+[![Crates.io version](https://img.shields.io/crates/v/galah)](https://crates.io/crates/galah)
+[![Crates.io downloads](https://img.shields.io/crates/d/galah)](https://crates.io/crates/galah)
+
+# Galah
+
+[<img src="docs/_include/galah_logo.png" alt="Galah logo" width="600"/>](galah_logo.png)
+
+Galah - Scalable dereplication and MIMAG calculation for metagenome assembled genomes
+
+Documentation can be found at [https://wwood.github.io/galah/](https://wwood.github.io/galah/).
+
+Galah aims to be a more scalable metagenome assembled genome (MAG) dereplication
+method. That is, it clusters microbial genomes together based on their average
+nucleotide identity (ANI), and chooses a single member of each cluster as the
+representative.
+
+Galah uses a greedy clustering approach to speed up genome dereplication,
+relative to e.g. [dRep](https://drep.readthedocs.io/), particularly when there
+are many closely related genomes (i.e. >95% ANI). Generated cluster
+representatives have 2 properties. If the ANI threshold was set to 95%, then:
+
+1. Each representative is <95% ANI to each other representative.
+2. All members are >=95% ANI to the representative.
+
+If `--run-checkm2` was specified, or [CheckM2](https://github.com/chklovski/CheckM2) /
+[CheckM](https://ecogenomics.github.io/CheckM/) genome qualities were provided,
+then the clusters have an additional property:
+
+3. Each representative genome has a better quality score than other members of
+the cluster. Each genome is assigned a quality score based on the formula
+`completeness-5*contamination-5*num_contigs/100-5*num_ambiguous_bases/100000`,
+which is reduced from a quality formula described in
+Parks et. al. 2020 https://doi.org/10.1038/s41587-020-0501-8.
+Other quality score formula are available via `--quality-formula`.
+
+If instead CheckM1/2 qualities are not available, then the following holds instead:
+
+3. Each representative genome was specified to Galah before other members of the
+cluster.
+
+The overall greedy clustering approach was largely inspired by the work of
+Donovan Parks, as described in [Parks et. al. 2020](https://doi.org/10.1038/s41587-020-0501-8).
+It operates in 3 steps. In the first step, genomes are assigned as representative
+if no genomes of higher quality are >95% ANI. In the second step, each
+non-representative genome is assigned to the representative genome with which it
+has the highest ANI.
+
+## Example usage
+
+For clustering a set of genomes at 95% ANI:
+
+```bash
+galah cluster --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna \
+--output-cluster-definition clusters.tsv
+```
+
+For clustering a set of contigs at 95% ANI:
+
+```bash
+galah cluster --cluster-contigs --small-genomes --genome-fasta-files /path/to/contigs.fna \
+--output-cluster-definition clusters.tsv
+```
+
+For determining MIMAG quality scores for a set of genomes with CheckM2:
+
+```bash
+galah analyse --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna \
+--output-mimag-summary mimag.tsv
+```
+
+For clustering and determining MIMAG quality scores:
+
+```bash
+galah process --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna \
+--output-cluster-definition clusters.tsv --output-mimag-summary mimag.tsv
+```
+
+## Help
+
+If you have any questions or need help, please [open an issue](https://github.com/wwood/galah/issues).
+
+## License
+Galah is developed by the [Woodcroft lab](https://research.qut.edu.au/cmr/team/ben-woodcroft/) at the [Centre for Microbiome Research](https://research.qut.edu.au/cmr), School of Biomedical Sciences, QUT, with contributions from [Samuel Aroney](https://github.com/AroneyS), [Antônio Camargo](https://github.com/apcamargo), and [Rhys Newell](https://github.com/rhysnewell). It is licensed under [GPL3 or later](https://gnu.org/licenses/gpl.html).
+
+The source code is available at [https://github.com/wwood/galah](https://github.com/wwood/galah).
+
+## Citation
+<!-- NOTE: Citations should manually be kept in sync between the repo README and the docs README -->
+
+Aroney, S.T.N., Camargo, A.P., Tyson, G.W. and Woodcroft B.J.
+Galah: More scalable dereplication for metagenome assembled genomes.
+Zenodo (2024). https://doi.org/10.5281/zenodo.13637856

docs/_include/.keep

Whitespace-only changes.

docs/_include/.nojekyll

Whitespace-only changes.

docs/galah-cluster.html

Lines changed: 0 additions & 254 deletions
This file was deleted.
