which is reduced from a quality formula described in
38
+
Parks et al. 2020 https://doi.org/10.1038/s41587-020-0501-8.
39
+
Other quality score formulas are available via `--quality-formula`.
38
40
39
-
If instead CheckM qualities were not provided, then the following holds instead:
41
+
If instead CheckM1/2 qualities are not available, then the following holds instead:
40
42
41
-
3. Each representative genome was specified to galah before other members of the
43
+
3. Each representative genome was specified to Galah before other members of the
42
44
cluster.
43
45
44
46
The overall greedy clustering approach was largely inspired by the work of
45
-
Donovan Parks, as described in [Parks et. al. 2020](https://doi.org/10.1038/s41587-020-0501-8). It
46
-
operates in 3 steps. In the first step, genomes are assigned as representative
47
-
if no genomes of higher quality are >99% ANI. In the second step, each
48
-
non-representative genome is assigned to the representative genome it has the
49
-
highest ANI with.
47
+
Donovan Parks, as described in [Parks et al. 2020](https://doi.org/10.1038/s41587-020-0501-8).
48
+
It operates in 3 steps. In the first step, genomes are assigned as representative
49
+
if no genomes of higher quality are >95% ANI. In the second step, each
50
+
non-representative genome is assigned to the representative genome with which it
51
+
has the highest ANI.
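The two steps described above can be sketched in Python. This is a minimal illustration of the prose only, not Galah's actual implementation: the function name `greedy_cluster`, the `quality` mapping, and the `ani` callback are all hypothetical, and the sketch takes the sentence literally by comparing each genome against every higher-quality genome. Ties in quality are broken by input order, which is also an assumption.

```python
def greedy_cluster(genomes, quality, ani, threshold=95.0):
    """Cluster genomes greedily (illustrative sketch, hypothetical API).

    genomes:   list of genome names, in the order they were specified
    quality:   dict mapping genome name -> quality score (higher is better)
    ani:       function (a, b) -> ANI between two genomes, as a percentage
    threshold: ANI (%) above which two genomes belong to the same cluster
    """
    # Sort from highest to lowest quality. Python's sort is stable, so genomes
    # of equal quality keep the order in which they were specified.
    ordered = sorted(genomes, key=lambda g: quality[g], reverse=True)

    # Step 1: a genome is a representative if no higher-quality genome
    # has ANI above the threshold with it.
    representatives = []
    for i, genome in enumerate(ordered):
        if all(ani(genome, better) <= threshold for better in ordered[:i]):
            representatives.append(genome)

    # Step 2: each non-representative joins the representative with which
    # it has the highest ANI.
    clusters = {rep: [rep] for rep in representatives}
    for genome in ordered:
        if genome in clusters:
            continue
        best = max(representatives, key=lambda rep: ani(genome, rep))
        clusters[best].append(genome)
    return clusters
```

The result maps each representative to its cluster members, with the representative listed first. In practice the ANI values would come from a tool such as skani or FastANI rather than a Python callback.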
50
52
51
-
## Installation
53
+
## Example usage
52
54
53
-
### Install through the bioconda package
55
+
For clustering a set of genomes at 95% ANI:
54
56
55
-
Galah can be installed through the [bioconda](https://bioconda.github.io/) conda channel. After initial setup of conda and the bioconda channel, it can be installed with mamba (or conda) with:
The full usage is described on the [manual page](https://wwood.github.io/galah/galah-cluster.html), which can also be accessed on the command line by running `galah cluster --full-help`.
83
+
## Help
115
84
116
-
### Precluster ANI
117
-
Similar to dRep, galah operates in two stages. In the first, a fast
or [skani](https://github.com/bluenote-1577/skani)) is
120
-
calculated between each pair of genomes. Genome pairs are only considered as
121
-
potentially in the same cluster with [skani](https://github.com/bluenote-1577/skani) or
122
-
[FastANI](https://github.com/ParBLiSS/FastANI) if the prethreshold ANI is
123
-
greater than the specified value. By default, the precluster ANI is set at 95%
124
-
and the final ANI is set at 99%.
85
+
If you have any questions or need help, please [open an issue](https://github.com/wwood/galah/issues).
125
86
126
87
## License
88
+
Galah is developed by the [Woodcroft lab](https://research.qut.edu.au/cmr/team/ben-woodcroft/) at the [Centre for Microbiome Research](https://research.qut.edu.au/cmr), School of Biomedical Sciences, QUT, with contributions from [Samuel Aroney](https://github.com/AroneyS), [Antônio Camargo](https://github.com/apcamargo), and [Rhys Newell](https://github.com/rhysnewell). It is licensed under [GPL3 or later](https://gnu.org/licenses/gpl.html).
127
89
128
-
Galah is made available under GPL3+. See LICENSE.txt for details. Copyright Ben
129
-
Woodcroft.
90
+
The source code is available at [https://github.com/wwood/galah](https://github.com/wwood/galah).
130
91
131
-
Developed by Ben Woodcroft at the [Centre for Microbiome Research, Queensland University of Technology](https://www.qut.edu.au/health/schools/school-of-biomedical-sciences/centre-for-microbiome-research).
92
+
## Citation
93
+
<!-- NOTE: Citations should manually be kept in sync between the repo README and the docs README -->
94
+
95
+
Aroney, S.T.N., Camargo, A.P., Tyson, G.W. and Woodcroft, B.J.
96
+
Galah: More scalable dereplication for metagenome assembled genomes.