Commit 469ec33

authored
Update README.md
1 parent 1370c3b commit 469ec33

1 file changed (+3 / -3 lines)

README.md

Lines changed: 3 additions & 3 deletions
@@ -7,9 +7,6 @@
 
 skDER (& CIDDER): efficient & high-resolution dereplication of microbial genomes to select representatives.
 
-> [!IMPORTANT]
-> In v1.3.3, we introduced a `low_mem_greedy` option for low-memory dereplication for the top 20 taxa which are particularly well sequenced (e.g. those which have >10k or >20k genomes available). As we showed in the manuscript, while dereplication by skDER/cidder or other methods is typically not very memory-intensive when applied to an input set of <5,000 genomes, memory needs can expand when you go beyond this. The `lom_mem_greedy` mode was not included in the manuscript and is still being benchmarked - I plan to update the wiki with details on how its representative selection compares to the standard greedy approach. I expect the quality of representatives selected to be slightly worse because it does not account for "connectivity" in prioritizing their selection, but it is considerably faster and more computationally efficient by leveraging skani's `search` function through a greedy/iterative approach that prioritizes based on only N50 when applied to large datasets. As an example, we were able to dereplicate >20,000 *Staphylococcus* from GTDB R220 in around 2.25 hours using 20 threads and ~1 GB of memory using the command: `skder -t Staphylococcus -d greedy -c 20 -r R220 -o Staph_R220_skDER_LMG_Results/ -auto -d low_mem_greedy`. For those interested in using this on their laptops, genomes can still add up in size, so make sure you have an appropriate amount of disk space available for the number of genomes you plan to dereplicate.
-
 > ***Warning:*** Please make sure to use version 1.0.7 or greater to avoid a bug in previous versions!
 
 **Contents**
@@ -26,6 +23,9 @@ skDER (& CIDDER): efficient & high-resolution dereplication of microbial genomes
 
 <img src="https://raw.githubusercontent.com/raufs/skDER/main/images/Logo.png" alt="drawing" width="300"/> <img src="https://github.com/raufs/skDER/blob/main/images/Logo2.png" alt="drawing" width="223.5"/>
 
+> [!IMPORTANT]
+> In v1.3.3, we introduced a `low_mem_greedy` option for low-memory dereplication for the top 20 taxa which are particularly well sequenced (e.g. those which have >10k or >20k genomes available). As we showed in the manuscript, while dereplication by skDER/cidder or other methods is typically not very memory-intensive when applied to an input set of <5,000 genomes, memory needs can expand when you go beyond this. The `lom_mem_greedy` mode was not included in the manuscript and is still being benchmarked - I plan to update the wiki with details on how its representative selection compares to the standard greedy approach. I expect the quality of representatives selected to be slightly worse because it does not account for "connectivity" in prioritizing their selection, but it is considerably faster and more computationally efficient by leveraging skani's `search` function through a greedy/iterative approach that prioritizes based on only N50 when applied to large datasets. As an example, we were able to dereplicate >20,000 *Staphylococcus* from GTDB R220 in around 2.25 hours using 20 threads and ~1 GB of memory using the command: `skder -t Staphylococcus -d greedy -c 20 -r R220 -o Staph_R220_skDER_LMG_Results/ -auto -d low_mem_greedy`. For those interested in using this on their laptops, genomes can still add up in size, so make sure you have an appropriate amount of disk space available for the number of genomes you plan to dereplicate.
+
 ## Installation
 
 ### Bioconda
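The note moved in this commit describes `low_mem_greedy` as a greedy, iterative selection that prioritizes genomes by N50 alone, with skani's `search` used to find genomes redundant to each chosen representative. A minimal Python sketch of that idea follows. It is a hypothetical illustration, not skDER's implementation. `n50`, `low_mem_greedy_sketch`, and the `is_redundant` callback are names introduced here, and `is_redundant` merely stands in for "a skani search hit that passes the similarity cutoffs".

```python
from typing import Callable, Dict, List, Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def low_mem_greedy_sketch(
    genomes: Dict[str, Sequence[int]],
    is_redundant: Callable[[str, str], bool],
) -> List[str]:
    """Hypothetical greedy dereplication ordered only by N50.

    genomes maps a genome name to its contig lengths.
    is_redundant(rep, other) is an assumed stand-in for a skani search hit
    of `other` against `rep` that passes the dereplication cutoffs.
    """
    # Highest N50 first; ties broken by name so the order is deterministic.
    remaining = sorted(genomes, key=lambda g: (-n50(genomes[g]), g))
    representatives: List[str] = []
    while remaining:
        rep = remaining.pop(0)  # best remaining genome becomes a representative
        representatives.append(rep)
        # Discard every remaining genome that this representative covers.
        remaining = [g for g in remaining if not is_redundant(rep, g)]
    return representatives


# Toy usage with made-up contig lengths and a made-up redundancy relation.
genomes = {
    "A": [500_000, 300_000, 200_000],
    "B": [900_000, 100_000],
    "C": [100_000] * 10,
}
pairs = {("B", "A")}  # hypothetical: B is a search hit that covers A
reps = low_mem_greedy_sketch(genomes, lambda rep, other: (rep, other) in pairs)
print(reps)
```

Because each pass only compares the newly chosen representative against the genomes still remaining, no all-vs-all similarity matrix has to be held in memory. That trade-off matches the note's point that this mode ignores "connectivity" when ranking candidates.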