Description
Hi,
I’m trying to confirm whether I’m misunderstanding any part of the documentation regarding sourmash gather/fastgather/fastmultigather and how these commands handle databases built with different scale factors.
According to this guide (https://hackmd.io/vH2LMY38TEy8miUXI1OSNg), it seems possible to run sourmash gather using a query signature generated with --scaled 1000 against a combination of databases where some are built with scaled=1000 and others with scaled=10000. I’ve tested a similar setup and the command executes without errors.
I want to make sure I’m not missing any documentation on this. Specifically, I would like to confirm:
- Is it officially supported to gather against databases with different scale factors?
- Is there a recommended relationship between query scale and database scale? For example, if a query signature is generated with --scaled 1000, should all databases satisfy scale=1000, ≤1000, ≥1000, or is any combination acceptable?
- Does mixing scales (e.g., 1000 + 10000) affect sensitivity, containment estimates, abundance, or the final interpretation in any meaningful way?
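To make the question concrete, here is a minimal sketch of how I currently understand the scaled parameter. It is plain Python, not sourmash's API, and every function name in it is hypothetical. It assumes a sketch built with scaled=N keeps only the hashes that fall below roughly 2^64 / N, which would mean a scaled=1000 sketch can be downsampled to scaled=10000 but not the other way around. Whether gather does this automatically when the scales differ is exactly what I'm unsure about.

```python
import random

HASH_SPACE = 2**64  # assumption: 64-bit hash values


def max_hash_for_scaled(scaled):
    # Assumed cutoff: keep roughly 1/scaled of the hash space.
    return HASH_SPACE // scaled


def sketch(hashes, scaled):
    # Keep only the hashes below the cutoff for this scaled value.
    cutoff = max_hash_for_scaled(scaled)
    return {h for h in hashes if h < cutoff}


def downsample(existing_sketch, new_scaled):
    # Downsampling to a larger scaled value just re-applies a lower cutoff.
    return sketch(existing_sketch, new_scaled)


# Random integers stand in for the k-mer hashes of one genome.
rng = random.Random(42)
all_hashes = {rng.randrange(HASH_SPACE) for _ in range(200_000)}

query_1000 = sketch(all_hashes, 1000)    # query built with --scaled 1000
db_10000 = sketch(all_hashes, 10000)     # database built with scaled=10000

# Downsampling the query to 10000 recovers exactly the coarser sketch.
query_downsampled = downsample(query_1000, 10000)
print(len(query_1000), len(db_10000), query_downsampled == db_10000)
```

If this picture is right, comparing a scaled=1000 query against a scaled=10000 database would effectively happen at scaled=10000, using about a tenth of the query's hashes.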
Sorry if this is a silly question. I had always assumed the query and database scales needed to match, so maybe I’ve been carrying around a wrong assumption.
If there are best practices for choosing appropriate scales (or any documentation I may have overlooked), I’d really appreciate any guidance.
Thanks so much!