Clarification on using gather function with databases having different scale factors #3870

@tnmquann

Hi,
I’m trying to confirm whether I’m misunderstanding any part of the documentation regarding sourmash gather/fastgather/fastmultigather and how these commands handle databases built with different scale factors.

According to this guide (https://hackmd.io/vH2LMY38TEy8miUXI1OSNg), it seems possible to run sourmash gather using a query signature generated with --scaled 1000 against a combination of databases where some are built with scaled=1000 and others with scaled=10000. I’ve tested a similar setup and the command executes without errors.

I want to make sure I’m not missing any documentation on this. Specifically, I would like to confirm:

  • Is it officially supported to gather against databases with different scale factors?
  • Is there a recommended relationship between query scale and database scale? For example, if a query signature is generated with --scaled 1000, should all databases have scaled=1000, ≤1000, or ≥1000, or is any combination acceptable?
  • Does mixing scales (e.g., 1000 + 10000) affect sensitivity, containment estimates, abundance, or the final interpretation in any meaningful way?
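
To make the scale relationship concrete, here is a minimal sketch of the usual FracMinHash rule, under which a hash is kept only if it falls below 2^64 / scaled. The hash function (`toy_hash`) and the helper names are hypothetical stand-ins for illustration and are not sourmash's implementation or API.

```python
import hashlib

HASH_SPACE = 2**64  # size of a 64-bit hash space


def max_hash_for(scaled: int) -> int:
    """Upper bound on retained hashes for a given scaled value."""
    return HASH_SPACE // scaled


def toy_hash(kmer: str) -> int:
    """Hypothetical stand-in for a 64-bit k-mer hash (not sourmash's hash)."""
    digest = hashlib.sha256(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(kmers, scaled: int) -> set:
    """Keep only the hashes that fall below the threshold for `scaled`."""
    limit = max_hash_for(scaled)
    return {h for h in map(toy_hash, kmers) if h < limit}


def downsample(hashes: set, new_scaled: int) -> set:
    """Re-filter an existing sketch to a larger (coarser) scaled value."""
    limit = max_hash_for(new_scaled)
    return {h for h in hashes if h < limit}


# Made-up k-mer names, purely for illustration.
kmers = [f"kmer{i}" for i in range(100_000)]
fine = sketch(kmers, scaled=1000)     # keeps roughly 1 in 1000 hashes
coarse = sketch(kmers, scaled=10000)  # keeps roughly 1 in 10000 hashes

# Downsampling the scaled=1000 sketch reproduces the scaled=10000 sketch.
assert downsample(fine, 10000) == coarse
```

Under this rule a scaled=1000 sketch can always be reduced to scaled=10000, but a scaled=10000 sketch cannot be refined to scaled=1000, because the discarded hashes are gone. Whether and how gather applies this when scales are mixed is what the questions above ask.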

Sorry if this is a silly question. I always assumed the query and database scales needed to match, so maybe I’ve been carrying around a wrong assumption.
If there are best practices for choosing appropriate scales (or any documentation I may have overlooked), I’d really appreciate any guidance.

Thanks so much!
