Change argument to variant_iter to generic query? #240

@jeromekelleher

Currently the signature of variant_iter directly reflects the structure of the bcftools querying language:

def variant_iter(
    vcz,
    *,
    fields: list[str] | None = None,
    regions: str | None = None,
    targets: str | None = None,
    include: str | None = None,
    exclude: str | None = None,
    samples: list[str] | str | None = None,
):

However, we can easily imagine a future in which we also want to support other querying structures, say, something like SQL. Here, we might do something like

query = sql_query("""
    SELECT variant_position, call_genotype FROM 1kgp3.vcz
    WHERE variant_contig = 'chr1' AND sample_population = 'YRI'
""")
for var in variant_iter(query):
    # Do stuff with var
    ...

and also standard bcftools stuff like

# This is unfortunately confusing with "bcftools query", but you get the idea
query = bcftools_query(
    "1kgp3.vcz",
    fields=["variant_position", "call_genotype"],
    regions="chr1",
    samples=[...],  # list of YRI samples
)
for var in variant_iter(query):
    # Do stuff with var
    ...

We don't need to implement the SQL stuff now, but I think it would be a shame to limit the API to just the bcftools way of doing things, which is quite limiting in many ways. It would be good to keep the door open to other query styles in the future.
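
To make the idea concrete, here is a minimal sketch of what a query-object interface might look like. Everything in it is hypothetical and not existing API: the `VariantQuery` protocol, the `BcftoolsQuery` class, its `variants()` method, and the `contig` argument (a simplified stand-in for `regions`) are all made up for illustration, and a plain list of dicts stands in for a real VCZ store.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterator, Protocol


class VariantQuery(Protocol):
    """Hypothetical interface: anything that can yield variant records."""

    def variants(self) -> Iterator[dict[str, Any]]: ...


@dataclass
class BcftoolsQuery:
    """Hypothetical query object holding bcftools-style options."""

    store: list[dict[str, Any]]  # in-memory stand-in for a VCZ store
    fields: list[str] | None = None
    contig: str | None = None  # simplified stand-in for `regions`

    def variants(self) -> Iterator[dict[str, Any]]:
        for record in self.store:
            # Skip records outside the requested contig, if one was given.
            if self.contig is not None and record["variant_contig"] != self.contig:
                continue
            if self.fields is None:
                yield dict(record)
            else:
                yield {name: record[name] for name in self.fields}


def variant_iter(query: VariantQuery) -> Iterator[dict[str, Any]]:
    # variant_iter no longer takes regions/targets/include/exclude itself;
    # it only relies on the query object's interface.
    yield from query.variants()


toy_store = [
    {"variant_contig": "chr1", "variant_position": 100},
    {"variant_contig": "chr2", "variant_position": 200},
    {"variant_contig": "chr1", "variant_position": 300},
]
query = BcftoolsQuery(toy_store, fields=["variant_position"], contig="chr1")
positions = [var["variant_position"] for var in variant_iter(query)]
```

With this shape, an SQL-style query would just be another class implementing the same `variants()` method, and `variant_iter` itself would not need to change.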
