Skip to content

feat: a dosage command for storing TR motif counts in PGEN files #221

@aryarm

Description

@aryarm

Note: This idea originated from the bottom of #73 (comment)

Once PGEN files can store multiallelic dosages, we should consider creating a command called dosage that can append dosage info computed using haptools' Genotypes classes to an existing PGEN file:

  • It can use GenotypesPLINKTR to compute the genotypes, sum them to compute the dosages (and possibly divide by 2 so that they remain in the range [0, 2]?) and then write them to the existing file using PgenWriter.append_dosages_batch(). It should only change the PGEN file. The PVAR and PSAM files can remain the same
  • If --out is specified, we should create a new PGEN file instead of overwriting the existing one. In the former case, we should also copy (or symlink?) the PVAR and PSAM files.

Once we do all of this, we should be able to use PLINK2 to analyze TRs! For example, we could do the following:

  1. Convert a TR VCF into a PGEN file
    plink2 --vcf-half-call m --make-pgen 'pvar-cols=vcfheader,qual,filter,info' --vcf input.vcf --out output
    
  2. Run haptools dosage to add dosages to the PGEN file
    haptools dosage output.pgen
    
  3. Use the PGEN file as input to plink2. For example, we could use it with --r[2]-[un]phased, --ld-*, --pca, --glm, --freq, --clump, or --maf/--mac. Quoting from the documentation:

    Dosages are always used when present

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions