LPL no smaller than PL #277

@jeromekelleher

After running conversion on 1000 Genomes chr2 data, I'm not seeing any reduction in the size of PL fields. Here's the ICF:

$ vcf2zarr inspect 1kg_chr2_lpl.icf/ | grep PL
FORMAT/PL                 Integer      6906  430.58 GiB  60.6 GiB           28  0          3.2e+05
FORMAT/LPL                Integer      6906  430.58 GiB  60.57 GiB          28  -1         3.2e+05
$ vcf2zarr inspect 1kg_chr2_lpl.icf/ | grep 'LAA'
FORMAT/LAA                Integer      4534  282.07 GiB  315.18 MiB          6  -2         6

And on the VCZ (for a small number of variants, using --max-variant-chunks):

$ vcf2zarr inspect 1kg_chr2_lpl.vcz/ | grep PL
/call_LPL                     int32    8.27 MiB    342.01 MiB   41             40  8.55 MiB      211.78 KiB          (1000, 3202, 28)  (100, 1000, 28)  Blosc(cname='zstd', clevel=7, shuffle=NOSHUFFLE, blocksize=0)   None
/call_PL                      int32    8.27 MiB    342.01 MiB   41             40  8.55 MiB      211.78 KiB          (1000, 3202, 28)  (100, 1000, 28)  Blosc(cname='zstd', clevel=7, shuffle=NOSHUFFLE, blocksize=0)   None
$ vcf2zarr inspect 1kg_chr2_lpl.vcz/ | grep LAA
/call_LAA                     int8     102.69 KiB  18.32 MiB   180             40  469.04 KiB    2.57 KiB            (1000, 3202, 6)   (100, 1000, 6)   Blosc(cname='zstd', clevel=7, shuffle=NOSHUFFLE, blocksize=0)   None

So, there's no reduction in the maximum dimension and the storage sizes are essentially identical.
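
As a sanity check on the dimensions, here is a minimal sketch of the arithmetic. This is not bio2zarr code and the helper names are hypothetical. It assumes diploid calls and that LAA lists only the local ALT alleles, with the reference allele implicit. Under those assumptions, an LAA dimension of 6 implies up to 7 local alleles, which gives the same 28 genotype entries as the full PL:

```python
from math import comb


def num_genotypes(num_alleles: int, ploidy: int = 2) -> int:
    """Number of unordered genotypes for the given allele count and ploidy."""
    # Multisets of size `ploidy` drawn from `num_alleles` alleles.
    return comb(num_alleles + ploidy - 1, ploidy)


def lpl_length(laa_dim: int, ploidy: int = 2) -> int:
    """Length of LPL if LAA holds `laa_dim` ALT alleles (assumption: REF is implicit)."""
    return num_genotypes(laa_dim + 1, ploidy)


if __name__ == "__main__":
    # Dimensions taken from the inspect output above: LAA is 6, PL is 28.
    print("LPL length for LAA dimension 6:", lpl_length(6))
    # If at most 2 ALT alleles were local, LPL would be much shorter.
    print("LPL length for LAA dimension 2:", lpl_length(2))
```

If that reading is right, LPL can only shrink once the maximum LAA dimension is smaller than the maximum number of ALT alleles.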

Have you any ideas what might be going on here @Will-Tyler?

I think it would be really worthwhile getting some truth data for LPL that we could compare with. It does seem that getting Hail running is the only way to do this, so it's probably worth the effort.
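
In the meantime, a small reference calculation could be used for spot checks on individual calls. The sketch below is an assumption about what LPL should contain, not Hail's or bio2zarr's implementation: it takes a diploid PL vector in the standard VCF genotype order and a hypothetical list of local allele indices (including 0 for the reference), and picks out the PL entries for genotypes made only of local alleles. It does not renormalise the values.

```python
def genotype_index(a: int, b: int) -> int:
    """Index of diploid genotype a/b in VCF genotype ordering."""
    lo, hi = sorted((a, b))
    return hi * (hi + 1) // 2 + lo


def pl_to_lpl(pl: list[int], local_alleles: list[int]) -> list[int]:
    """Subset a full PL vector to the genotypes formed from local alleles.

    `local_alleles` holds global allele indices (0 = REF), in local order.
    """
    lpl = []
    # Iterate local genotypes in the same ordering VCF uses for PL.
    for k in range(len(local_alleles)):
        for j in range(k + 1):
            idx = genotype_index(local_alleles[j], local_alleles[k])
            lpl.append(pl[idx])
    return lpl


if __name__ == "__main__":
    # Three alleles (REF + 2 ALT): genotype order is 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.
    pl = [0, 10, 20, 30, 40, 50]
    # Sample only carries REF and the second ALT allele.
    print(pl_to_lpl(pl, [0, 2]))
```

Comparing this against the converted LPL for a handful of samples would not replace Hail output, but it might show whether the values or only the dimensions are off.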
