@mlin commented Sep 7, 2020

This PR documents a convention developed in spVCF to reduce the size of population-wide VCF files (which present the full locus × sample matrix) by selectively omitting FORMAT fields. As written, this is not a spec change; it merely suggests a useful application of an existing clause (referenced inline). We think it is worth documenting explicitly because we have encountered downstream tools that get tripped up by it.

In our experiments, applying this convention alone to WGS/WES VCF files for cohorts such as 1KGP and UKB (generated with different pipelines) yields a 4-6x reduction in file size.
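
To illustrate what a downstream tool needs to tolerate, here is a minimal Python sketch. It assumes the existing clause in question is the VCF rule that trailing sample fields may be dropped (with GT retained when it is listed in FORMAT). The function names `parse_sample_cell` and `drop_trailing_missing` are hypothetical, and this is not spVCF's implementation; in particular, it does not show how spVCF decides which fields to omit.

```python
MISSING = "."


def parse_sample_cell(format_keys, cell):
    """Map FORMAT keys to the values in one sample cell.

    A cell may carry fewer colon-separated values than FORMAT has keys;
    the dropped trailing fields are filled in with the missing value.
    """
    values = cell.split(":")
    if len(values) > len(format_keys):
        raise ValueError("sample cell has more fields than FORMAT")
    values += [MISSING] * (len(format_keys) - len(values))
    return dict(zip(format_keys, values))


def drop_trailing_missing(cell):
    """Strip trailing all-missing fields from a sample cell.

    The first field (GT, when present in FORMAT) is always kept.
    """
    values = cell.split(":")
    while len(values) > 1 and all(v == MISSING for v in values[-1].split(",")):
        values.pop()
    return ":".join(values)


if __name__ == "__main__":
    keys = ["GT", "DP", "AD", "GQ", "PL"]
    short = drop_trailing_missing("0/0:23:.:.:.")
    print(short)
    print(parse_sample_cell(keys, short))
```

A reader that pads short cells this way treats the truncated and the fully written-out forms identically, which is the behavior that the tripped-up tools lack.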

Related PRs:

@hts-specs-bot

Changed PDFs as of 508c8c6: VCFv4.4.draft (diff).

