5 files changed, +66 -7 lines changed

11# 0.1.6 2025-0X-XX
22
3- - Make format-specific dependencies optional (#385)
4-
5- - Add contigs to plink output (#344)
3+ - Initial version of supported plink2zarr (#390, #344, #382)
64
7- - Add variant_length and indexing to plink output (#382)
5+ - Make format-specific dependencies optional (#385)
86
97Breaking changes
108
@@ -14,6 +12,8 @@ Breaking changes
1412- Add dimensions and default compressor and filter settings to the schema.
1513 (#361)
1614
15+ - Various changes to existing experimental plink encoding (#390)
16+
1717# 0.1.5 2025-03-31
1818
1919- Add support for merging contig IDs across multiple VCFs (#335)

@@ -6,6 +6,9 @@ chapters:
66  sections:
77  - file: vcf2zarr/tutorial
88  - file: vcf2zarr/cli_ref
9+ - file: plink2zarr/overview
10+   sections:
11+   - file: plink2zarr/cli_ref
912- file: vcfpartition/overview
1013  sections:
1114  - file: vcfpartition/cli_ref

88- {ref}`sec-vcf2zarr` converts VCF data to
99 [VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.
1010
11+ - {ref}`sec-plink2zarr` converts PLINK 1.0 data to
12+ [VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.
13+
1114- {ref}`sec-vcfpartition` is a utility to split an input
1215 VCF into a given number of partitions. This is useful for
1316 parallel processing of VCF data.
1720`bio2zarr` is in development, contributions, feedback and issues are welcome
1821at the [GitHub repository](https://github.com/sgkit-dev/bio2zarr).
1922
20- Support for converting PLINK data to VCF Zarr is partially implemented,
21- and adding BGEN and [tskit](https://tskit.dev/) support is also planned.
2223If you would like to see
23- support for other formats (or an interested in helping with implementing),
24+ support for other formats such as BGEN (or are interested in helping with implementing),
2425please open an [issue on Github](https://github.com/sgkit-dev/bio2zarr/issues)
2526to discuss!
2627

1+ (sec-plink2zarr-cli-ref)=
2+ # CLI Reference
3+
4+ % A note on cross references... There's some weird long-standing problem with
5+ % cross referencing program values in Sphinx, which means that we can't use
6+ % the built-in labels generated by sphinx-click. We can make our own explicit
7+ % targets, but these have to have slightly weird names to avoid conflicting
8+ % with what sphinx-click is doing. So, hence the cmd- prefix.
9+ % Based on: https://github.com/skypilot-org/skypilot/pull/2834
10+
11+ ``` {eval-rst}
12+
13+ .. _cmd-plink2zarr-convert:
14+ .. click:: bio2zarr.cli:convert_plink
15+    :prog: plink2zarr convert
16+    :nested: full
17+

1+ (sec-plink2zarr)=
2+ # plink2zarr
3+
4+ Convert plink data to the
5+ [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
6+ reliably in parallel.
7+
8+ See {ref}`sec-plink2zarr-cli-ref` for detailed documentation on
9+ command line options.
10+
11+ Conversion of the plink data model to VCF follows the semantics of plink1.9 as closely
12+ as possible. That is, given a binary plink fileset with prefix "fileset" (i.e.,
13+ fileset.bed, fileset.bim, fileset.fam), running
14+ ```
15+ $ plink2zarr convert fileset out.vcz
16+ ```
17+ should produce the same result in ``out.vcz`` as
18+ ```
19+ $ plink1.9 --bfile fileset --keep-allele-order --recode vcf-iid --out tmp
20+ $ vcf2zarr convert tmp.vcf out.vcz
21+ ```
22+
23+ :::{warning}
24+ It is important to note that we follow the same conventions as plink 2.0
25+ where the A1 allele in the [bim file](https://www.cog-genomics.org/plink/2.0/formats#bim)
26+ is the VCF ALT and A2 is the REF.
27+ :::
28+
29+ :::{note}
30+ Currently we only convert the basic VCF-like data from plink, and don't include
31+ phenotypes and pedigree information. These are planned as future enhancements.
32+ Please comment on [this issue](https://github.com/sgkit-dev/bio2zarr/issues/392)
33+ if you are interested in this functionality.
34+ :::
35+
36+
37+
38+
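
The allele convention stated in the warning of the overview page (A1 in the bim file becomes the VCF ALT, A2 becomes the REF) can be illustrated with a minimal Python sketch. This is not bio2zarr's implementation: the `BimRecord`, `parse_bim_line` and `ref_alt` names are hypothetical, and the example variant `rs123` is made up. The only assumption about the input is the standard six-column, whitespace-delimited bim layout (chromosome, variant ID, centimorgan position, base-pair position, allele 1, allele 2).

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    """One row of a .bim file (hypothetical helper type for illustration)."""

    chrom: str
    variant_id: str
    cm_position: float
    bp_position: int
    a1: str
    a2: str


def parse_bim_line(line: str) -> BimRecord:
    """Parse one whitespace-delimited, six-column .bim line."""
    chrom, variant_id, cm, bp, a1, a2 = line.split()
    return BimRecord(chrom, variant_id, float(cm), int(bp), a1, a2)


def ref_alt(record: BimRecord) -> tuple[str, str]:
    """Return (REF, ALT) under the documented convention: A2 is REF, A1 is ALT."""
    return record.a2, record.a1


# Made-up example row: A1 is "G", A2 is "A".
record = parse_bim_line("1\trs123\t0\t1000\tG\tA")
ref, alt = ref_alt(record)

# Print the first VCF-style columns: CHROM, POS, ID, REF, ALT.
print(f"{record.chrom}\t{record.bp_position}\t{record.variant_id}\t{ref}\t{alt}")
```

For the row above, A2 (`A`) ends up in the REF column and A1 (`G`) in the ALT column, which is the mapping a reader would check against when comparing `plink2zarr` output with the `plink1.9 --keep-allele-order` route.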