5 files changed, +66 -7 lines changed

@@ -1,10 +1,8 @@
  # 0.1.6 2025-0X-XX

- - Make format-specific dependencies optional (#385)
-
- - Add contigs to plink output (#344)
+ - Initial version of supported plink2zarr (#390, #344, #382)

- - Add variant_length and indexing to plink output (#382)
+ - Make format-specific dependencies optional (#385)

  Breaking changes

@@ -14,6 +12,8 @@ Breaking changes
  - Add dimensions and default compressor and filter settings to the schema.
    (#361)

+ - Various changes to existing experimental plink encoding (#390)
+
  # 0.1.5 2025-03-31

  - Add support for merging contig IDs across multiple VCFs (#335)
@@ -6,6 +6,9 @@ chapters:
    sections:
    - file: vcf2zarr/tutorial
    - file: vcf2zarr/cli_ref
+ - file: plink2zarr/overview
+   sections:
+   - file: plink2zarr/cli_ref
  - file: vcfpartition/overview
    sections:
    - file: vcfpartition/cli_ref
@@ -8,6 +8,9 @@
  - {ref}`sec-vcf2zarr` converts VCF data to
    [VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.

+ - {ref}`sec-plink2zarr` converts PLINK 1.0 data to
+   [VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.
+
  - {ref}`sec-vcfpartition` is a utility to split an input
    VCF into a given number of partitions. This is useful for
    parallel processing of VCF data.
@@ -17,10 +20,8 @@
  `bio2zarr` is in development, contributions, feedback and issues are welcome
  at the [GitHub repository](https://github.com/sgkit-dev/bio2zarr).

- Support for converting PLINK data to VCF Zarr is partially implemented,
- and adding BGEN and [tskit](https://tskit.dev/) support is also planned.
  If you would like to see
- support for other formats (or an interested in helping with implementing),
+ support for other formats such as BGEN (or are interested in helping with implementing),
  please open an [issue on Github](https://github.com/sgkit-dev/bio2zarr/issues)
  to discuss!

@@ -0,0 +1,17 @@
+ (sec-plink2zarr-cli-ref)=
+ # CLI Reference
+
+ % A note on cross references... There's some weird long-standing problem with
+ % cross referencing program values in Sphinx, which means that we can't use
+ % the built-in labels generated by sphinx-click. We can make our own explicit
+ % targets, but these have to have slightly weird names to avoid conflicting
+ % with what sphinx-click is doing. So, hence the cmd- prefix.
+ % Based on: https://github.com/skypilot-org/skypilot/pull/2834
+
+ ```{eval-rst}
+
+ .. _cmd-plink2zarr-convert:
+ .. click:: bio2zarr.cli:convert_plink
+    :prog: plink2zarr convert
+    :nested: full
+
@@ -0,0 +1,38 @@
+ (sec-plink2zarr)=
+ # plink2zarr
+
+ Convert plink data to the
+ [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
+ reliably in parallel.
+
+ See {ref}`sec-plink2zarr-cli-ref` for detailed documentation on
+ command line options.
+
+ Conversion of the plink data model to VCF follows the semantics of plink1.9 as closely
+ as possible. That is, given a binary plink fileset with prefix "fileset" (i.e.,
+ fileset.bed, fileset.bim, fileset.fam), running
+ ```
+ $ plink2zarr convert fileset out.vcz
+ ```
+ should produce the same result in ``out.vcz`` as
+ ```
+ $ plink1.9 --bfile fileset --keep-allele-order --recode vcf-iid --out tmp
+ $ vcf2zarr convert tmp.vcf out.vcz
+ ```
+
+ :::{warning}
+ It is important to note that we follow the same conventions as plink 2.0
+ where the A1 allele in the [bim file](https://www.cog-genomics.org/plink/2.0/formats#bim)
+ is the VCF ALT and A2 is the REF.
+ :::
+
+ :::{note}
+ Currently we only convert the basic VCF-like data from plink, and don't include
+ phenotypes and pedigree information. These are planned as future enhancements.
+ Please comment on [this issue](https://github.com/sgkit-dev/bio2zarr/issues/392)
+ if you are interested in this functionality.
+ :::
+
+
+
+
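
The allele convention stated in the plink2zarr warning (A1 in the `.bim` file becomes the VCF ALT, A2 becomes the REF) can be illustrated with a minimal Python sketch. This is not bio2zarr code: `BimRecord` and `parse_bim_line` are hypothetical names used only for illustration, and the sketch assumes the standard six-column, whitespace-delimited `.bim` layout (chromosome, variant ID, genetic distance, base-pair position, A1, A2).

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    """Hypothetical container for one .bim row, using VCF-style field names."""

    contig: str
    variant_id: str
    position: int
    ref: str
    alt: str


def parse_bim_line(line: str) -> BimRecord:
    """Parse one whitespace-delimited .bim line (illustrative helper only)."""
    # Assumed column order: chromosome, variant ID, genetic distance,
    # base-pair position, A1, A2.
    chrom, variant_id, _genetic_distance, pos, a1, a2 = line.split()
    # Convention from the warning: A1 maps to ALT and A2 maps to REF.
    return BimRecord(chrom, variant_id, int(pos), ref=a2, alt=a1)


record = parse_bim_line("1\trs123\t0\t1000\tG\tA")
print(record.ref, record.alt)
```

In the example line A1 is `G` and A2 is `A`, so the parsed record carries REF `A` and ALT `G`.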