Commit b87be4c

Finish up docs for plink and update CHANGELOG
Add the docs files
1 parent f3d5dc9 commit b87be4c

5 files changed, +66 -7 lines changed

CHANGELOG.md

Lines changed: 4 additions & 4 deletions
@@ -1,10 +1,8 @@
 # 0.1.6 2025-0X-XX
 
-- Make format-specific dependencies optional (#385)
-
-- Add contigs to plink output (#344)
+- Initial version of supported plink2zarr (#390, #344, #382)
 
-- Add variant_length and indexing to plink output (#382)
+- Make format-specific dependencies optional (#385)
 
 Breaking changes
 
@@ -14,6 +12,8 @@ Breaking changes
 - Add dimensions and default compressor and filter settings to the schema.
 (#361)
 
+- Various changes to existing experimental plink encoding (#390)
+
 # 0.1.5 2025-03-31
 
 - Add support for merging contig IDs across multiple VCFs (#335)

docs/_toc.yml

Lines changed: 3 additions & 0 deletions
@@ -6,6 +6,9 @@ chapters:
   sections:
   - file: vcf2zarr/tutorial
   - file: vcf2zarr/cli_ref
+- file: plink2zarr/overview
+  sections:
+  - file: plink2zarr/cli_ref
 - file: vcfpartition/overview
   sections:
   - file: vcfpartition/cli_ref

docs/intro.md

Lines changed: 4 additions & 3 deletions
@@ -8,6 +8,9 @@
 - {ref}`sec-vcf2zarr` converts VCF data to
 [VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.
 
+- {ref}`sec-plink2zarr` converts PLINK 1.0 data to
+[VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.
+
 - {ref}`sec-vcfpartition` is a utility to split an input
 VCF into a given number of partitions. This is useful for
 parallel processing of VCF data.
@@ -17,10 +20,8 @@
 `bio2zarr` is in development, contributions, feedback and issues are welcome
 at the [GitHub repository](https://github.com/sgkit-dev/bio2zarr).
 
-Support for converting PLINK data to VCF Zarr is partially implemented,
-and adding BGEN and [tskit](https://tskit.dev/) support is also planned.
 If you would like to see
-support for other formats (or an interested in helping with implementing),
+support for other formats such as BGEN (or an interested in helping with implementing),
 please open an [issue on Github](https://github.com/sgkit-dev/bio2zarr/issues)
 to discuss!
 

docs/plink2zarr/cli_ref.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+(sec-plink2zarr-cli-ref)=
+# CLI Reference
+
+% A note on cross references... There's some weird long-standing problem with
+% cross referencing program values in Sphinx, which means that we can't use
+% the built-in labels generated by sphinx-click. We can make our own explicit
+% targets, but these have to have slightly weird names to avoid conflicting
+% with what sphinx-click is doing. So, hence the cmd- prefix.
+% Based on: https://github.com/skypilot-org/skypilot/pull/2834
+
+```{eval-rst}
+
+.. _cmd-plink2zarr-convert:
+.. click:: bio2zarr.cli:convert_plink
+   :prog: plink2zarr convert
+   :nested: full
+

docs/plink2zarr/overview.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+(sec-plink2zarr)=
+# plink2zarr
+
+Convert plink data to the
+[VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
+reliably in parallel.
+
+See {ref}`sec-plink2zarr-cli-ref` for detailed documentation on
+command line options.
+
+Conversion of the plink data model to VCF follows the semantics of plink1.9 as closely
+as possible. That is, given a binary plink fileset with prefix "fileset" (i.e.,
+fileset.bed, fileset.bim, fileset.fam), running
+```
+$ plink2zarr convert fileset out.vcz
+```
+should produce the same result in ``out.vcz`` as
+```
+$ plink1.9 --bfile fileset --keep-allele-order --recode vcf-iid --out tmp
+$ vcf2zarr convert tmp.vcf out.vcz
+```
+
+:::{warning}
+It is important to note that we follow the same conventions as plink 2.0
+where the A1 allele in the [bim file](https://www.cog-genomics.org/plink/2.0/formats#bim)
+is the VCF ALT and A2 is the REF.
+:::
+
+:::{note}
+Currently we only convert the basic VCF-like data from plink, and don't include
+phenotypes and pedigree information. These are planned as future enhancements.
+Please comment on [this issue](https://github.com/sgkit-dev/bio2zarr/issues/392)
+if you are interested in this functionality.
+:::
+
+
+
+

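The warning in the new overview page (A1 in the .bim file is the VCF ALT, A2 is the REF) is easiest to see at the level of a single variant. The following is a minimal, hypothetical Python sketch, not bio2zarr's implementation. It assumes the standard PLINK 1 binary format: a .bed file starting with the magic bytes 0x6C 0x1B 0x01 (SNP-major mode), 2 bits per sample with the first sample in the lowest-order bits, and six whitespace-separated .bim columns. The names `bim_to_ref_alt`, `is_snp_major_bed` and `decode_variant` are invented for illustration.

```python
# Hypothetical sketch, NOT the bio2zarr implementation. It assumes the standard
# PLINK 1 binary format and the convention documented in the overview page:
# A1 (.bim column 5) is the VCF ALT allele, A2 (.bim column 6) is the REF.

# Magic bytes at the start of a .bed file; the final 0x01 means SNP-major mode.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# 2-bit .bed genotype codes mapped to diploid VCF-style allele indexes
# (0 = REF = A2, 1 = ALT = A1, -1 = missing).
CODE_TO_GT = {
    0b00: (1, 1),    # homozygous for A1 -> ALT/ALT
    0b01: (-1, -1),  # missing genotype
    0b10: (0, 1),    # heterozygous -> REF/ALT
    0b11: (0, 0),    # homozygous for A2 -> REF/REF
}


def bim_to_ref_alt(line: str) -> dict:
    """Map one .bim line to VCF-like fields using A1 = ALT, A2 = REF."""
    chrom, variant_id, _cm, position, a1, a2 = line.split()
    return {
        "contig": chrom,
        "id": variant_id,
        "position": int(position),
        "ref": a2,
        "alt": a1,
    }


def is_snp_major_bed(header: bytes) -> bool:
    """Check the first three bytes of a .bed file."""
    return header[:3] == BED_MAGIC


def decode_variant(record: bytes, num_samples: int) -> list[tuple[int, int]]:
    """Decode one SNP-major .bed record of ceil(num_samples / 4) bytes."""
    expected_len = (num_samples + 3) // 4
    if len(record) != expected_len:
        raise ValueError(f"expected {expected_len} bytes, got {len(record)}")
    genotypes = []
    for sample in range(num_samples):
        byte = record[sample // 4]
        # Each byte packs 4 samples; the first sample is in the lowest two bits.
        code = (byte >> (2 * (sample % 4))) & 0b11
        genotypes.append(CODE_TO_GT[code])
    return genotypes


if __name__ == "__main__":
    print(bim_to_ref_alt("1\trs1\t0\t12345\tA\tG"))
    # 0x6C = 0b01_10_11_00: samples 0..3 have codes 00, 11, 10, 01.
    print(decode_variant(bytes([0x6C]), 4))
```

Under these assumptions, the .bim line `1 rs1 0 12345 A G` yields REF `G` and ALT `A`, and the single byte `0x6C` decodes to ALT/ALT, REF/REF, REF/ALT and missing for four samples. Swapping the A1/A2 convention would flip which homozygous code maps to REF/REF, which is why the overview calls it out explicitly.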