6 files changed, +84 -11 lines changed

+ (sec-installation)=
# Installation


- ```
- $ python3 -m pip install bio2zarr
+ ```bash
+ python3 -m pip install bio2zarr
```

- This will install the programs `vcf2zarr`, `plink2zarr` and `vcf_partition`
+ This will install the programs `vcf2zarr` and `vcf_partition`
into your local Python path. You may need to update your $PATH to call the
executables directly.

Alternatively, calling
- ```
- $ python3 -m bio2zarr vcf2zarr <args>
+ ```bash
+ python3 -m bio2zarr vcf2zarr <args>
```
is equivalent to

- ```
- $ vcf2zarr <args>
+ ```bash
+ vcf2zarr <args>
```
and will always work.

+ :::{note}
+ The `python3 -m bio2zarr vcf2zarr` form may be replaced with
+ `python3 -m bio2zarr.vcf2zarr` in the near future.
+ See GitHub issue [203](https://github.com/sgkit-dev/bio2zarr/issues/203).
+ :::
+
+
+ :::{warning}
+ Windows is not currently supported. Please comment on
+ [this issue](https://github.com/sgkit-dev/bio2zarr/issues/174) if you would
+ like to see Windows support for bio2zarr.
+ :::
+

## Shell completion

To enable shell completion for a particular session in Bash do:

- ```
+ ```bash
eval "$(_VCF2ZARR_COMPLETE=bash_source vcf2zarr)"
```

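The `eval` command above enables completion only for the current session. `vcf2zarr` is built on Click: the CLI reference in this diff documents it with the `click` Sphinx directive. Click's usual advice therefore applies. Add the same line to `~/.bashrc` so that every new Bash session picks it up.

The following is a minimal sketch that does this idempotently. The `add_completion` function and the `RC_FILE` override are hypothetical names, not part of bio2zarr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: persist the vcf2zarr Bash completion hook by
# appending it to a startup file, unless it is already there.

add_completion() {
    # Target file: ~/.bashrc by default, overridable via RC_FILE (illustrative).
    local rc_file="${RC_FILE:-$HOME/.bashrc}"
    # Same line as the per-session command; single quotes keep it literal.
    local hook='eval "$(_VCF2ZARR_COMPLETE=bash_source vcf2zarr)"'
    # Make sure the file exists so grep does not fail on a missing file.
    touch "$rc_file"
    # -F fixed string, -x whole-line match, -q no output.
    if ! grep -Fxq -- "$hook" "$rc_file"; then
        printf '%s\n' "$hook" >> "$rc_file"
    fi
}
```

Call `add_completion` once and then open a new shell. The `grep -Fxq` check matches the hook as a whole, fixed-string line, so running the function again does not append a duplicate.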

# bio2zarr

`bio2zarr` efficiently converts common bioinformatics formats to
- [Zarr](https://zarr.readthedocs.io/en/stable/) format. Initially supporting converting
- VCF to the [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/).
+ [Zarr](https://zarr.readthedocs.io/en/stable/) format.
+
+ ## Tools
+
+ - {ref}`sec-vcf2zarr` converts VCF data to
+ [VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.
+
+ - {ref}`sec-vcfpartition` is a utility to split one or more input
+ VCFs into a given number of partitions, which is useful for
+ parallel processing.
+
+ ## Development status

`bio2zarr` is in development; contributions, feedback and issues are welcome
at the [GitHub repository](https://github.com/sgkit-dev/bio2zarr).

+ Support for converting PLINK data to VCF Zarr is partially implemented,
+ and adding BGEN support is also planned. If you would like to see
+ support for other formats (or are interested in helping to implement it),
+ please open an [issue on GitHub](https://github.com/sgkit-dev/bio2zarr/issues)
+ to discuss!
+
+ The package is currently focused on command line interfaces, but a
+ Python API is also planned.

+ (sec-vcf2zarr-cli-ref)=
# CLI Reference

% A note on cross references... There's some weird long-standing problem with
@@ -57,6 +58,7 @@
## Encode

```{eval-rst}
+ .. _cmd-vcf2zarr-encode:
.. click:: bio2zarr.cli:encode
   :prog: vcf2zarr encode
   :nested: full

+ (sec-vcf2zarr)=
# vcf2zarr

+ Convert VCF data to the
+ [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
+ reliably, in parallel or distributed over a cluster.

- Convert a VCF to zarr format:
+ See the {ref}`sec-vcf2zarr-tutorial` for a step-by-step introduction
+ and the {ref}`sec-vcf2zarr-cli-ref` for detailed documentation on
+ command line options.
+
+
+ ## Quickstart
+
+ First {ref}`install bio2zarr<sec-installation>`.
+
+
+ :::{note}
+ FINISH ME
+ :::
+
+
+ ## How does it work?
+ The conversion of VCF data to Zarr is a two-step process:
+
+ 1. Convert ({ref}`explode<cmd-vcf2zarr-explode>`) VCF file(s) to
+ Intermediate Columnar Format (ICF)
+ 2. Convert ({ref}`encode<cmd-vcf2zarr-encode>`) ICF to Zarr
+
+ This two-step process allows `vcf2zarr` to determine the correct
+ dimensions of the Zarr arrays corresponding to each VCF field, and
+ to keep memory usage tightly bounded while writing the arrays.
+
+ :::{important}
+ The intermediate columnar format is not intended for any use
+ other than temporary storage while converting VCF to Zarr.
+ The format may change between versions of `bio2zarr`.
+ :::
+
+
+ ## Common options

```
$ vcf2zarr convert <VCF1> <VCF2> <zarr>

@@ -9,6 +9,7 @@ kernelspec:
  language: bash
  name: bash
---
+ (sec-vcf2zarr-tutorial)=
# Tutorial

This is a step-by-step tutorial showing you how to convert your

+ (sec-vcfpartition)=
# vcfpartition

## Overview
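The vcf2zarr overview page in this diff describes a two-step process: explode VCF file(s) to ICF, then encode the ICF to Zarr. That process can be sketched as a small Bash driver. This is a minimal sketch, not bio2zarr's own tooling, and it makes several assumptions:

- The commands take the positional forms `vcf2zarr explode <VCFs...> <ICF>` and `vcf2zarr encode <ICF> <Zarr>`. Check the CLI reference for the real options.
- The `run` and `convert_two_step` functions, the `DRY_RUN` switch and the `.vcz`/`.icf` naming are all hypothetical.

Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical driver for the two-step vcf2zarr conversion:
#   1. explode VCF file(s) into Intermediate Columnar Format (ICF)
#   2. encode the ICF into Zarr
# ASSUMPTION: positional arguments `explode <VCFs...> <ICF>` and
# `encode <ICF> <Zarr>`; consult the CLI reference for real options.

# Run a command, or only print it when DRY_RUN=1 (illustrative switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: convert_two_step <zarr_out> <vcf> [<vcf> ...]
convert_two_step() {
    if [ "$#" -lt 2 ]; then
        echo "usage: convert_two_step <zarr_out> <vcf> [<vcf> ...]" >&2
        return 2
    fi
    local zarr_out="$1"
    shift
    # Hypothetical naming: sample.vcz -> sample.icf for the intermediate.
    local icf="${zarr_out%.vcz}.icf"
    # Step 1: VCF(s) -> ICF; stop if it fails.
    run vcf2zarr explode "$@" "$icf" || return 1
    # Step 2: ICF -> Zarr; stop if it fails.
    run vcf2zarr encode "$icf" "$zarr_out" || return 1
    # The ICF is only temporary storage, so remove it after encoding.
    run rm -rf "$icf"
}
```

For example, `DRY_RUN=1 convert_two_step sample.vcz sample.vcf.gz` shows the explode, encode and cleanup commands without touching any files. The script deletes the ICF at the end because the overview notes that the intermediate format is only temporary storage and may change between `bio2zarr` versions.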