Original file line number Diff line number Diff line change 1+ (sec-installation)=
12# Installation
23
34
4- ```
5- $ python3 -m pip install bio2zarr
5+ ``` bash
6+ python3 -m pip install bio2zarr
67```
78
8- This will install the programs `` vcf2zarr `` , `` plink2zarr `` and `` vcf_partition ``
9+ This will install the programs `` vcf2zarr `` and `` vcf_partition ``
910into your local Python path. You may need to update your $PATH to call the
1011executables directly.
1112
1213Alternatively, calling
13- ```
14- $ python3 -m bio2zarr vcf2zarr <args>
14+ ``` bash
15+ python3 -m bio2zarr vcf2zarr <args>
1516```
1617is equivalent to
1718
18- ```
19- $ vcf2zarr <args>
19+ ``` bash
20+ vcf2zarr <args>
2021```
2122and will always work.
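
As a minimal sketch of the point above, a wrapper script could pick whichever form is available. The function name `vcf2zarr_invocation` is hypothetical and used only for illustration; it is not part of bio2zarr.

```bash
# Hypothetical helper (not part of bio2zarr): print the invocation form
# that is usable in the current environment.
vcf2zarr_invocation() {
    if command -v vcf2zarr >/dev/null 2>&1; then
        # The executable is on $PATH, so it can be called directly.
        echo "vcf2zarr"
    else
        # Fall back to the module form, which does not depend on $PATH.
        echo "python3 -m bio2zarr vcf2zarr"
    fi
}

# Example use (left unquoted on purpose so the module form splits into words):
# $(vcf2zarr_invocation) <args>
```

If the direct form is preferred, adding the directory that pip installs scripts into to `$PATH` removes the need for a fallback.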
2223
24+ :::{note}
25+ The ``python3 -m bio2zarr vcf2zarr`` form may be replaced with
26+ ``python3 -m bio2zarr.vcf2zarr`` in the near future.
27+ See GitHub issue [203](https://github.com/sgkit-dev/bio2zarr/issues/203).
28+ :::
29+
30+
31+ :::{warning}
32+ Windows is not currently supported. Please comment on
33+ [this issue](https://github.com/sgkit-dev/bio2zarr/issues/174) if you would
34+ like to see Windows support for bio2zarr.
35+ :::
36+
2337
2438## Shell completion
2539
2640To enable shell completion for a particular session in Bash do:
2741
28- ```
42+ ``` bash
2943eval "$(_VCF2ZARR_COMPLETE=bash_source vcf2zarr)"
3044```
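
The command above only affects the current session. One way to make it persistent, sketched here as an assumption rather than something these docs prescribe, is to add the same line to a Bash startup file. The helper name `add_completion` and the choice of `~/.bashrc` are illustrative.

```bash
# The completion line from the section above, stored literally
# (single quotes prevent it from being evaluated here).
COMPLETION_LINE='eval "$(_VCF2ZARR_COMPLETE=bash_source vcf2zarr)"'

# Hypothetical helper: append the completion line to a startup file,
# but only if that exact line is not already present.
add_completion() {
    rcfile="$1"
    touch "$rcfile"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq "$COMPLETION_LINE" "$rcfile"; then
        printf '%s\n' "$COMPLETION_LINE" >> "$rcfile"
    fi
}

# Example use (assumes Bash reads ~/.bashrc for interactive shells):
# add_completion "$HOME/.bashrc"
```

Because the helper checks for the line first, running it more than once does not duplicate the entry.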
3145
Original file line number Diff line number Diff line change 11# bio2zarr
22
33` bio2zarr ` efficiently converts common bioinformatics formats to
4- [ Zarr] ( https://zarr.readthedocs.io/en/stable/ ) format. Initially supporting converting
5- VCF to the [ VCF Zarr specification] ( https://github.com/sgkit-dev/vcf-zarr-spec/ ) .
4+ [Zarr](https://zarr.readthedocs.io/en/stable/) format.
5+
6+ ## Tools
7+
8+ - {ref}`sec-vcf2zarr` converts VCF data to
9+ [VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.
10+
11+ - {ref}`sec-vcfpartition` is a utility to split an input VCF (or set of
12+ VCFs) into a given number of partitions. This is useful for
13+ parallel processing.
14+
15+ ## Development status
616
717`bio2zarr` is in development; contributions, feedback and issues are welcome
818at the [GitHub repository](https://github.com/sgkit-dev/bio2zarr).
919
20+ Support for converting PLINK data to VCF Zarr is partially implemented,
21+ and BGEN support is also planned. If you would like to see
22+ support for other formats (or are interested in helping to implement them),
23+ please open an [issue on GitHub](https://github.com/sgkit-dev/bio2zarr/issues)
24+ to discuss!
25+
26+ The package is currently focused on command line interfaces, but a
27+ Python API is also planned.
Original file line number Diff line number Diff line change 1+ (sec-vcf2zarr-cli-ref)=
12# CLI Reference
23
34% A note on cross references... There's some weird long-standing problem with
5758## Encode
5859
5960``` {eval-rst}
61+ .. _cmd-vcf2zarr-encode:
6062.. click:: bio2zarr.cli:encode
6163 :prog: vcf2zarr encode
6264 :nested: full
Original file line number Diff line number Diff line change 1+ (sec-vcf2zarr)=
12# vcf2zarr
23
4+ Convert VCF data to the
5+ [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
6+ reliably, in parallel or distributed over a cluster.
37
4- Convert a VCF to zarr format:
8+ See the {ref}`sec-vcf2zarr-tutorial` for a step-by-step introduction
9+ and the {ref}`sec-vcf2zarr-cli-ref` for detailed documentation on
10+ command line options.
11+
12+
13+ ## Quickstart
14+
15+ First {ref}`install bio2zarr<sec-installation>`.
16+
17+
18+ :::{note}
19+ FINISH ME
20+ :::
21+
22+
23+ ## How does it work?
24+ The conversion of VCF data to Zarr is a two-step process:
25+
26+ 1. Convert ({ref}`explode<cmd-vcf2zarr-explode>`) VCF file(s) to
27+ Intermediate Columnar Format (ICF)
28+ 2. Convert ({ref}`encode<cmd-vcf2zarr-encode>`) ICF to Zarr
29+
30+ This two-step process allows ` vcf2zarr ` to determine the correct
31+ dimension of Zarr arrays corresponding to each VCF field, and
32+ to keep memory usage tightly bounded while writing the arrays.
33+
34+ :::{important}
35+ The intermediate columnar format is not intended for any use
36+ other than temporary storage while converting VCF to Zarr.
37+ The format may change between versions of `bio2zarr`.
38+ :::
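
The two steps described above can be sketched as a pair of commands. This is an illustration under assumptions: the argument order (`explode <VCF> <ICF>` and `encode <ICF> <zarr>`) is assumed by analogy with `vcf2zarr convert`, and the helper names and file paths are hypothetical. By default the sketch only prints the commands; set `DRY_RUN=0` to run them.

```bash
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 = input VCF, $2 = intermediate ICF path, $3 = output Zarr path.
# The argument order for explode and encode is an assumption, see the CLI reference.
convert_two_step() {
    run vcf2zarr explode "$1" "$2"   # step 1: VCF -> ICF
    run vcf2zarr encode "$2" "$3"    # step 2: ICF -> Zarr
}

# Illustrative paths only.
convert_two_step sample.vcf.gz sample.icf sample.vcz
```

Since the ICF is only temporary storage, it can be deleted once the encode step has completed successfully.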
39+
40+
41+ ## Common options
542
643```
744$ vcf2zarr convert <VCF1> <VCF2> <zarr>
Original file line number Diff line number Diff line change @@ -9,6 +9,7 @@ kernelspec:
99 language : bash
1010 name : bash
1111---
12+ (sec-vcf2zarr-tutorial)=
1213# Tutorial
1314
1415This is a step-by-step tutorial showing you how to convert your
Original file line number Diff line number Diff line change 1+ (sec-vcfpartition)=
12# vcfpartition
23
34## Overview