Commit b13203b

Merge pull request #4 from jeromekelleher/readme-update
Initial README
2 parents c455d3c + 86ec133 commit b13203b

1 file changed

+50
-1
lines changed

README.md

Lines changed: 50 additions & 1 deletion
@@ -1,4 +1,53 @@
11
# bio2zarr
22
Convert bioinformatics file formats to Zarr
33

4-
**This is early alpha-status code: DO NOT USE!!**
4+
Initially supports converting VCF to the
5+
[sgkit vcf-zarr specification](https://github.com/pystatgen/vcf-zarr-spec/)
6+
7+
**This is early alpha-status code: everything is subject to change,
8+
and it has not been thoroughly tested**
9+
10+
## Usage
11+
12+
Convert a VCF to zarr format:
13+
14+
```
15+
python3 -m bio2zarr vcf2zarr convert <VCF> <zarr>
16+
```
17+
18+
This converts the VCF to Zarr format in one step.
19+
20+
**Do not use this for anything but the smallest files.**
21+
22+
The recommended approach is a multi-stage conversion.
23+
24+
First, convert the VCF into an intermediate columnar format:
25+
26+
```
27+
python3 -m bio2zarr vcf2zarr explode tests/data/vcf/sample.vcf.gz tmp/sample.exploded
28+
```
29+
30+
Then, (optionally) inspect this representation to get a feel for your dataset:
31+
```
32+
python3 -m bio2zarr vcf2zarr summarise tmp/sample.exploded
33+
```
34+
35+
Then, (optionally) generate a conversion schema to describe the corresponding
36+
Zarr arrays:
37+
38+
```
39+
python3 -m bio2zarr vcf2zarr genspec tmp/sample.exploded > sample.schema.json
40+
```
41+
42+
View and edit the schema, deleting any columns you don't want.
43+
44+
Finally, convert to Zarr:
45+
46+
```
47+
python3 -m bio2zarr vcf2zarr to-zarr tmp/sample.exploded tmp/sample.zarr -s sample.schema.json
48+
```
49+
50+
Use the ``-p, --worker-processes`` argument to control the number of workers used
51+
to do zarr encoding.
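
The stages that follow creation of the intermediate format can be wrapped in a small shell function. This is only a sketch: the names `run`, `encode_exploded` and `DRY_RUN` are hypothetical helpers, not part of bio2zarr, and it assumes that `-p` is accepted by the `to-zarr` step alongside `-s`. Setting `DRY_RUN=1` prints the commands instead of running them, which is a cheap way to check paths before a long conversion.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the stages after the intermediate format exists.
# Usage: encode_exploded <exploded-dir> <zarr-out> <schema-json> <workers>
encode_exploded() {
    exploded=$1
    zarr_out=$2
    schema=$3
    workers=$4

    # Optional stage: summarise the intermediate representation.
    run python3 -m bio2zarr vcf2zarr summarise "$exploded"

    # Optional stage: generate a schema, but keep an existing file so that
    # hand edits (such as deleted columns) are not overwritten.
    if [ ! -e "$schema" ]; then
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "python3 -m bio2zarr vcf2zarr genspec $exploded > $schema"
        else
            python3 -m bio2zarr vcf2zarr genspec "$exploded" > "$schema"
        fi
    fi

    # Final stage: encode to Zarr with the schema and the given worker count
    # (assumes -p belongs to the to-zarr step).
    run python3 -m bio2zarr vcf2zarr to-zarr "$exploded" "$zarr_out" -s "$schema" -p "$workers"
}
```

For example, `encode_exploded tmp/sample.exploded tmp/sample.zarr sample.schema.json 4` runs the three stages in order. Because an existing schema file is reused, you can edit `sample.schema.json` and call the function again to re-encode with your changes.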
52+
53+
