1 file changed: +50 -1 lines changed
  # bio2zarr
  Convert bioinformatics file formats to Zarr

- **This is early alpha-status code: DO NOT USE!!**
+ Initially supports converting VCF to the
+ [sgkit vcf-zarr specification](https://github.com/pystatgen/vcf-zarr-spec/)
+
+ **This is early alpha-status code: everything is subject to change,
+ and it has not been thoroughly tested**
+
+ ## Usage
+
+ Convert a VCF to Zarr format:
+
+ ```
+ python3 -m bio2zarr vcf2zarr convert <VCF> <zarr>
+ ```
+
+ This converts the VCF to Zarr format in a single step.
+
+ **Do not use this for anything but the smallest files**
+
+ The recommended approach is to use a multi-stage conversion.
+
+ First, convert the VCF into an intermediate columnar format:
+
+ ```
+ python3 -m bio2zarr vcf2zarr convert tests/data/vcf/sample.vcf.gz tmp/sample.exploded
+ ```
+
+ Then, (optionally) inspect this representation to get a feel for your dataset:
+ ```
+ python3 -m bio2zarr vcf2zarr summarise tmp/sample.exploded
+ ```
+
+ Then, (optionally) generate a conversion schema to describe the corresponding
+ Zarr arrays:
+
+ ```
+ python3 -m bio2zarr vcf2zarr genspec tmp/sample.exploded > sample.schema.json
+ ```
+
+ View and edit the schema, deleting any columns you don't want.
+
+ Finally, convert to Zarr:
+
+ ```
+ python3 -m bio2zarr vcf2zarr to-zarr tmp/sample.exploded tmp/sample.zarr -s sample.schema.json
+ ```
+
+ Use the `-p, --worker-processes` argument to control the number of workers used
+ to do Zarr encoding.
+
+
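
The multi-stage workflow added in this diff could be scripted roughly as follows. This is a minimal sketch, not part of the change itself: the helper name `build_pipeline`, its parameters, and the derived output names are hypothetical. The subcommands are copied as written in the README text. Attaching `-p` to the `to-zarr` step is an assumption based on the README saying it controls the workers used for Zarr encoding. The sketch only builds the command lines and does not run them.

```python
import shlex


def build_pipeline(vcf, workdir, schema="sample.schema.json", workers=None):
    """Return the README's multi-stage commands as argument lists (hypothetical helper)."""
    base = ["python3", "-m", "bio2zarr", "vcf2zarr"]
    exploded = f"{workdir}/sample.exploded"  # intermediate columnar format
    zarr_out = f"{workdir}/sample.zarr"      # final Zarr output

    encode = base + ["to-zarr", exploded, zarr_out, "-s", schema]
    if workers is not None:
        # Assumption: -p/--worker-processes is accepted by the to-zarr step.
        encode += ["-p", str(workers)]

    return [
        base + ["convert", vcf, exploded],  # subcommand as written in the README
        base + ["summarise", exploded],     # optional inspection step
        base + ["genspec", exploded],       # README redirects stdout to the schema file
        encode,
    ]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in build_pipeline("tests/data/vcf/sample.vcf.gz", "tmp", workers=4):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps paths with spaces safe if the commands are later passed to `subprocess.run`. The `genspec` step would still need its standard output written to the schema file by the caller.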