@@ -9,6 +9,8 @@ See the {ref}`sec-vcf2zarr-tutorial` for a step-by-step introduction
and the {ref}`sec-vcf2zarr-cli-ref` for detailed documentation on
command line options.

+ See the [bioRxiv preprint](https://www.biorxiv.org/content/10.1101/2024.06.11.598241) for
+ further details.

## Quickstart

@@ -43,13 +45,18 @@ vcf2zarr inspect sample.vcz
### What next?

VCF Zarr is a starting point in what we hope will become a diverse ecosystem
- of packages that efficiently process VCF data in Zarr format. However, this
- ecosytem does not exist yet, and there isn't much software available
- for working with the format. As such, VCF Zarr isn't suitable for end users
- who just want to get their work done for the moment.
+ of packages that efficiently process VCF data in Zarr format. This
+ ecosystem is in its infancy and there isn't much software available
+ for performing off-the-shelf bioinformatics tasks
+ using the format. As such, VCF Zarr isn't suitable for end users
+ who just want to get their work done for the moment, and is currently
+ aimed at methods developers and early adopters.

Having said that, you can:

+ - Use [vcztools](https://github.com/sgkit-dev/vcztools/) as a drop-in replacement
+ for bcftools, transparently using Zarr on local storage or cloud stores as the
+ backend.
- Look at the [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
to see how data is mapped from VCF to Zarr
- Use the mature [Zarr Python](https://zarr.readthedocs.io/en/stable/) package or
@@ -59,6 +66,9 @@ your data.
sister project to analyse the data. Note that sgkit is under active development,
however, and the documentation may not be fully in sync with this project.

+ For more information, please see our
+ bioRxiv preprint [Analysis-ready VCF at Biobank scale using Zarr](
+ https://www.biorxiv.org/content/10.1101/2024.06.11.598241).


## How does it work?
@@ -83,6 +93,42 @@ across cores on a single machine (via the ``--worker-processes`` argument)
or distributed across a cluster by the three-part ``init``, ``partition``
and ``finalise`` commands.

+ ## Local alleles
+
+ As discussed in our [preprint](
+ https://www.biorxiv.org/content/10.1101/2024.06.11.598241),
+ vcf2zarr has an experimental implementation of the local alleles data
+ reduction technique. This essentially reduces the inner dimension of
+ large fields such as AD by storing information relevant only to the alleles
+ involved in a particular variant call, rather than information
+ for all alleles. This can make a substantial difference when there is a large
+ number of alleles.
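
The idea described above can be illustrated with a minimal pure-Python sketch for a per-allele field such as AD. This is not vcf2zarr's implementation: the helper names `local_alleles` and `localize` are hypothetical, and the choices to always keep the reference allele (index 0) and to treat negative genotype values as missing are assumptions made for illustration only. Per-genotype fields such as PL need a different index mapping and are not covered here.

```python
def local_alleles(genotype):
    """Return the sorted allele indices involved in one genotype call.

    Assumption: negative values mark missing alleles, and the reference
    allele (index 0) is always retained.
    """
    observed = {allele for allele in genotype if allele >= 0}
    return sorted(observed | {0})


def localize(values, la):
    """Keep only the per-allele values for the alleles listed in `la`."""
    return [values[allele] for allele in la]


# One call at a site with six alleles (reference plus five alternates).
genotype = [0, 3]
ad = [12, 0, 0, 9, 0, 0]  # one depth per global allele

la = local_alleles(genotype)
lad = localize(ad, la)
print(la, lad)  # [0, 3] [12, 9]
```

Here the inner dimension shrinks from six values to two, at the cost of storing the list of local alleles alongside each call.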
+
+ To use local alleles, you must generate a storage schema (see the
+ {ref}`sec-vcf2zarr-tutorial-medium-dataset` section of the tutorial)
+ using the {ref}`mkschema<cmd-vcf2zarr-mkschema>` command with the
+ ``--local-alleles`` option. This will generate the ``call_LA`` field,
+ which lists the alleles observed for each genotype call, and
+ translate supported fields from their global alleles to local
+ alleles representation.
+
+ :::{warning}
+ Support for local alleles is preliminary and may be subject to change
+ as the details of how alleles for a particular call are chosen, and how the
+ number of alleles retained is determined, are worked out. Please open an issue on
+ [GitHub](https://github.com/sgkit-dev/bio2zarr/issues/) if you would like to
+ help improve Bio2zarr's local alleles implementation.
+ :::
+
+ :::{note}
+ Only the PL and AD fields are currently supported for local alleles
+ data reduction. Please comment on our
+ [local alleles fields tracking issue](https://github.com/sgkit-dev/bio2zarr/issues/315)
+ if you would like to see other fields supported, or to help out with
+ implementing more.
+ :::
+
+
## Copying to object stores

:::{todo}