@@ -18,6 +18,17 @@ convert your data, basically providing different levels of
1818convenience and flexibility corresponding to what you might
1919need for small, intermediate and large datasets.
2020
21+ :::{warning}
22+ The vcf2zarr documentation is under development, and
23+ some parts are more polished than others. This "tutorial"
24+ is experimental and will likely evolve into a slightly
25+ different format in the near future. It is
26+ an incomplete work in progress. The
27+ {ref}`sec-vcf2zarr-cli-ref` should be complete
28+ and authoritative, however.
29+ :::
30+
31+
2132## Small dataset
2233
2334The simplest way to convert VCF data to Zarr is to use the
@@ -229,11 +240,33 @@ granularity). You should be careful to use this value in your scripts
229240
230241
231242Once `` dexplode-init `` is done and we know how many partitions we have,
232- we need to call `` dexplode-partition `` this number of times.
243+ we need to call
244+ {ref}`dexplode-partition<cmd-vcf2zarr-dexplode-partition>` this number of times:
233245
234246``` {code-cell}
235247vcf2zarr dexplode-partition sample-dist.icf 0
236248vcf2zarr dexplode-partition sample-dist.icf 1
237249vcf2zarr dexplode-partition sample-dist.icf 2
238250```
239251
252+ This is not how it would be done in practice, of course: you would
253+ use your cluster scheduler of choice to dispatch these operations.
254+ :::{todo}
255+ Document how to do this conveniently over some popular schedulers.
256+ :::
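
As a rough illustration of dispatching the partition step programmatically, here is a minimal Python sketch. It assumes `vcf2zarr` is on the `PATH` and that the partition count (3 in the example above) is the value reported by `dexplode-init`. The helper names `partition_commands` and `run_partitions` are hypothetical and not part of vcf2zarr, and the local thread pool only stands in for a real cluster scheduler.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def partition_commands(icf_path, num_partitions):
    """Build one dexplode-partition command per zero-based partition index."""
    return [
        ["vcf2zarr", "dexplode-partition", icf_path, str(index)]
        for index in range(num_partitions)
    ]


def run_partitions(icf_path, num_partitions, max_workers=2):
    """Run every partition command locally, raising if any of them fails.

    Hypothetical helper: a real deployment would submit each command to a
    cluster scheduler instead of using a local thread pool.
    """
    commands = partition_commands(icf_path, num_partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # check=True makes a failed partition raise CalledProcessError.
        results = pool.map(lambda cmd: subprocess.run(cmd, check=True), commands)
        # Consume the iterator so that any error surfaces here.
        list(results)


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch can be
    # inspected without vcf2zarr installed.
    for command in partition_commands("sample-dist.icf", 3):
        print(" ".join(command))
```

Each partition is independent of the others, which is why the commands can be generated up front and handed to whatever job-submission mechanism is available.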
257+
258+ :::{tip}
259+ Use the ``--one-based`` argument when it is more convenient
260+ to index the partitions from 1 to n, rather than 0 to n - 1.
261+ :::
262+
263+ Finally, we need to call
264+ {ref}`dexplode-finalise<cmd-vcf2zarr-dexplode-finalise>`:
265+ ``` {code-cell}
266+ vcf2zarr dexplode-finalise sample-dist.icf
267+ ```
268+
269+ :::{todo}
270+ Document the process for dencode, noting the information output about
271+ memory requirements.
272+ :::