Commit 65efc3e

Docs rejig
1 parent 7760c57 commit 65efc3e

4 files changed: +37 -77 lines changed

docs/_toc.yml

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 format: jb-book
 root: intro
 chapters:
-- file: vcf2zarr_tutorial
+- file: installation
+- file: vcf2zarr
 - file: cli

docs/installation.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# Installation
+
+
+```
+$ python3 -m pip install bio2zarr
+```
+
+This will install the programs ``vcf2zarr``, ``plink2zarr`` and ``vcf_partition``
+into your local Python path. You may need to update your $PATH to call the
+executables directly.
+
+Alternatively, calling
+```
+$ python3 -m bio2zarr vcf2zarr <args>
+```
+is equivalent to
+
+```
+$ vcf2zarr <args>
+```
+and will always work.
+

docs/intro.md

Lines changed: 5 additions & 72 deletions
@@ -1,76 +1,9 @@
-# bio2zarr Documentation
+# bio2zarr
 
-`bio2zarr` efficiently converts common bioinformatics formats to
-[Zarr](https://zarr.readthedocs.io/en/stable/) format. Initially supporting converting
-VCF to the [sgkit vcf-zarr specification](https://github.com/pystatgen/vcf-zarr-spec/).
+`bio2zarr` efficiently converts common bioinformatics formats to
+[Zarr](https://zarr.readthedocs.io/en/stable/) format. Initially supporting converting
+VCF to the [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/).
 
-`bio2zarr` is in early alpha development, contributions, feedback and issues are welcome
+`bio2zarr` is in development, contributions, feedback and issues are welcome
 at the [GitHub repository](https://github.com/sgkit-dev/bio2zarr).
 
-## Installation
-`bio2zarr` can be installed from PyPI using pip:
-
-```bash
-$ python3 -m pip install bio2zarr
-```
-
-This will install the programs ``vcf2zarr``, ``plink2zarr`` and ``vcf_partition``
-into your local Python path. You may need to update your $PATH to call the
-executables directly.
-
-Alternatively, calling
-```
-$ python3 -m bio2zarr vcf2zarr <args>
-```
-is equivalent to
-
-```
-$ vcf2zarr <args>
-```
-and will always work.
-
-## Basic vcf2zarr usage
-For modest VCF files (up to a few GB), a single command can be used to convert a VCF file
-(or set of VCF files) using the {ref}`convert<cmd-vcf2zarr-convert>` command:
-
-```bash
-$ vcf2zarr convert <VCF1> <VCF2> ... <VCFN> <zarr>
-```
-
-For larger files a multi-step process is recommended.
-
-
-First, convert the VCF into the intermediate format:
-
-```bash
-$ vcf2zarr explode tests/data/vcf/sample.vcf.gz tmp/sample.exploded
-```
-
-Then, (optionally) inspect this representation to get a feel for your dataset
-```bash
-$ vcf2zarr inspect tmp/sample.exploded
-```
-
-Then, (optionally) generate a conversion schema to describe the corresponding
-Zarr arrays:
-
-```bash
-$ vcf2zarr mkschema tmp/sample.exploded > sample.schema.json
-```
-
-View and edit the schema, deleting any columns you don't want, or tweaking
-dtypes and compression settings to your taste.
-
-Finally, encode to Zarr:
-```bash
-$ vcf2zarr encode tmp/sample.exploded tmp/sample.zarr -s sample.schema.json
-```
-
-Use the ``-p, --worker-processes`` argument to control the number of workers used
-in the ``explode`` and ``encode`` phases.
-
-
-
-
-```{tableofcontents}
-```

docs/vcf2zarr_tutorial.md renamed to docs/vcf2zarr.md

Lines changed: 8 additions & 4 deletions
@@ -9,15 +9,19 @@ kernelspec:
 language: bash
 name: bash
 ---
-# Vcf2zarr tutorial
+# vcf2zarr
+
+
+
+## Tutorial
 
 This is a step-by-step tutorial showing you how to convert your
 VCF data into Zarr format. There's three different ways to
 convert your data, basically providing different levels of
 convenience and flexibility corresponding to what you might
 need for small, intermediate and large datasets.
 
-## Small
+### Small
 
 <!-- ```{code-cell} bash -->
 <!-- vcf2zarr convert ../tests/data/vcf/sample.vcf.gz sample.zarr -vf -->
@@ -32,6 +36,6 @@ need for small, intermediate and large datasets.
 });
 </script>
 
-## Intermediate
+### Intermediate
 
-## Large
+### Large
