Commit e33a069

First pass at documenting tskit + Python API
1 parent b1f33d3 commit e33a069

7 files changed: +84 -6 lines changed
bio2zarr/tskit.py

Lines changed: 10 additions & 4 deletions
@@ -241,8 +241,8 @@ def generate_schema(
 
 
 def convert(
-    ts_path,
-    zarr_path,
+    ts_or_path,
+    vcz_path,
     *,
     model_mapping=None,
     contig_id=None,
@@ -252,8 +252,14 @@ def convert(
     worker_processes=1,
     show_progress=False,
 ):
+    """
+    Convert a :class:`tskit.TreeSequence` (or path to a tree sequence
+    file) to VCF Zarr format stored at the specified path.
+
+    .. todo:: Document parameters
+    """
     tskit_format = TskitFormat(
-        ts_path,
+        ts_or_path,
         model_mapping=model_mapping,
         contig_id=contig_id,
         isolated_as_missing=isolated_as_missing,
@@ -262,7 +268,7 @@ def convert(
         variants_chunk_size=variants_chunk_size,
         samples_chunk_size=samples_chunk_size,
     )
-    zarr_path = pathlib.Path(zarr_path)
+    zarr_path = pathlib.Path(vcz_path)
     vzw = vcz.VcfZarrWriter(TskitFormat, zarr_path)
     # Rough heuristic to split work up enough to keep utilisation high
     target_num_partitions = max(1, worker_processes * 4)
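
The partition heuristic at the end of this hunk can be looked at in isolation. The sketch below only reproduces the `max(1, worker_processes * 4)` expression from the diff; the helper name `target_num_partitions_for` is hypothetical and not part of bio2zarr, and how the writer goes on to use the target is not shown in this commit.

```python
def target_num_partitions_for(worker_processes: int) -> int:
    """Hypothetical helper mirroring the expression used in convert()."""
    # Ask for four partitions per worker process, which the source comment
    # describes as a rough way to keep utilisation high. The max() guards
    # against a zero (or negative) worker count producing no partitions.
    return max(1, worker_processes * 4)


for workers in (0, 1, 8):
    print(workers, target_num_partitions_for(workers))
```

With the default `worker_processes=1` this gives a target of 4 partitions, and the `worker_processes=8` call used in the new documentation gives a target of 32.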

docs/_config.yml

Lines changed: 5 additions & 1 deletion
@@ -24,14 +24,15 @@ html:
   extra_footer: |
     <p>
       Documentation available under the terms of the
-      <a href="https://creativecommons.org/publicdomain/zero/1.0/">CC0 1.0</a>
+      <a href="https://creativecommons.org/publicdomain/zero/1.0/">CC0 1.0</a>
       license.
     </p>
 
 sphinx:
   extra_extensions:
   - sphinx_click.ext
   - sphinx.ext.todo
+  - sphinx.ext.autodoc
   config:
     html_show_copyright: false
     # This is needed to make sure that text is output in single block from
@@ -40,3 +41,6 @@ sphinx:
     todo_include_todos: true
     myst_enable_extensions:
     - colon_fence
+    intersphinx_mapping:
+      python: ["https://docs.python.org/3/", null]
+      tskit: ["https://tskit.dev/tskit/docs/stable", null]
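
For readers more used to a plain Sphinx `conf.py`, the `intersphinx_mapping` entries added here correspond roughly to the following. This is a sketch, assuming the keys under `sphinx.config` are passed through to Sphinx unchanged; the tuple form and `None` are the usual Sphinx spelling of the YAML lists and `null`.

```python
# Rough conf.py equivalent of the YAML added in docs/_config.yml.
# Each entry maps an inventory name to (base URL, inventory location);
# None asks Sphinx to look for objects.inv under the base URL.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3/", None),
    "tskit": ("https://tskit.dev/tskit/docs/stable", None),
}
```

The `tskit` key is the inventory name that the `tskit:` prefix in a cross reference such as `tskit:sec_data_model` is expected to resolve against.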

docs/_toc.yml

Lines changed: 4 additions & 0 deletions
@@ -9,6 +9,10 @@ chapters:
 - file: plink2zarr/overview
   sections:
   - file: plink2zarr/cli_ref
+- file: tskit2zarr/overview
+  sections:
+  - file: tskit2zarr/python_api
+  - file: tskit2zarr/cli_ref
 - file: vcfpartition/overview
   sections:
   - file: vcfpartition/cli_ref

docs/plink2zarr/cli_ref.md

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@
 .. click:: bio2zarr.cli:convert_plink
    :prog: plink2zarr convert
    :nested: full
-
+```

docs/tskit2zarr/cli_ref.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+(sec-tskit2zarr-cli-ref)=
+# CLI Reference
+
+% A note on cross references... There's some weird long-standing problem with
+% cross referencing program values in Sphinx, which means that we can't use
+% the built-in labels generated by sphinx-click. We can make our own explicit
+% targets, but these have to have slightly weird names to avoid conflicting
+% with what sphinx-click is doing. So, hence the cmd- prefix.
+% Based on: https://github.com/skypilot-org/skypilot/pull/2834
+
+```{eval-rst}
+
+.. _cmd-tskit2zarr-convert:
+.. click:: bio2zarr.cli:convert_tskit
+   :prog: tskit2zarr convert
+   :nested: full
+
+```

docs/tskit2zarr/overview.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+(sec-tskit2zarr)=
+# tskit2zarr
+
+Convert tskit data to the
+[VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
+reliably in parallel.
+
+See {ref}`sec-tskit2zarr-cli-ref` for detailed documentation on
+command line options.
+

docs/tskit2zarr/python_api.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+(sec-tskit2zarr-python-api)=
+# Python API
+
+Basic usage:
+```python
+import bio2zarr.tskit as ts2z
+
+ts2z.convert(ts_path, vcz_path, worker_processes=8)
+```
+
+This will convert the [tskit](https://tskit.dev) tree sequence stored
+at ``ts_path`` to VCF Zarr stored at ``vcz_path`` using 8 worker processes.
+The details of how we map from the
+tskit {ref}`tskit:sec_data_model` to VCF Zarr are taken care of by the
+TreeSequence.map_to_vcf_model method, which is called with no
+parameters by default if the ``model_mapping`` parameter to
+{func}`~bio2zarr.tskit.convert` is not specified.
+
+For more control over the properties of the output, for example
+to pick a specific subset of individuals, you can use
+TreeSequence.map_to_vcf_model
+to return the required mapping:
+
+```python
+model_mapping = ts.map_to_vcf_model(individuals=[0, 1])
+ts2z.convert(ts, vcz_path, model_mapping=model_mapping)
+```
+
+
+## API reference
+
+```{eval-rst}
+
+.. autofunction:: bio2zarr.tskit.convert
+
+```
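
The default behaviour described in the new page (call `map_to_vcf_model` with no parameters when `model_mapping` is not given) can be sketched without tskit installed. Everything below is illustrative: `StubTreeSequence` is a stand-in for `tskit.TreeSequence`, `resolve_model_mapping` is a hypothetical helper, and neither is bio2zarr's actual implementation.

```python
class StubTreeSequence:
    """Stand-in for tskit.TreeSequence, for illustration only."""

    def map_to_vcf_model(self, individuals=None):
        # The real method returns a mapping object; a dict is enough here.
        return {"individuals": individuals}


def resolve_model_mapping(ts, model_mapping=None):
    """Hypothetical helper: use the default mapping when none is supplied."""
    if model_mapping is None:
        # No mapping given, so ask the tree sequence for its default one.
        model_mapping = ts.map_to_vcf_model()
    return model_mapping


ts = StubTreeSequence()
default_mapping = resolve_model_mapping(ts)
custom_mapping = resolve_model_mapping(ts, ts.map_to_vcf_model(individuals=[0, 1]))
print(default_mapping)
print(custom_mapping)
```

An explicitly supplied mapping is returned unchanged, which matches the documented pattern of building the mapping first and then passing it to `convert` via `model_mapping`.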
