
Commit db95518

Merge pull request #569 from benjeffery/docs
Add doc build
2 parents 21599cb + 08181f2 commit db95518


16 files changed (+500, -13 lines)


README.md

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ alignments and metadata stored in
 
 Resources:
 
-- See this [notebook](https://github.com/jeromekelleher/sc2ts-paper/blob/main/notebooks/example_data_processing.ipynb)
+- See this [notebook](https://github.com/tskit-dev/sc2ts-paper/blob/main/notebooks/example_data_processing.ipynb)
 for an example in which we access the data variant-by-variant and
 which explains the low-level data encoding
 - See the [VCF Zarr publication](https://doi.org/10.1093/gigascience/giaf049)

docs/Makefile

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# Need to set PYTHONPATH so that we pick up the local sc2ts
PYPATH=${PWD}/..
SC2TS_VERSION:=$(shell PYTHONPATH=${PYPATH} \
	python3 -c 'import sc2ts; print(sc2ts.__version__.split("+")[0])')

dev:
	PYTHONPATH=${PYPATH} ./build.sh

dist:
	@echo Building distribution for sc2ts version ${SC2TS_VERSION}
	sed -i s/__SC2TS_VERSION__/${SC2TS_VERSION}/g _config.yml
	PYTHONPATH=${PYPATH} ./build.sh

clean:
	rm -fR _build


docs/_config.yml

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# Book settings
# Learn more at https://jupyterbook.org/customize/config.html

title: sc2ts manual
author: sc2ts developers
logo: sc2ts.png
copyright: "2024"
only_build_toc_files: true

execute:
  execute_notebooks: cache

launch_buttons:
  binderhub_url: ""

repository:
  url: https://github.com/tskit-dev/sc2ts
  branch: main
  path_to_book: docs

html:
  favicon: sc2ts.png
  use_issues_button: true
  use_repository_button: true
  use_edit_page_button: true

sphinx:
  extra_extensions:
  - sphinx_copybutton
  - sphinx.ext.autodoc
  - sphinx.ext.autosummary
  - sphinx.ext.todo
  - sphinx.ext.viewcode
  - sphinx.ext.intersphinx
  - sphinx_issues
  - sphinxarg.ext
  - IPython.sphinxext.ipython_console_highlighting
  - sphinx_click.ext

  config:
    html_theme: sphinx_book_theme
    html_theme_options:
      navigation_with_keys: false
      pygments_dark_style: monokai
      logo:
        text: "Version __SC2TS_VERSION__"

    myst_enable_extensions:
    - colon_fence
    - deflist
    - substitution

    issues_github_path: tskit-dev/sc2ts
    todo_include_todos: true

    intersphinx_mapping:
      python: ["https://docs.python.org/3/", null]
      tskit: ["https://tskit.dev/tskit/docs/stable", null]
      tutorials: ["https://tskit.dev/tutorials/", null]
      numpy: ["https://numpy.org/doc/stable/", null]
      pandas: ["https://pandas.pydata.org/docs/", null]

    nitpicky: true

    autodoc_member_order: bysource
    autodoc_typehints: none

    myst_substitutions:
      min_python_version: "3.10"
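As an illustration (not part of this commit), the `dist` target in docs/Makefile does two text operations: it strips any local version segment after `+` from `sc2ts.__version__`, then substitutes that value for the `__SC2TS_VERSION__` placeholder in `_config.yml`. The sketch below mirrors those two steps in Python; the function names `release_version` and `stamp_config` are hypothetical.

```python
def release_version(full_version: str) -> str:
    """Drop any local version segment after "+", as the Makefile's
    `sc2ts.__version__.split("+")[0]` does."""
    return full_version.split("+")[0]


def stamp_config(config_text: str, version: str,
                 placeholder: str = "__SC2TS_VERSION__") -> str:
    """Replace every occurrence of the placeholder, like `sed s/.../.../g`."""
    return config_text.replace(placeholder, version)
```

For a hypothetical development version, `stamp_config('text: "Version __SC2TS_VERSION__"', release_version("0.4.1.dev12+g08181f2"))` returns `'text: "Version 0.4.1.dev12"'`. One design difference worth noting: `sed -i` rewrites `_config.yml` in place, so after a first `make dist` the placeholder is gone, whereas this sketch returns new text and leaves its input untouched.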

docs/_toc.yml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
format: jb-book
root: intro
parts:
- caption: Interfaces
  chapters:
  - file: cli
  - file: api
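As an illustration (not part of this commit): TOC entries name files without an extension, and with `only_build_toc_files: true` only the files listed here are built. The sketch below checks that every entry maps to an existing source file. It assumes the suffixes tried are `.md`, `.rst` and `.ipynb`, and it mirrors `_toc.yml` as a plain dict so that no YAML parser is needed. `TOC`, `toc_entries` and `resolve_sources` are hypothetical names.

```python
from pathlib import Path

# Mirrors docs/_toc.yml as a plain dict (assumption: no YAML parsing needed).
TOC = {
    "format": "jb-book",
    "root": "intro",
    "parts": [
        {"caption": "Interfaces", "chapters": [{"file": "cli"}, {"file": "api"}]},
    ],
}

# Assumed set of source suffixes tried for each extension-less entry.
SUFFIXES = (".md", ".rst", ".ipynb")


def toc_entries(toc):
    """Yield the root document, then every chapter file in order."""
    yield toc["root"]
    for part in toc.get("parts", []):
        for chapter in part.get("chapters", []):
            yield chapter["file"]


def resolve_sources(toc, book_dir):
    """Map each TOC entry to its first existing source file, or None."""
    book_dir = Path(book_dir)
    resolved = {}
    for entry in toc_entries(toc):
        candidates = (book_dir / f"{entry}{suffix}" for suffix in SUFFIXES)
        resolved[entry] = next((p for p in candidates if p.is_file()), None)
    return resolved
```

Run against the `docs/` directory of this commit, the entries should resolve to `intro.md`, `cli.rst` and `api.md`. A `None` value flags an entry that would be missing from the book.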

docs/api.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# Python API

This page documents the public Python API exposed by ``sc2ts``.
Inference is driven via the command line interface (see the
{ref}`CLI documentation <sc2ts_sec_cli>`); the functions and classes
listed here are intended for working with tree sequences and datasets
that have already been generated.

The reference documentation is concise and exhaustive; for higher-level
discussion and worked examples, see the project README and example
notebooks.

```{eval-rst}
.. currentmodule:: sc2ts
```

## ARG analysis

```{eval-rst}
.. autosummary::
   node_data
   mutation_data
```

```{eval-rst}
.. autofunction:: node_data

.. autofunction:: mutation_data
```

## Dataset access

```{eval-rst}
.. autosummary::
   Dataset
   decode_alignment
   mask_ambiguous
   mask_flanking_deletions
```

```{eval-rst}
.. autoclass:: Dataset
   :members:

.. autofunction:: decode_alignment

.. autofunction:: mask_ambiguous

.. autofunction:: mask_flanking_deletions
```

## Core constants and helpers

```{eval-rst}
.. autosummary::
   REFERENCE_STRAIN
   REFERENCE_DATE
   REFERENCE_GENBANK
   REFERENCE_SEQUENCE_LENGTH
   IUPAC_ALLELES
   decode_flags
   flags_summary
```

```{eval-rst}
.. autodata:: REFERENCE_STRAIN

.. autodata:: REFERENCE_DATE

.. autodata:: REFERENCE_GENBANK

.. autodata:: REFERENCE_SEQUENCE_LENGTH

.. autodata:: IUPAC_ALLELES

.. autofunction:: decode_flags

.. autofunction:: flags_summary
```

docs/build.sh

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
#!/bin/bash

# jupyter-book doesn't have an option to automatically show the
# saved reports, which makes it difficult to debug the reasons for
# build failures in CI. This is a simple wrapper to handle that.

REPORTDIR=_build/html/reports

jupyter-book build .
RETVAL=$?
if [ $RETVAL -ne 0 ]; then
    if [ -e $REPORTDIR ]; then
        echo "Error occurred; showing saved reports"
        cat $REPORTDIR/*
    fi
else
    # Clear out any old reports
    rm -f $REPORTDIR/*
fi
exit $RETVAL

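As an illustration (not part of this commit), the control flow of `build.sh` can be sketched in Python: run the build command, print any saved reports on failure, clear them on success, and return the command's exit status. The function name `build_with_reports` is hypothetical, and the build command is passed in as a parameter so that the logic can be exercised without `jupyter-book` installed.

```python
import subprocess
import sys
from pathlib import Path


def build_with_reports(cmd, report_dir="_build/html/reports"):
    """Run `cmd`; on failure print saved reports, on success clear them."""
    report_dir = Path(report_dir)
    retval = subprocess.run(cmd).returncode
    if retval != 0:
        if report_dir.exists():
            print("Error occurred; showing saved reports")
            # Like `cat $REPORTDIR/*`: dump every regular file in the directory.
            for report in sorted(report_dir.iterdir()):
                if report.is_file():
                    print(report.read_text(), end="")
    elif report_dir.is_dir():
        # Clear out any old reports, like `rm -f $REPORTDIR/*`.
        for report in report_dir.iterdir():
            if report.is_file():
                report.unlink()
    return retval
```

From inside `docs/`, `sys.exit(build_with_reports(["jupyter-book", "build", "."]))` would behave much like the shell wrapper. Returning the child's exit code unchanged is what lets CI still fail the job after the reports have been printed.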

docs/cli.rst

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
.. _sc2ts_sec_cli:

Command line interface
======================

The ``sc2ts`` package provides a command line interface for running
inference and working with sc2ts datasets. After installation, the
``sc2ts`` entry point should be available::

    $ sc2ts --help

You can also invoke the CLI via the module::

    $ python -m sc2ts --help

Order of high-level commands
----------------------------

In a typical end-to-end workflow, the main subcommands are used in the
following order:

1. ``import-alignments`` and ``import-metadata`` to build a VCF Zarr
   dataset from raw alignments and metadata.
2. ``infer`` to run primary inference over the dataset and produce a
   series of tree sequence files and a match database.
3. ``postprocess`` to apply housekeeping steps and incorporate exact
   matches, outputting a cleaned ARG.
4. ``minimise-metadata`` to generate an analysis-ready ARG with compact
   metadata suitable for use with the Python analysis APIs.

Below we list all subcommands and options provided by the CLI. This
output is generated directly from the Click definitions in
``sc2ts.cli`` using the ``sphinx-click`` extension, and so stays in
sync with the implementation.

.. click:: sc2ts.cli:cli
   :prog: sc2ts
   :nested: full

docs/intro.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# sc2ts

`sc2ts` (SARS-CoV-2 to tree sequence, pronounced "scoots") provides tools
to infer and analyse tskit ancestral recombination graphs (ARGs) for SARS-CoV-2
at pandemic scale.
It consists of:

1. A CLI-driven method to infer ARGs from SARS-CoV-2 data.
2. A lightweight wrapper around the {mod}`tskit` Python APIs, specialised
   for the output of sc2ts and enabling efficient node metadata access.
3. A lightweight wrapper around `zarr` for convenient access to the
   Viridian dataset (alignments and metadata) in VCF Zarr format.

The underlying methods are described in the sc2ts pre-print:
<https://www.biorxiv.org/content/10.1101/2023.06.08.544212v2>.

Most users will run sc2ts via the command line interface,
which drives inference and postprocessing steps (see the
{ref}`CLI documentation <sc2ts_sec_cli>`). The Python API is intended for
working with tree sequences and datasets produced by sc2ts (see the
{doc}`Python API reference <api>`).

For an overview and examples, see the project README and associated
notebooks in the repository root.

## Installation

Install sc2ts from PyPI:

```sh
python -m pip install sc2ts
```

This installs the minimal requirements for the analysis and dataset APIs.
To run inference from the command line, install the optional inference
dependencies:

```sh
python -m pip install 'sc2ts[inference]'
```

## Quick start: ARG analysis

To compute summary dataframes for nodes and mutations in an inferred ARG,
you can load an sc2ts tree sequence and call the analysis helpers. For
example, download the sc2ts paper ARG from Zenodo:

```sh
curl -O https://zenodo.org/records/17558489/files/sc2ts_viridian_v1.2.trees.tsz
```

and then:

```python
import sc2ts
import tszip

ts = tszip.load("sc2ts_viridian_v1.2.trees.tsz")
df_node = sc2ts.node_data(ts)
df_mutation = sc2ts.mutation_data(ts)
```

See the {doc}`Python API reference <api>` for full details of these
functions.

## Quick start: CLI inference

To run inference locally using the example Viridian dataset and config:

1. Install the inference extras (if you have not already):

   ```sh
   python -m pip install 'sc2ts[inference]'
   ```

2. Download the Viridian dataset in VCF Zarr format:

   ```sh
   curl -O https://zenodo.org/records/16314739/files/viridian_mafft_2024-10-14_v1.vcz.zip
   ```

3. Run primary inference using the CLI and the example config in the sc2ts repository:

   ```sh
   python -m sc2ts infer example_config.toml --stop=2020-02-02
   ```

   This will produce a series of `.ts` files and a match database in the
   output directory specified by the config (see the README for details).

4. Postprocess and generate an analysis-ready ARG:

   ```sh
   python -m sc2ts postprocess -vv \
       --match-db example_inference/ex1.matches.db \
       example_inference/ex1/ex1_2020-02-01.ts \
       example_inference/ex1_2020-02-01_pp.ts

   python -m sc2ts minimise-metadata \
       -m strain sample_id \
       -m Viridian_pangolin pango \
       example_inference/ex1_2020-02-01_pp.ts \
       example_inference/ex1_2020-02-01_pp_mm.ts
   ```

   The file `example_inference/ex1_2020-02-01_pp_mm.ts` can then be used
   with the Python analysis APIs shown above.

See the {ref}`CLI documentation <sc2ts_sec_cli>` for a complete listing of
subcommands and options.
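As an illustration (not part of this commit), the three CLI steps of the quick start can be chained from Python. The sketch builds the exact argument lists shown above, with the paths and dates copied from the quick start as defaults. By default it only prints each command; passing `dry_run=False` runs them in order with `subprocess.run(..., check=True)`, which stops at the first failing step. `pipeline_commands` and `run_pipeline` are hypothetical helpers, not part of the sc2ts API.

```python
import shlex
import subprocess
import sys


def pipeline_commands(config="example_config.toml", stop="2020-02-02",
                      match_db="example_inference/ex1.matches.db",
                      last_ts="example_inference/ex1/ex1_2020-02-01.ts",
                      pp_ts="example_inference/ex1_2020-02-01_pp.ts",
                      mm_ts="example_inference/ex1_2020-02-01_pp_mm.ts"):
    """Argument lists for infer -> postprocess -> minimise-metadata."""
    base = [sys.executable, "-m", "sc2ts"]
    return [
        base + ["infer", config, f"--stop={stop}"],
        base + ["postprocess", "-vv", "--match-db", match_db, last_ts, pp_ts],
        base + ["minimise-metadata",
                "-m", "strain", "sample_id",
                "-m", "Viridian_pangolin", "pango",
                pp_ts, mm_ts],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Passing the output of `postprocess` straight into `minimise-metadata` keeps the intermediate file names in one place. Note that `last_ts` must match the last daily file that `infer` actually wrote for the chosen `--stop` date.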

docs/sc2ts.png

495 KB

pyproject.toml

Lines changed: 9 additions & 0 deletions
@@ -43,6 +43,15 @@ debug = [
     "matplotlib",
     "IPython",
 ]
+docs = [
+    "jupyter-book==1.0.4.post1",
+    "sphinx-book-theme",
+    "sphinx-copybutton",
+    "sphinx-click",
+    "sphinx-argparse==0.5.2",
+    "sphinx-issues==5.0.1",
+    "IPython",
+]
 
 [build-system]
 requires = [
