Commit e0ee253

Merge pull request #586 from tskit-dev/final-docs-fixes-pre1
Final docs fixes pre1
2 parents 720ca5e + 3c17ff2 commit e0ee253

6 files changed

+145
-25
lines changed

docs/cli.md

Lines changed: 75 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -33,19 +33,84 @@ following order:
3333

3434
## CLI reference
3535

36-
<!-- Below we list all subcommands and options provided by the CLI. This -->
37-
<!-- output is generated directly from the Click definitions in -->
38-
<!-- ``sc2ts.cli`` using the ``sphinx-click`` extension, and so stays in -->
39-
<!-- sync with the implementation. -->
36+
% A note on cross references... There's some weird long-standing problem with
37+
% cross referencing program values in Sphinx, which means that we can't use
38+
% the built-in labels generated by sphinx-click. We can make our own explicit
39+
% targets, but these have to have slightly weird names to avoid conflicting
40+
% with what sphinx-click is doing. So, hence the cmd- prefix.
41+
% Based on: https://github.com/skypilot-org/skypilot/pull/2834
4042

41-
:::{todo}
42-
Add the sphinx-click output here somehow.
43-
:::
43+
### Data import
44+
45+
```{eval-rst}
46+
.. _cmd-sc2ts-import-alignments:
47+
.. click:: sc2ts.cli:import_alignments
48+
:prog: sc2ts import-alignments
49+
```
50+
51+
```{eval-rst}
52+
.. _cmd-sc2ts-import-metadata:
53+
.. click:: sc2ts.cli:import_metadata
54+
:prog: sc2ts import-metadata
55+
```
56+
57+
### Inference
58+
59+
```{eval-rst}
60+
.. _cmd-sc2ts-infer:
61+
.. click:: sc2ts.cli:infer
62+
:prog: sc2ts infer
63+
```
64+
65+
### Inspection
66+
67+
```{eval-rst}
68+
.. _cmd-sc2ts-info-dataset:
69+
.. click:: sc2ts.cli:info_dataset
70+
:prog: sc2ts info-dataset
71+
```
72+
73+
```{eval-rst}
74+
.. _cmd-sc2ts-info-matches:
75+
.. click:: sc2ts.cli:info_matches
76+
:prog: sc2ts info-matches
77+
```
78+
79+
### Postprocessing
80+
81+
```{eval-rst}
82+
.. _cmd-sc2ts-postprocess:
83+
.. click:: sc2ts.cli:postprocess
84+
:prog: sc2ts postprocess
85+
```
86+
87+
```{eval-rst}
88+
.. _cmd-sc2ts-map-parsimony:
89+
.. click:: sc2ts.cli:map_parsimony
90+
:prog: sc2ts map-parsimony
91+
```
92+
93+
```{eval-rst}
94+
.. _cmd-sc2ts-minimise-metadata:
95+
.. click:: sc2ts.cli:minimise_metadata
96+
:prog: sc2ts minimise-metadata
97+
```
98+
99+
### Miscellaneous
100+
101+
% For some reason this one isn't working. Not worth worrying about.
44102

45103
<!-- ```{eval-rst} -->
46-
<!-- .. click:: sc2ts.cli:cli -->
47-
<!-- :prog: sc2ts infer -->
48-
<!-- :nested: full -->
104+
<!-- .. _cmd-sc2ts-validate: -->
105+
<!-- .. click:: sc2ts.cli:validate -->
106+
<!-- :prog: sc2ts validate -->
49107
<!-- ``` -->
50108

51109

110+
```{eval-rst}
111+
.. _cmd-sc2ts-run-hmm:
112+
.. click:: sc2ts.cli:run_hmm
113+
:prog: sc2ts run-hmm
114+
```
115+
116+

docs/example_config.toml

Lines changed: 36 additions & 3 deletions
@@ -1,6 +1,13 @@
1+
2+
# This is a path to the dataset, in VCZ format.
13
dataset="viridian_mafft_2024-10-14_v1.vcz.zip"
4+
# The metadata field used for dates. For the Viridian dataset, this is
5+
# "Date_tree" (which means, "date used to partition samples when building
6+
# the Viridian tree")
27
date_field="Date_tree"
38

9+
# The run_id is a prefix added to all output files. This is useful when
10+
# running lots of different parameter combinations.
411
run_id="ex1"
512
# Configure where the result files are stored. For simplicity
613
# we put them all in the "example_inference" directory.
@@ -13,8 +20,9 @@ matches_dir= "example_inference/"
1320
# This is full debug output, which is verbose (but useful!)
1421
log_level = 2
1522

16-
# Dates to exclude from inference. This one is a large outlier in terms of the
17-
# numbers of samples, and enriched for incorrectly assigned dates.
23+
# Dates to exclude from inference. This one is a large outlier in the
24+
# Viridian data in terms of the numbers of samples, and enriched for
25+
# incorrectly assigned dates.
1826
exclude_dates = ["2020-12-31"]
1927

2028
# The set of site positions to mask during inference (list of integers).
@@ -23,24 +31,49 @@ exclude_dates = ["2020-12-31"]
2331
exclude_sites = []
2432

2533
[extend_parameters]
34+
# The recombination penalty "k" parameter
2635
num_mismatches=4
36+
# Any samples with an HMM cost <= this value are included in the ARG
2737
hmm_cost_threshold=7
38+
# The maximum number of missing sites for a sample to be considered
2839
max_missing_sites=500
40+
# Do we mask deletions as missing data?
2941
deletions_as_missing=true
42+
# The maximum number of samples to consider, per day
3043
# max_daily_samples=1000
3144

32-
# Knobs for tuning retro group insertion
45+
## Various knobs for tuning retro group insertion:
46+
47+
# The minimum number of samples in a retro group
3348
min_group_size=10
49+
# The minimum number of mutations shared by all samples
3450
min_root_mutations=2
51+
# The maximum number of recurrent mutations in the group tree
3552
max_recurrent_mutations=2
53+
# The maximum number of mutations per sample, overall
3654
max_mutations_per_sample=5
55+
# The size of the window in which to consider samples for retrospective
56+
# inclusion, in days.
3757
retrospective_window=7
3858

59+
## Performance parameters.
60+
61+
# The number of matching threads to use. -1 means use all available cores.
62+
# Note that this will likely not make much difference until large numbers
63+
# of samples per day are involved.
3964
num_threads=-1
65+
# An approximate ceiling on the total amount of memory used (in GiB) by HMM
66+
# matching. Once the memory used goes above this value, new HMM match jobs are
67+
# held back until it goes under it again. If many memory intensive match jobs
68+
# are run at once, however, this will not prevent them from exceeding this
69+
# limit.
4070
memory_limit=32
4171

72+
# A list of sample IDs (strings) for unconditional inclusion (e.g., to
73+
# help seed major saltation events).
4274
include_samples=[]
4375

76+
# Override specific parameter values over a time period.
4477
[[override]]
4578
start = "2020-01-01"
4679
stop = "2020-03-01"
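
One way to read the `[[override]]` table is as a date-dependent patch applied on top of `[extend_parameters]`. The sketch below illustrates that reading only and is not taken from sc2ts: the helper `params_for_date`, the `parameters` key holding the replacement values, and the half-open `[start, stop)` interval are all assumptions, since this hunk ends before the body of the override is shown.

```python
import datetime

# Base values copied from [extend_parameters] in the example config.
base_params = {"num_mismatches": 4, "hmm_cost_threshold": 7}

# Hypothetical override entry: start/stop come from the example config,
# but the "parameters" key and its contents are assumed for illustration.
overrides = [
    {
        "start": "2020-01-01",
        "stop": "2020-03-01",
        "parameters": {"hmm_cost_threshold": 5},
    }
]


def params_for_date(date_str, base, override_list):
    """Return the parameters in force on date_str (ISO format).

    Assumes an override applies when start <= date < stop.
    """
    date = datetime.date.fromisoformat(date_str)
    params = dict(base)  # copy so the base values are never mutated
    for entry in override_list:
        start = datetime.date.fromisoformat(entry["start"])
        stop = datetime.date.fromisoformat(entry["stop"])
        if start <= date < stop:
            params.update(entry["parameters"])
    return params


inside = params_for_date("2020-02-15", base_params, overrides)
outside = params_for_date("2020-03-01", base_params, overrides)
print(inside)
print(outside)
```

Copying the base dictionary before patching keeps the defaults intact, so the same base values can be reused for every date in a run.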

docs/inference.md

Lines changed: 12 additions & 4 deletions
@@ -8,7 +8,7 @@ on a local machine using an example config file, using the Viridian data downloa
88
from Zenodo.
99

1010
Inference is performed using the CLI, which is composed of a number of subcommands.
11-
See {ref}`sc2ts_sec_cli` section for more information
11+
See the {ref}`sc2ts_sec_cli` section for more information
1212

1313
## Prerequisites
1414

@@ -94,9 +94,17 @@ debugging metadata included (see the section on the Debug utilities below)
9494
Primary inference can be stopped and picked up again at any point using
9595
the ``--start`` option.
9696

97-
:::{todo}
98-
Add documentation for the toml config file
99-
:::
97+
<!-- :::{todo} -->
98+
<!-- Add documentation for the toml config file -->
99+
<!-- ::: -->
100+
### Config file format
101+
102+
All parameters for primary inference are specified using the [toml](https://toml.io/en/)
103+
config file. They are documented in the example config file used here:
104+
105+
```{literalinclude} example_config.toml
106+
:language: toml
107+
```
100108

101109
## Postprocessing
102110

docs/intro.md

Lines changed: 14 additions & 6 deletions
@@ -11,12 +11,20 @@ It consists of:
1111
3. A lightweight wrapper around [Zarr](https://zarr.dev) for convenient access to the
1212
Viridian dataset (alignments and metadata) in VCF Zarr format.
1313

14-
The underlying methods are described in the sc2ts [preprint](
14+
The methods are described in the sc2ts [preprint](
1515
<https://www.biorxiv.org/content/10.1101/2023.06.08.544212v2>).
1616

17-
Most users will use the {ref}`sec_python_api` to perform {ref}`sec_arg_analysis`
18-
on the sc2ts inferred ARG or {ref}`sec_alignments_analysis` on the
19-
Zarr-formatted Viridian dataset distributed on Zenodo.
2017

21-
Uses who wish to perform {ref}`sec_inference` use the
22-
{ref}`sc2ts_sec_cli`.
18+
## Quickstart
19+
20+
- See the {ref}`sec_inference` section for an example of running
21+
primary inference using the {ref}`sc2ts_sec_cli`.
22+
23+
- See the {ref}`sec_arg_analysis` section for examples of using the
24+
{ref}`sec_python_api` to analyse the sc2ts Viridian ARG.
25+
26+
- See the {ref}`sec_alignments_analysis` section for examples
27+
of using the {ref}`sec_python_api` to analyse the Viridian
28+
alignments and metadata in
29+
[VCF Zarr format](https://doi.org/10.1093/gigascience/giaf049).
30+

pyproject.toml

Lines changed: 7 additions & 0 deletions
@@ -59,6 +59,13 @@ docs = [
5959
"sphinx-argparse==0.5.2",
6060
"sphinx-issues==5.0.1",
6161
"IPython",
62+
# docs requires running the CLI, which means we need the full inference
63+
# requirements also
64+
"scipy",
65+
"biotite",
66+
"tsinfer>=0.5",
67+
"pyfaidx",
68+
"numba",
6269
]
6370

6471
[build-system]

sc2ts/cli.py

Lines changed: 1 addition & 2 deletions
@@ -16,7 +16,6 @@
1616
import tqdm
1717
import tskit
1818
import tszip
19-
import tsinfer
2019
import click
2120
import humanize
2221
import pandas as pd
@@ -130,7 +129,7 @@ def setup_logging(verbosity, log_file=None, date=None):
130129
is_flag=True,
131130
flag_value=True,
132131
help=(
133-
"If true, initialise a new dataset. WARNING! This will erase and existing "
132+
"If true, initialise a new dataset. WARNING! This will erase an existing "
134133
"store"
135134
),
136135
)
