Commit 33a6dbe

Add basic ARG analysis page
1 parent 25cf173 commit 33a6dbe

3 files changed: +58 -27 lines changed

docs/arg_analysis.md

Lines changed: 45 additions & 27 deletions
@@ -1,51 +1,69 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.12
+    jupytext_version: 1.9.1
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+```{eval-rst}
+.. currentmodule:: sc2ts
+```
+
 (sec_arg_analysis)=
 # ARG analysis

+The sc2ts API provides some convenience functions to compute summary
+dataframes for the nodes and mutations in a sc2ts-output ARG.

-## ARG analysis API

-The sc2ts API provides two convenience functions to compute summary
-dataframes for the nodes and mutations in a sc2ts-output ARG.
+## Prerequisites

-To see some examples, first download the (31MB) sc2ts inferred ARG
-from [Zenodo](https://zenodo.org/records/17558489/):
+Download a subset of the [sc2ts Viridian ARG](https://zenodo.org/records/17558489/)
+with 1000 samples:

 ```
-curl -O https://zenodo.org/records/17558489/files/sc2ts_viridian_v1.2.trees.tsz
+curl -O https://raw.githubusercontent.com/tskit-dev/sc2ts/refs/heads/main/docs/sc2ts_viridian_v1.2_subset_1000.trees.tsz
 ```

-We can then use these like
+We'll use this small subset as an example throughout.
+
+## Loading

-```python
+
+```{code-cell}
 import sc2ts
 import tszip

-ts = tszip.load("sc2ts_viridian_v1.2.trees.tsz")
-
-df_node = sc2ts.node_data(ts)
-df_mutation = sc2ts.mutation_data(ts)
+ts = tszip.load("sc2ts_viridian_v1.2_subset_1000.trees.tsz")
 ```

-See the [live demo](https://tskit.dev/explore/lab/index.html?path=sc2ts.ipynb)
-for a browser based interactive demo of using these dataframes for
-real-time pandemic-scale analysis.
-
-## Dataset API
+You can then use the full [tskit](https://tskit.dev/tskit/docs/)
+Python API on this ARG.

-Sc2ts also provides a convenient API for accessing large-scale
-alignments and metadata stored in
-[VCF Zarr](https://doi.org/10.1093/gigascience/giaf049) format.
+## Node data

-Resources:
+The {func}`node_data` function returns a Pandas dataframe of data for each
+node in the ARG.

-- See this [notebook](https://github.com/jeromekelleher/sc2ts-paper/blob/main/notebooks/example_data_processing.ipynb)
-for an example in which we access the data variant-by-variant and
-which explains the low-level data encoding
-- See the [VCF Zarr publication](https://doi.org/10.1093/gigascience/giaf049)
-for more details on and benchmarks on this dataset.
+```{code-cell}
+dfn = sc2ts.node_data(ts)
+dfn
+```


-**TODO** Add some references to API documentation
+## Mutation data

+The {func}`mutation_data` function returns a Pandas dataframe of data for each
+mutation in the ARG.

+```{code-cell}
+dfm = sc2ts.mutation_data(ts)
+dfm
+```


docs/make_sc2ts_arg_subset.py

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+import tszip
+import numpy as np
+
+ts = tszip.load("sc2ts_viridian_v1.2.trees.tsz")
+
+k = 1000
+idx = np.round(np.linspace(0, ts.num_samples - 1, k)).astype(int)
+
+subset = ts.samples()[idx]
+print(subset)
+tss = ts.simplify(subset, filter_sites=False)
+
+tszip.compress(tss, f"sc2ts_viridian_v1.2_subset_{k}.trees.tsz")
159 KB
Binary file not shown.
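
The subsetting script in `docs/make_sc2ts_arg_subset.py` picks `k` samples at evenly spaced positions in `ts.samples()` using `np.linspace` and `np.round`. The sketch below illustrates that index selection on its own, without NumPy or the ARG file. The helper name `evenly_spaced_indices` and the toy `sample_ids` list are assumptions made for illustration and are not part of sc2ts or tskit; results may also differ from NumPy's in rare floating-point tie cases.

```python
def evenly_spaced_indices(n, k):
    """Return k indices spread evenly over range(n), endpoints included."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if k == 1:
        # np.linspace(0, n - 1, 1) yields only the start point.
        return [0]
    # Same idea as np.round(np.linspace(0, n - 1, k)): walk from 0 to n - 1
    # in k - 1 equal steps and round each position to the nearest integer.
    return [round(i * (n - 1) / (k - 1)) for i in range(k)]


# Toy stand-in for ts.samples(): 10 hypothetical sample node IDs.
sample_ids = list(range(100, 110))
idx = evenly_spaced_indices(len(sample_ids), 4)
subset = [sample_ids[i] for i in idx]
print(idx, subset)  # [0, 3, 6, 9] [100, 103, 106, 109]
```

In the script itself, the selected sample IDs are then passed to `ts.simplify(subset, filter_sites=False)`, so the reduced ARG keeps every site of the original even where no mutations remain after subsetting.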
