|
| 1 | +--- |
| 2 | +jupytext: |
| 3 | + text_representation: |
| 4 | + extension: .md |
| 5 | + format_name: myst |
| 6 | + format_version: 0.12 |
| 7 | + jupytext_version: 1.9.1 |
| 8 | +kernelspec: |
| 9 | + display_name: Python 3 |
| 10 | + language: python |
| 11 | + name: python3 |
| 12 | +--- |
| 13 | + |
| 14 | +```{eval-rst} |
| 15 | +.. currentmodule:: sc2ts |
| 16 | +``` |
| 17 | + |
1 | 18 | (sec_arg_analysis)= |
2 | 19 | # ARG analysis |
3 | 20 |
|
| 21 | +The sc2ts API provides some convenience functions to compute summary |
| 22 | +dataframes for the nodes and mutations in a sc2ts-output ARG. |
4 | 23 |
|
5 | | -## ARG analysis API |
6 | 24 |
|
7 | | -The sc2ts API provides two convenience functions to compute summary |
8 | | -dataframes for the nodes and mutations in a sc2ts-output ARG. |
| 25 | +## Prerequisites |
9 | 26 |
|
10 | | -To see some examples, first download the (31MB) sc2ts inferred ARG |
11 | | -from [Zenodo](https://zenodo.org/records/17558489/): |
| 27 | +Download a subset of the [sc2ts Viridian ARG](https://zenodo.org/records/17558489/) |
| 28 | +with 1000 samples: |
12 | 29 |
|
13 | 30 | ``` |
14 | | -curl -O https://zenodo.org/records/17558489/files/sc2ts_viridian_v1.2.trees.tsz |
| 31 | +curl -O https://raw.githubusercontent.com/tskit-dev/sc2ts/refs/heads/main/docs/sc2ts_viridian_v1.2_subset_1000.trees.tsz |
15 | 32 | ``` |
16 | 33 |
|
17 | | -We can then use these like |
| 34 | +We'll use this small subset as an example throughout. |
| 35 | + |
| 36 | +## Loading |
18 | 37 |
|
19 | | -```python |
| 38 | + |
| 39 | +```{code-cell} |
20 | 40 | import sc2ts |
21 | 41 | import tszip |
22 | 42 |
|
23 | | -ts = tszip.load("sc2ts_viridian_v1.2.trees.tsz") |
24 | | - |
25 | | -df_node = sc2ts.node_data(ts) |
26 | | -df_mutation = sc2ts.mutation_data(ts) |
| 43 | +ts = tszip.load("sc2ts_viridian_v1.2_subset_1000.trees.tsz") |
27 | 44 | ``` |
28 | 45 |
|
29 | | -See the [live demo](https://tskit.dev/explore/lab/index.html?path=sc2ts.ipynb) |
30 | | -for a browser based interactive demo of using these dataframes for |
31 | | -real-time pandemic-scale analysis. |
32 | | - |
33 | | -## Dataset API |
| 46 | +You can then use the full [tskit](https://tskit.dev/tskit/docs/) |
| 47 | +Python API on this ARG. |
34 | 48 |
|
35 | | -Sc2ts also provides a convenient API for accessing large-scale |
36 | | -alignments and metadata stored in |
37 | | -[VCF Zarr](https://doi.org/10.1093/gigascience/giaf049) format. |
| 49 | +## Node data |
38 | 50 |
|
39 | | -Resources: |
| 51 | +The {func}`node_data` function returns a Pandas dataframe of data for each |
| 52 | +node in the ARG. |
40 | 53 |
|
41 | | -- See this [notebook](https://github.com/jeromekelleher/sc2ts-paper/blob/main/notebooks/example_data_processing.ipynb) |
42 | | -for an example in which we access the data variant-by-variant and |
43 | | -which explains the low-level data encoding |
44 | | -- See the [VCF Zarr publication](https://doi.org/10.1093/gigascience/giaf049) |
45 | | -for more details on and benchmarks on this dataset. |
| 54 | +```{code-cell} |
| 55 | +dfn = sc2ts.node_data(ts) |
| 56 | +dfn |
| 57 | +``` |
46 | 58 |
|
47 | 59 |
|
48 | | -**TODO** Add some references to API documentation |
| 60 | +## Mutation data |
49 | 61 |
|
| 62 | +The {func}`mutation_data` function returns a Pandas dataframe of data for each |
| 63 | +mutation in the ARG. |
50 | 64 |
|
| 65 | +```{code-cell} |
| 66 | +dfm = sc2ts.mutation_data(ts) |
| 67 | +dfm |
| 68 | +``` |
51 | 69 |
|