|
1 | 1 | # sc2ts |
2 | | -Infer a succinct tree sequence from SARS-COV-2 variation data |
3 | 2 |
|
| 3 | +Sc2ts stands for "SARS-CoV-2 to tree sequence" (optionally pronounced "scoots")
| 4 | +and consists of:
4 | 5 |
|
5 | | -If you are interested in helping to develop sc2ts or would like to |
6 | | -work with the inferred ARGS, please get in touch. |
| 6 | +1. A method to infer Ancestral Recombination Graphs (ARGs) from SARS-CoV-2
| 7 | +data at pandemic scale.
| 8 | +2. A lightweight wrapper around [tskit Python APIs](https://tskit.dev/tskit/docs/stable/python-api.html) specialised for the output of sc2ts which enables efficient node metadata |
| 9 | +access. |
| 10 | +3. A lightweight wrapper around [Zarr Python](https://zarr.dev) which enables |
| 11 | +convenient and efficient access to the full Viridian dataset (alignments and metadata) |
| 12 | +in a single file using the [VCF Zarr specification](https://doi.org/10.1093/gigascience/giaf049).
7 | 13 |
|
| 14 | +Please see the [preprint](https://www.biorxiv.org/content/10.1101/2023.06.08.544212v2) |
| 15 | +for details. |
8 | 16 |
|
9 | | -Then, download the ARG in tszip format from |
10 | | -[Zenodo](https://zenodo.org/records/17558489/): |
| 17 | +## Installation |
| 18 | + |
| 19 | +Install sc2ts from PyPI: |
| 20 | + |
| 21 | +``` |
| 22 | +python -m pip install sc2ts |
| 23 | +``` |
| 24 | + |
| 25 | +This installs the minimum requirements needed for the
| 26 | +[ARG analysis](#ARG-analysis-API) and [Dataset](#Dataset-API) APIs.
| 27 | +To run [inference](#inference), you must install some extra |
| 28 | +dependencies using the 'inference' optional extra: |
| 29 | + |
| 30 | +``` |
| 31 | +python -m pip install sc2ts[inference] |
| 32 | +``` |
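After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the Python standard library; the helper names `is_installed` and `installed_version` are hypothetical and are not part of sc2ts.

```python
import importlib.metadata
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "sc2ts" is the PyPI package name used in the install commands above.
    print("sc2ts importable:", is_installed("sc2ts"))
    print("sc2ts version:", installed_version("sc2ts"))
```

If the first line prints `False`, the package was most likely installed into a different environment from the one running the script.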
| 33 | + |
| 34 | +## ARG analysis API |
| 35 | + |
| 36 | +The sc2ts API provides two convenience functions to compute summary |
| 37 | +dataframes for the nodes and mutations in a sc2ts-output ARG. |
| 38 | + |
| 39 | +To see some examples, first download the sc2ts inferred ARG |
| 40 | +from [Zenodo](https://zenodo.org/records/17558489/): |
11 | 41 |
|
12 | 42 | ``` |
13 | 43 | curl -O https://zenodo.org/records/17558489/files/sc2ts_viridian_v1.2.trees.tsz |
14 | 44 | ``` |
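If you prefer to fetch the file from Python (for example in a notebook), the download can be sketched with the standard library as below. This is an illustrative alternative to the `curl` command, not part of sc2ts; the helper name `download_if_missing` is hypothetical, and the URL is the one given above.

```python
import shutil
import urllib.request
from pathlib import Path

# URL of the inferred ARG, as given in the curl command above.
ARG_URL = "https://zenodo.org/records/17558489/files/sc2ts_viridian_v1.2.trees.tsz"


def download_if_missing(url: str, dest: Path) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened, False if the file was already present.
    """
    dest = Path(dest)
    if dest.exists():
        return False
    # Write to a temporary name first so an interrupted download
    # does not leave a truncated file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return True


# Example usage (performs a network download on first call):
# download_if_missing(ARG_URL, Path("sc2ts_viridian_v1.2.trees.tsz"))
```

Skipping the download when the file exists makes the snippet safe to re-run at the top of a notebook.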
15 | 45 |
|
| 46 | +We can then load the ARG and compute the dataframes as follows:
16 | 47 |
|
| 48 | +```python |
| 49 | +import sc2ts |
| 50 | +import tszip |
17 | 51 |
|
18 | | -## Installation |
| 52 | +ts = tszip.load("sc2ts_viridian_v1.2.trees.tsz") |
| 53 | + |
| 54 | +df_node = sc2ts.node_data(ts) |
| 55 | +df_mutation = sc2ts.mutation_data(ts) |
| 56 | +``` |
| 57 | + |
| 58 | +See the [live demo](https://tskit.dev/explore/lab/index.html?path=sc2ts.ipynb) |
| 59 | +for a browser-based interactive demo of using these dataframes for
| 60 | +real-time pandemic-scale analysis. |
| 61 | + |
| 62 | +## Dataset API |
19 | 63 |
|
20 | | -** TODO document local install ** |
| 64 | +Sc2ts also provides a convenient API for accessing large-scale |
| 65 | +alignments and metadata stored in |
| 66 | +[VCF Zarr](https://doi.org/10.1093/gigascience/giaf049) format. |
21 | 67 |
|
22 | | -## Inference workflow |
| 68 | +Resources: |
| 69 | + |
| 70 | +- See this [notebook](https://github.com/jeromekelleher/sc2ts-paper/blob/main/notebooks/example_data_processing.ipynb) |
| 71 | +for an example in which we access the data variant-by-variant and |
| 72 | +which explains the low-level data encoding.
| 73 | +- See the [VCF Zarr publication](https://doi.org/10.1093/gigascience/giaf049)
| 74 | +for more details and benchmarks on this dataset.
| 75 | + |
| 76 | + |
| 77 | +**TODO** Add some references to API documentation |
| 78 | + |
| 79 | +## Inference |
23 | 80 |
|
24 | 81 | ### Command line inference |
25 | 82 |
|
@@ -163,6 +220,7 @@ the existing ``strain`` field is renamed to ``sample_id`` |
163 | 220 | field (extracted from the Viridian metadata) is renamed to ``pango``. |
164 | 221 |
|
165 | 222 | We can then use the analysis APIs on this file: |
| 223 | + |
166 | 224 | ```python |
167 | 225 | import sc2ts |
168 | 226 | import tskit |
|