Commit eaf8c5f

Merge pull request #566 from jeromekelleher/more-docs
Update to readme
2 parents a71d1e3 + 0e19734 commit eaf8c5f

1 file changed

+66
-8
lines changed

README.md

Lines changed: 66 additions & 8 deletions
@@ -1,25 +1,82 @@
11
# sc2ts
2-
Infer a succinct tree sequence from SARS-COV-2 variation data
32

3+
Sc2ts stands for "SARS-CoV-2 to tree sequence" (optionally pronounced "scoots")
4+
and consists of
45

5-
If you are interested in helping to develop sc2ts or would like to
6-
work with the inferred ARGS, please get in touch.
6+
1. A method to infer Ancestral Recombination Graphs (ARGs) from SARS-CoV-2
7+
data at pandemic scale
8+
2. A lightweight wrapper around [tskit Python APIs](https://tskit.dev/tskit/docs/stable/python-api.html) specialised for the output of sc2ts which enables efficient node metadata
9+
access.
10+
3. A lightweight wrapper around [Zarr Python](https://zarr.dev) which enables
11+
convenient and efficient access to the full Viridian dataset (alignments and metadata)
12+
in a single file using the [VCF Zarr specification](https://doi.org/10.1093/gigascience/giaf049).
713

14+
Please see the [preprint](https://www.biorxiv.org/content/10.1101/2023.06.08.544212v2)
15+
for details.
816

9-
Then, download the ARG in tszip format from
10-
[Zenodo](https://zenodo.org/records/17558489/):
17+
## Installation
18+
19+
Install sc2ts from PyPI:
20+
21+
```
22+
python -m pip install sc2ts
23+
```
24+
25+
This installs the minimum requirements to enable the
26+
[ARG analysis](#ARG-analysis-API) and [Dataset](#Dataset-API) APIs.
27+
To run [inference](#inference), you must install some extra
28+
dependencies using the 'inference' optional extra:
29+
30+
```
31+
python -m pip install sc2ts[inference]
32+
```
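
After either install command, it can be useful to confirm what actually landed in the environment. The following is a minimal sketch using only the standard library; it is not part of the sc2ts README, and `installed_version` is a hypothetical helper name. The distribution names checked are `sc2ts` (from the install command above) and `tszip` (imported by the README's example).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report on the packages that the README examples import.
for name in ("sc2ts", "tszip"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```
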
33+
34+
## ARG analysis API
35+
36+
The sc2ts API provides two convenience functions to compute summary
37+
dataframes for the nodes and mutations in a sc2ts-output ARG.
38+
39+
To see some examples, first download the sc2ts inferred ARG
40+
from [Zenodo](https://zenodo.org/records/17558489/):
1141

1242
```
1343
curl -O https://zenodo.org/records/17558489/files/sc2ts_viridian_v1.2.trees.tsz
1444
```
1545

46+
We can then load the ARG and compute the summary dataframes like this:
1647

48+
```python
49+
import sc2ts
50+
import tszip
1751

18-
## Installation
52+
ts = tszip.load("sc2ts_viridian_v1.2.trees.tsz")
53+
54+
df_node = sc2ts.node_data(ts)
55+
df_mutation = sc2ts.mutation_data(ts)
56+
```
57+
58+
See the [live demo](https://tskit.dev/explore/lab/index.html?path=sc2ts.ipynb)
59+
for a browser-based interactive demo of using these dataframes for
60+
real-time pandemic-scale analysis.
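
The README does not list the columns of the two summary dataframes, so a reasonable first step is to inspect them rather than assume a schema. The sketch below assumes the returned objects are pandas DataFrames; `describe_frame` and `summarise_arg` are hypothetical helper names, not part of the sc2ts API. Only `tszip.load`, `sc2ts.node_data` and `sc2ts.mutation_data` are taken from the example above.

```python
import pandas as pd


def describe_frame(df: pd.DataFrame) -> dict:
    """Summarise a dataframe without assuming any particular column names."""
    return {
        "n_rows": len(df),
        "columns": list(df.columns),
        "dtypes": {name: str(dtype) for name, dtype in df.dtypes.items()},
    }


def summarise_arg(path: str) -> dict:
    """Load an ARG and summarise its node and mutation dataframes.

    Requires sc2ts and tszip, plus the file downloaded with curl above.
    """
    import sc2ts
    import tszip

    ts = tszip.load(path)
    return {
        "nodes": describe_frame(sc2ts.node_data(ts)),
        "mutations": describe_frame(sc2ts.mutation_data(ts)),
    }
```

Calling `summarise_arg("sc2ts_viridian_v1.2.trees.tsz")` returns the row counts and column names, which is enough to decide which columns to filter or group on next.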
61+
62+
## Dataset API
1963

20-
** TODO document local install **
64+
Sc2ts also provides a convenient API for accessing large-scale
65+
alignments and metadata stored in
66+
[VCF Zarr](https://doi.org/10.1093/gigascience/giaf049) format.
2167

22-
## Inference workflow
68+
Resources:
69+
70+
- See this [notebook](https://github.com/jeromekelleher/sc2ts-paper/blob/main/notebooks/example_data_processing.ipynb)
71+
for an example in which we access the data variant-by-variant and
72+
which explains the low-level data encoding
73+
- See the [VCF Zarr publication](https://doi.org/10.1093/gigascience/giaf049)
74+
for more details and benchmarks on this dataset.
75+
76+
77+
**TODO** Add some references to API documentation
78+
79+
## Inference
2380

2481
### Command line inference
2582

@@ -163,6 +220,7 @@ the existing ``strain`` field is renamed to ``sample_id``
163220
field (extracted from the Viridian metadata) is renamed to ``pango``.
164221

165222
We can then use the analysis APIs on this file:
223+
166224
```python
167225
import sc2ts
168226
import tskit
