@@ -11,105 +11,8 @@ access.
 convenient and efficient access to the full Viridian dataset (alignments and metadata)
 in a single file using the [VCF Zarr specification](https://doi.org/10.1093/gigascience/giaf049).
 
-Please see the [preprint](https://www.biorxiv.org/content/10.1101/2023.06.08.544212v2)
-for details.
-
-## Installation
-
-Install sc2ts from PyPI:
-
-```
-python -m pip install sc2ts
-```
-
-This installs the minimum requirement to enable the
-[ARG analysis](#ARG-analysis-API) and [Dataset](#Dataset-API)s.
-To run [inference](#inference), you must install some extra
-dependencies using the 'inference' optional extra:
-
-```
-python -m pip install sc2ts[inference]
-```
-
-## ARG analysis API
-
-The sc2ts API provides two convenience functions to compute summary
-dataframes for the nodes and mutations in a sc2ts-output ARG.
-
-To see some examples, first download the (31MB) sc2ts inferred ARG
-from [Zenodo](https://zenodo.org/records/17558489/):
-
-```
-curl -O https://zenodo.org/records/17558489/files/sc2ts_viridian_v1.2.trees.tsz
-```
-
-We can then use these like
-
-```python
-import sc2ts
-import tszip
-
-ts = tszip.load("sc2ts_viridian_v1.2.trees.tsz")
-
-df_node = sc2ts.node_data(ts)
-df_mutation = sc2ts.mutation_data(ts)
-```
-
-See the [live demo](https://tskit.dev/explore/lab/index.html?path=sc2ts.ipynb)
-for a browser based interactive demo of using these dataframes for
-real-time pandemic-scale analysis.
-
-## Dataset API
-
-Sc2ts also provides a convenient API for accessing large-scale
-alignments and metadata stored in
-[VCF Zarr](https://doi.org/10.1093/gigascience/giaf049) format.
-
-Resources:
-
-- See this [notebook](https://github.com/jeromekelleher/sc2ts-paper/blob/main/notebooks/example_data_processing.ipynb)
-for an example in which we access the data variant-by-variant and
-which explains the low-level data encoding
-- See the [VCF Zarr publication](https://doi.org/10.1093/gigascience/giaf049)
-for more details on and benchmarks on this dataset.
-
-
-**TODO** Add some references to API documentation
-
-
-
-## Development
-
-To run the unit tests, use
-
-```
-python3 -m pytest
-```
-
-You may need to regenerate some cached test fixtures occasionaly (particularly
-if getting cryptic errors when running the test suite). To do this, run
-
-```
-rm -fR tests/data/cache/
-```
-
-and rerun tests as above.
-
-### Debug utilities
-
-The tree sequence files output during primary inference have a lot
-of debugging metadata, and there are some developer tools for inspecting
-this in the `sc2ts.debug` package. In particular, the `ArgInfo`
-class has a lot of useful utilities designed to be used in a Jupyter
-notebook. Note that `matplotlib` is required for these. Use it like:
-
-```python
-import sc2ts.debug as sd
-import tskit
-
-ts = tskit.load("path_to_daily_inference.ts")
-ai = sd.ArgInfo(ts)
-ai # view summary in notebook
-```
-
+Please see the online [documentation](https://tskit.dev/sc2ts/docs) for details
+on the software
+and the [preprint](https://www.biorxiv.org/content/10.1101/2023.06.08.544212v2)
+for information on the method and the inferred ARG.
 