|
3 | 3 | Sc2ts stands for "SARS-CoV-2 to tree sequence" (pronounced "scoots" optionally) |
4 | 4 | and consists of |
5 | 5 |
|
6 | | -1. A method fo infer Ancestral Recombination Graphs (ARGs) from SARS-CoV-2 |
| 6 | +1. A method to infer Ancestral Recombination Graphs (ARGs) from SARS-CoV-2 |
7 | 7 | data at pandemic scale |
8 | 8 | 2. A lightweight wrapper around [tskit Python APIs](https://tskit.dev/tskit/docs/stable/python-api.html) specialised for the output of sc2ts which enables efficient node metadata |
9 | 9 | access. |
10 | 10 | 3. A lightweight wrapper around [Zarr Python](https://zarr.dev) which enables |
11 | 11 | convenient and efficient access to the full Viridian dataset (alignments and metadata) |
12 | | -in a single file using [VCF Zarr specification](https://doi.org/10.1093/gigascience/giaf049). |
| 12 | +in a single file using the [VCF Zarr specification](https://doi.org/10.1093/gigascience/giaf049). |
13 | 13 |
|
14 | 14 | Please see the [preprint](https://www.biorxiv.org/content/10.1101/2023.06.08.544212v2) |
15 | 15 | for details. |
@@ -130,8 +130,8 @@ python -m sc2ts --help |
130 | 130 | Primary inference is performed using the ``infer`` subcommand of the CLI, |
131 | 131 | and all parameters are specified using a toml file. |
132 | 132 |
|
133 | | -Then inference under the [example config](example_config.toml) |
134 | | -for little while to see how things work: |
| 133 | +The [example config file](example_config.toml) can be used to perform |
| 134 | +inference over a short period, to demonstrate how sc2ts works: |
135 | 135 |
|
136 | 136 | ``` |
137 | 137 | python3 -m sc2ts infer example_config.toml --stop=2020-02-02 |
@@ -161,9 +161,9 @@ example_inference |
161 | 161 | └── ex1.matches.db |
162 | 162 | ``` |
163 | 163 |
|
164 | | -Here we've run inference for all dates in January 2020 for which we have data |
165 | | -and Feb 01. The results of inference for each day is stored in the |
166 | | -``example_inference/ex1`` directory as a tskit file representing the ARG |
| 164 | +Here we've run inference for all dates in January 2020 for which we have data, plus the 1st Feb. |
| 165 | +The results of inference for each day are stored in the |
| 166 | +``example_inference/ex1`` directory as tskit files representing the ARG |
167 | 167 | inferred up to that day. There is a lot of redundancy in keeping all these |
168 | 168 | daily files lying around, but it is useful to be able to go back to the |
169 | 169 | state of the ARG at a particular date and they don't take up much space. |
@@ -200,7 +200,7 @@ into the final ARG. |
200 | 200 |
|
201 | 201 | ### Generating final analysis file |
202 | 202 |
|
203 | | -To generate the final analysis ready file (used as input to the analysis |
| 203 | +To generate the final analysis-ready file (used as input to the analysis |
204 | 204 | APIs above) we need to run ``minimise-metadata``. This removes all but |
205 | 205 | the most necessary metadata from the ARG, and recodes node metadata |
206 | 206 | using the [struct codec](https://tskit.dev/tskit/docs/stable/metadata.html#structured-array-metadata) |
@@ -266,7 +266,7 @@ and rerun tests as above. |
266 | 266 |
|
267 | 267 | ### Debug utilities |
268 | 268 |
|
269 | | -The tree sequences files output during primary inference have a lot |
| 269 | +The tree sequence files output during primary inference have a lot |
270 | 270 | of debugging metadata, and there are some developer tools for inspecting |
271 | 271 | this in the ``sc2ts.debug`` package. In particular, the ``ArgInfo`` |
272 | 272 | class has a lot of useful utilities designed to be used in a Jupyter |