|
1 | | -.. _sec_interchange: |
| 1 | +.. _sec_data_model: |
2 | 2 |
|
3 | | -######################### |
4 | | -Tree sequence interchange |
5 | | -######################### |
| 3 | +########## |
| 4 | +Data model |
| 5 | +########## |
6 | 6 |
|
7 | 7 | The correlated genealogical trees that describe the shared ancestry of a set of |
8 | | -samples are stored concisely in ``msprime`` as a collection of |
| 8 | +samples are stored concisely in ``tskit`` as a collection of |
9 | 9 | easy-to-understand tables. These are output by coalescent simulation in |
10 | 10 | ``msprime`` or can be read in from another source. This page documents |
11 | 11 | the structure of the tables, and the different methods of interchanging |
12 | | -genealogical data to and from the msprime API. We begin by defining |
| 12 | +genealogical data to and from the tskit API. We begin by defining |
13 | 13 | the basic concepts that we need and the structure of the tables in the |
14 | 14 | `Data model`_ section. We then describe the tabular text formats that can |
15 | 15 | be used as simple interchange mechanism for small amounts of data in the |
16 | 16 | `Text file formats`_ section. The `Binary interchange`_ section then describes |
17 | 17 | the efficient Python API for table interchange using numpy arrays. Finally, |
18 | | -we describe the binary format used by msprime to efficiently |
| 18 | +we describe the binary format used by tskit to efficiently |
19 | 19 | store tree sequences on disk in the `Tree sequence file format`_ section. |
20 | 20 |
|
21 | 21 |
|
22 | | -.. _sec_data_model: |
| 22 | +.. _sec_data_model_definitions: |
23 | 23 |
|
24 | | -********** |
25 | | -Data model |
26 | | -********** |
| 24 | +*********** |
| 25 | +Definitions |
| 26 | +*********** |
27 | 27 |
|
28 | 28 | To begin, here are definitions of some key ideas encountered later. |
29 | 29 |
|
@@ -156,7 +156,7 @@ term "genome" at times, for concreteness. |
156 | 156 | Several properties naturally associated with individuals are in fact assigned |
157 | 157 | to nodes in what follows: birth time and population. This is for two reasons: |
158 | 158 | First, since coalescent simulations naturally lack a notion of polyploidy, earlier |
159 | | -versions of ``msprime`` lacked the notion of an individual. Second, ancestral |
| 159 | +versions of ``tskit`` lacked the notion of an individual. Second, ancestral |
160 | 160 | nodes are not naturally grouped together into individuals -- we know they must have |
161 | 161 | existed, but have no way of inferring this grouping, so in fact many nodes in |
162 | 162 | an empirically-derived tree sequence will not be associated with individuals, |
@@ -405,7 +405,7 @@ helpful for inferring demographic history to record this history. |
405 | 405 | Migrations are performed by individual ancestors, but most likely not by an |
406 | 406 | individual whose genome is tracked as a ``node`` (as in a discrete-deme model they are |
407 | 407 | unlikely to be both a migrant and a most recent common ancestor). So, |
408 | | -``msprime`` records when a segment of ancestry has moved between |
| 408 | +``tskit`` records when a segment of ancestry has moved between |
409 | 409 | populations. This table is not required, even if different nodes come from |
410 | 410 | different populations. |
411 | 411 |
|
@@ -491,7 +491,7 @@ the library itself can use. All other information is considered to be |
491 | 491 | tables. |
492 | 492 |
|
493 | 493 | Arbitrary binary data can be stored in ``metadata`` columns, and the |
494 | | -``msprime`` library makes no attempt to interpret this information. How the |
| 494 | +``tskit`` library makes no attempt to interpret this information. How the |
495 | 495 | information held in this field is encoded is entirely the choice of client code. |
496 | 496 |
|
497 | 497 | To ensure that metadata can be safely interchanged using the :ref:`sec_text_file_format`, |
@@ -1046,7 +1046,7 @@ length. To encode such columns in the tables API, we store **two** columns: |
1046 | 1046 | one contains the flattened array of data and another stores the **offsets** |
1047 | 1047 | of each row into this flattened array. Consider an example:: |
1048 | 1048 |
|
1049 | | - >>> s = msprime.SiteTable() |
| 1049 | + >>> s = tskit.SiteTable() |
1050 | 1050 | >>> s.add_row(0, "A") |
1051 | 1051 | >>> s.add_row(0, "") |
1052 | 1052 | >>> s.add_row(0, "TTT") |
@@ -1231,7 +1231,7 @@ Legacy Versions |
1231 | 1231 | =============== |
1232 | 1232 |
|
1233 | 1233 | Tree sequence files written by older versions of tskit are not readable by |
1234 | | -newer versions of msprime. For major releases of tskit, ``tskit upgrade`` |
| 1234 | +newer versions of tskit. For major releases of tskit, ``tskit upgrade`` |
1235 | 1235 | will convert older tree sequence files to the latest version. |
1236 | 1236 |
|
1237 | 1237 | File formats from version 11 onwards are based on |
|