@@ -30,26 +30,39 @@ Here are some definitions of some key ideas encountered in this documentation.
3030tree
3131: A "gene tree", i.e., the genealogical tree describing how a collection of
3232 genomes (usually at the tips of the tree) are related to each other at some
33- chromosomal location. See {ref}`sec_nodes_or_individuals` for discussion
34- of what a "genome" is.
33+ chromosomal {ref}`position <sec_data_model_definitions_position>` or location.
34+ As the trees may vary depending on this location, they are also known as "local
35+ trees". See {ref}`sec_nodes_or_individuals` for discussion of what a "genome" is.
3536
3637(sec_data_model_definitions_tree_sequence)=
3738
3839tree sequence
39- : A "succinct tree sequence" (or tree sequence, for brevity) is an efficient
40- encoding of a sequence of correlated trees, such as one encounters looking
41- at the gene trees along a genome. A tree sequence efficiently captures the
42- structure shared by adjacent trees, (essentially) storing only what differs
43- between them.
40+ : A "succinct tree sequence" (or tree sequence, for brevity) is an object
41+ that stores the genetic ancestry and mutational history of a set of
42+ aligned DNA sequences or genomes. The name reflects the idea that a common
43+ way to treat genetic ancestry is as a sequence of correlated
44+ {ref}`trees <sec_data_model_definitions_tree>` at different chromosomal
45+ {ref}`positions <sec_data_model_definitions_position>`.
46+ Branches that are shared between these trees are efficiently stored as a
47+ single {ref}`edge <sec_data_model_definitions_edge>`, and adjacent trees
48+ may differ by only a few such edges. These edges connect
49+ {ref}`nodes <sec_data_model_definitions_node>` (genomes) in
50+ the tree sequence, forming a
51+ network or graph. Graphs of this sort are sometimes called ancestral
52+ recombination graphs (ARGs), hence tree sequences provide a
53+ flexible way to encode multiple types of ARG.
4454
4555(sec_data_model_definitions_node)=
4656
4757node
48- : Each branching point in each tree is associated with a particular genome
58+ : Any point in a tree can be associated with a particular genome
4959 in a particular ancestor, called a "node". Since each node represents a
50- specific genome it has a unique `time`, thought of as its birth time,
51- which determines the height of any branching points it is associated with.
52- See {ref}`sec_nodes_or_individuals` for discussion of what a "node" is.
60+ specific genome it has a unique `time`, thought of as its birth time. Nodes
61+ may or may not correspond to branching points, either in a local
62+ {ref}`tree <sec_data_model_definitions_tree>` or in the whole graph.
63+ However, a branching point must always be associated with a node.
64+ See {ref}`sec_nodes_or_individuals` for discussion of what a "node"
65+ represents.
5366
5467(sec_data_model_definitions_individual)=
5568
@@ -66,7 +79,7 @@ individual
6679sample
6780: The focal nodes of a tree sequence, usually thought of as those from which
6881 we have obtained data. The specification of these affects various
69- methods: (1) {meth}`TreeSequence.variants` and
82+ methods: {meth}`TreeSequence.variants` and
7083 {meth}`TreeSequence.haplotypes` will output the genotypes of the samples,
7184 and {attr}`Tree.roots` only returns roots ancestral to at least one
7285 sample.
8194: The topology of a tree sequence is defined by a set of **edges**. Each
8295 edge is a tuple `(left, right, parent, child)`, which records a
8396 parent-child relationship between a pair of nodes
84- on the half-open interval of chromosome `[left, right)`.
97+ on the half-open interval `[left, right)` along the genome. The difference
98+ between `left` and `right` is known as the "span" of the edge.
8599
86100(sec_data_model_definitions_site)=
87101
88102site
89103: Tree sequences can define the mutational state of nodes as well as their
90- topological relationships. A **site** is thought of as some position along
104+ topological relationships. A **site** is thought of as some
105+ {ref}`position <sec_data_model_definitions_position>` along
91106 the genome at which variation occurs. Each site is associated with
92107 a unique position and ancestral state.
93108
@@ -114,6 +129,14 @@ migration
114129population
115130: A grouping of nodes, e.g., by sampling location.
116131
132+ (sec_data_model_definitions_position)=
133+
134+ position
135+ : A location along the genome, from 0 to the
136+ {ref}`sequence length <sec_data_model_definitions_sequence_length>`. In `tskit`,
137+ positions are stored as floating-point numbers, although it is common to
138+ restrict positions to occur at discrete integer locations.
139+
117140(sec_data_model_definitions_provenance)=
118141
119142provenance