Commit 9ac0b6a

hyanwong, gregorgorjanc and jeromekelleher authored
Update glossary.md (#3340)
* Update glossary.md

Reference ARGs in the glossary

* Tidy and add more to the glossary

* Update docs/glossary.md

Co-authored-by: Gregor Gorjanc <[email protected]>

* Update docs/glossary.md

Co-authored-by: Jerome Kelleher <[email protected]>

* Remove reference to node heights

---------

Co-authored-by: Gregor Gorjanc <[email protected]>
Co-authored-by: Jerome Kelleher <[email protected]>
1 parent c81a38d commit 9ac0b6a

1 file changed

+37 -14 lines changed

docs/glossary.md

Lines changed: 37 additions & 14 deletions
@@ -30,26 +30,39 @@ Here are some definitions of some key ideas encountered in this documentation.
 tree
 : A "gene tree", i.e., the genealogical tree describing how a collection of
 genomes (usually at the tips of the tree) are related to each other at some
-chromosomal location. See {ref}`sec_nodes_or_individuals` for discussion
-of what a "genome" is.
+chromosomal {ref}`position <sec_data_model_definitions_position>` or location.
+As the trees may vary depending on this location, they are also known as "local
+trees". See {ref}`sec_nodes_or_individuals` for discussion of what a "genome" is.
 
 (sec_data_model_definitions_tree_sequence)=
 
 tree sequence
-: A "succinct tree sequence" (or tree sequence, for brevity) is an efficient
-encoding of a sequence of correlated trees, such as one encounters looking
-at the gene trees along a genome. A tree sequence efficiently captures the
-structure shared by adjacent trees, (essentially) storing only what differs
-between them.
+: A "succinct tree sequence" (or tree sequence, for brevity) is an object
+that stores the genetic ancestry and mutational history of a set of
+aligned DNA sequences or genomes. The name reflects the idea that a common
+way to treat genetic ancestry is as a sequence of correlated
+{ref}`trees <sec_data_model_definitions_tree>` at different chromosomal
+{ref}`positions <sec_data_model_definitions_position>`.
+Branches that are shared between these trees are efficiently stored as a
+single {ref}`edge <sec_data_model_definitions_edge>`, and adjacent trees
+may differ by only a few such edges. These edges connect
+{ref}`nodes <sec_data_model_definitions_node>` (genomes) in
+the tree sequence, forming a
+network or graph. Graphs of this sort are sometimes called ancestral
+recombination graphs (ARGs), hence tree sequences provide a
+flexible way to encode multiple types of ARG.
 
 (sec_data_model_definitions_node)=
 
 node
-: Each branching point in each tree is associated with a particular genome
+: Any point in a tree can be associated with a particular genome
 in a particular ancestor, called a "node". Since each node represents a
-specific genome it has a unique `time`, thought of as its birth time,
-which determines the height of any branching points it is associated with.
-See {ref}`sec_nodes_or_individuals` for discussion of what a "node" is.
+specific genome it has a unique `time`, thought of as its birth time. Nodes
+may or may not correspond to branching points, either in a local
+{ref}`tree <sec_data_model_definitions_tree>` or in the whole graph.
+However a branching point must always be associated with a node.
+See {ref}`sec_nodes_or_individuals` for discussion of what a "node"
+represents.
 
 (sec_data_model_definitions_individual)=
 
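The revised `tree sequence` entry says that branches shared between adjacent local trees are stored once, as a single edge. The sketch below illustrates that idea in plain Python; it does not use `tskit`, and both the `local_trees` function and the `(left, right, parent, child)` tuple layout are hypothetical stand-ins. It naively rescans every edge for each interval between breakpoints, whereas a real implementation would update each tree incrementally while moving along the genome.

```python
from typing import Dict, List, Tuple

# Hypothetical plain-Python edge: (left, right, parent, child).
Edge = Tuple[float, float, int, int]


def local_trees(edges: List[Edge], sequence_length: float):
    """Return a list of (left, right, parent_of) triples, one per local tree."""
    # Every edge endpoint is a position at which the local tree may change.
    breakpoints = sorted(
        {0.0, float(sequence_length)} | {float(x) for e in edges for x in e[:2]}
    )
    trees = []
    for left, right in zip(breakpoints[:-1], breakpoints[1:]):
        # An edge belongs to this tree if its half-open interval [l, r)
        # covers the whole interval [left, right).
        parent_of: Dict[int, int] = {
            child: parent
            for (l, r, parent, child) in edges
            if l <= left and right <= r
        }
        trees.append((left, right, parent_of))
    return trees


# Samples 0, 1, 2; ancestral nodes 3, 4, 5; sequence length 10.
edges = [
    (0, 10, 3, 0), (0, 10, 3, 1),  # shared by every local tree, listed once
    (0, 6, 4, 2), (0, 6, 4, 3),    # only in the tree covering [0, 6)
    (6, 10, 5, 2), (6, 10, 5, 3),  # only in the tree covering [6, 10)
]
trees = local_trees(edges, 10)
for left, right, parent_of in trees:
    print(f"[{left}, {right}): {parent_of}")
```

The two edges above node 3 span the whole sequence, so they appear in both local trees while being recorded only once; the trees differ only in the edges that join 2 and 3.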
@@ -66,7 +79,7 @@ individual
 sample
 : The focal nodes of a tree sequence, usually thought of as those from which
 we have obtained data. The specification of these affects various
-methods: (1) {meth}`TreeSequence.variants` and
+methods: {meth}`TreeSequence.variants` and
 {meth}`TreeSequence.haplotypes` will output the genotypes of the samples,
 and {attr}`Tree.roots` only return roots ancestral to at least one
 sample.
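The `sample` entry notes that `Tree.roots` only reports roots ancestral to at least one sample. A minimal plain-Python sketch of that rule follows, assuming a local tree is given as a hypothetical `parent_of` mapping from child to parent; it is not the `tskit` implementation.

```python
from typing import Dict, Iterable, Set


def sample_roots(parent_of: Dict[int, int], samples: Iterable[int]) -> Set[int]:
    """Return the roots of a local tree that lie above at least one sample.

    `parent_of` maps each child node to its parent in this tree; a node with
    no entry has no parent here.
    """
    roots: Set[int] = set()
    for node in samples:
        # Climb from the sample until reaching a node with no parent.
        while node in parent_of:
            node = parent_of[node]
        roots.add(node)
    return roots


# Samples 0, 1 and 2 all descend from node 4. Nodes 6 and 7 form a lineage
# with no sample below it, so its top node (7) is not reported as a root.
parent_of = {0: 3, 1: 3, 2: 4, 3: 4, 6: 7}
print(sample_roots(parent_of, samples=[0, 1, 2]))  # {4}
# A sample with no parent in this tree is treated as its own root.
print(sample_roots(parent_of, samples=[0, 1, 2, 5]))
```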
@@ -81,13 +94,15 @@ edge
 : The topology of a tree sequence is defined by a set of **edges**. Each
 edge is a tuple `(left, right, parent, child)`, which records a
 parent-child relationship among a pair of nodes on the
-on the half-open interval of chromosome `[left, right)`.
+on the half-open interval `[left, right)` along the genome. The difference
+between `left` and `right` is known as the "span" of the edge.
 
 (sec_data_model_definitions_site)=
 
 site
 : Tree sequences can define the mutational state of nodes as well as their
-topological relationships. A **site** is thought of as some position along
+topological relationships. A **site** is thought of as some
+{ref}`position <sec_data_model_definitions_position>` along
 the genome at which variation occurs. Each site is associated with
 a unique position and ancestral state.
 
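The `edge` entry now defines an edge's span as the difference between `right` and `left` over the half-open interval `[left, right)`. The hypothetical `Edge` class below is a plain `NamedTuple`, not `tskit`'s own edge class; it shows both ideas, including why half-open intervals let two adjacent edges for the same child meet at a breakpoint without overlapping.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical stand-in for an edge tuple (left, right, parent, child)."""

    left: float
    right: float
    parent: int
    child: int

    @property
    def span(self) -> float:
        # The "span" is the difference between right and left.
        return self.right - self.left

    def covers(self, position: float) -> bool:
        # Half-open interval [left, right): left is included, right is not.
        return self.left <= position < self.right


# Child 0 has parent 3 on [0, 2) and parent 4 on [2, 7.5).
first = Edge(left=0.0, right=2.0, parent=3, child=0)
second = Edge(left=2.0, right=7.5, parent=4, child=0)
print(second.span)  # 5.5
# Position 2.0 is covered by exactly one of the two adjacent edges.
print(first.covers(2.0), second.covers(2.0))  # False True
```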
@@ -114,6 +129,14 @@ migration
 population
 : A grouping of nodes, e.g., by sampling location.
 
+(sec_data_model_definitions_position)=
+
+position
+: A location along the genome, from 0 to the
+{ref}`sequence length<sec_data_model_definitions_sequence_length>`. In `tskit`
+positions are stored as floating-point numbers, although it is common to
+restrict positions to occur at discrete integer locations.
+
 (sec_data_model_definitions_provenance)=
 
 provenance
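The new `position` entry says positions run from 0 to the sequence length, are stored as floating-point numbers, and are often restricted to integer locations. The sketch below checks a list of positions against those rules. `check_positions` is a hypothetical helper, and it assumes that point positions (such as site positions) must be strictly less than the sequence length, while an interval endpoint such as an edge's `right` may equal it.

```python
from typing import Iterable


def check_positions(
    positions: Iterable[float], sequence_length: float, discrete: bool = False
) -> None:
    """Raise ValueError if a position falls outside [0, sequence_length).

    With discrete=True, also require each position to be a whole number.
    """
    for x in positions:
        # Assumption: point positions lie in the half-open range [0, L).
        if not (0 <= x < sequence_length):
            raise ValueError(f"position {x} is outside [0, {sequence_length})")
        # Integer-valued floats such as 3.0 count as discrete locations.
        if discrete and not float(x).is_integer():
            raise ValueError(f"position {x} is not a discrete integer location")


check_positions([0.25, 3.7, 9.5], sequence_length=10)  # continuous: accepted
check_positions([0, 3, 9], sequence_length=10, discrete=True)  # accepted
try:
    check_positions([3.7], sequence_length=10, discrete=True)
except ValueError as err:
    print(err)  # position 3.7 is not a discrete integer location
```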
