1+ .. currentmodule:: tskit
12.. _sec_data_model:
23
34##########
@@ -21,6 +22,8 @@ store tree sequences on disk in the `Tree sequence file format`_ section.
2122
2223.. _sec_data_model_definitions:
2324
25+
26+
2427***********
2528Definitions
2629***********
5053individual
5154 In certain situations we are interested in how nodes (representing
5255 individual homologous genomes) are grouped together into individuals
53- (e.g., two nodes per diploid individual). For example, when we are working
56+ (e.g. two nodes per diploid individual). For example, when we are working
5457 with polyploid samples it is useful to associate metadata with a specific
5558 individual rather than duplicate this information on the constituent nodes.
5659 See :ref:`sec_nodes_or_individuals` for more discussion on this point.
5760
5861sample
5962 The focal nodes of a tree sequence, usually thought of as those that we
6063 have obtained data from. The specification of these affects various
61- methods: (1) :meth:`.TreeSequence.variants` and
62- :meth:`.TreeSequence.haplotypes` will output the genotypes of the samples,
63- and :attr:`.Tree.roots` only return roots ancestral to at least one
64+ methods: (1) :meth:`TreeSequence.variants` and
65+ :meth:`TreeSequence.haplotypes` will output the genotypes of the samples,
66+ and :attr:`Tree.roots` only return roots ancestral to at least one
6467 sample. (See the :ref:`node table definitions <sec_node_table_definition>`
6568 for information on how the sample
6669 status of a node is encoded in the ``flags`` column.)
@@ -214,7 +217,7 @@ is composed of 32 bitwise boolean values. Currently, the only flag defined
214217is ``IS_SAMPLE = 1``, which defines the *sample* status of nodes. Marking
215218a particular node as a "sample" means, for example, that the mutational state
216219of the node will be included in the genotypes produced by
217- :meth:`.TreeSequence.variants`.
220+ :meth:`TreeSequence.variants`.
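
As a minimal illustration of the bitwise encoding described above, the
following plain-Python sketch tests the sample bit of a flags value. The
constant ``IS_SAMPLE = 1`` is taken from the text; the helper ``is_sample``
and the example flag values are hypothetical and are not part of the tskit API.

```python
# Value of the sample flag, as given in the text above.
IS_SAMPLE = 1


def is_sample(flags):
    """Return True if the IS_SAMPLE bit is set in a flags value.

    Hypothetical helper for illustration only; not a tskit function.
    """
    return (flags & IS_SAMPLE) != 0


# Made-up flags for four nodes: nodes 0 and 2 are marked as samples.
node_flags = [1, 0, 1, 0]
sample_ids = [u for u, f in enumerate(node_flags) if is_sample(f)]
print(sample_ids)
```

Because the test masks a single bit, any other bits set in the same value
do not affect the result.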
218221
219222Bits 0-15 (inclusive) of the ``flags`` column are reserved for internal use by
220223``tskit`` and should not be used by applications for anything other
@@ -511,14 +514,14 @@ Valid tree sequence requirements
511514================================
512515
513516Arbitrary data can be stored in tables using the classes in the
514- :ref:`sec_tables_api`. However, only a :class:`.TableCollection`
517+ :ref:`sec_tables_api`. However, only a :class:`TableCollection`
515518that fulfils a set of requirements represents
516- a valid :class:`.TreeSequence` object which can be obtained
517- using the :meth:`.TableCollection.tree_sequence` method. In this
519+ a valid :class:`TreeSequence` object which can be obtained
520+ using the :meth:`TableCollection.tree_sequence` method. In this
518521section we list these requirements, and explain their rationale.
519522Violations of most of these requirements are detected when the
520- user attempts to load a tree sequence via :func:`.load` or
521- :meth:`.TableCollection.tree_sequence`, raising an informative
523+ user attempts to load a tree sequence via :func:`tskit.load` or
524+ :meth:`TableCollection.tree_sequence`, raising an informative
522525error message. Some more complex requirements may not be detectable at load-time,
523526and errors may not occur until certain operations are attempted.
524527These are documented below.
@@ -536,7 +539,7 @@ respect to any other tables. Therefore, there are no requirements on
536539individuals.
537540
538541There are no requirements regarding the ordering of individuals.
539- Sorting a set of tables using :meth:`.TableCollection.sort` has
542+ Sorting a set of tables using :meth:`TableCollection.sort` has
540543no effect on the individuals.
541544
542545.. _sec_node_requirements:
@@ -558,7 +561,7 @@ There are no requirements regarding the ordering of nodes with respect to time.
558561For simplicity and algorithmic efficiency, all nodes referring to the same
559562(non-null) individual must be contiguous.
560563
561- Sorting a set of tables using :meth:`.TableCollection.sort`
564+ Sorting a set of tables using :meth:`TableCollection.sort`
562565has no effect on nodes.
563566
564567.. _sec_edge_requirements:
@@ -587,7 +590,7 @@ where a node ``a`` is a child of both ``b`` and ``c``), and ensures that
587590at each point on the sequence we have a well-formed forest of trees.
588591Because this is a more complex semantic requirement, it is **not** detected
589592at load time. This error is detected during tree traversal, via, e.g.,
590- the :meth:`.TreeSequence.trees` iterator.
593+ the :meth:`TreeSequence.trees` iterator.
591594
592595In the interest of algorithmic efficiency, edges must have the following
593596sortedness properties:
@@ -598,7 +601,7 @@ sortedness properties:
598601 first by ``child`` ID and then by ``left`` coordinate.
599602
600603Violations of these requirements are detected at load time.
601- The :meth:`.TableCollection.sort` method will ensure that these sortedness
604+ The :meth:`TableCollection.sort` method will ensure that these sortedness
602605properties are fulfilled.
603606
604607.. _sec_site_requirements:
@@ -617,7 +620,7 @@ For simplicity and algorithmic efficiency, sites must also:
617620- Be sorted in increasing order of ``position``.
618621
619622Violations of these requirements are detected at load time.
620- The :meth:`.TableCollection.sort` method ensures that sites are sorted
623+ The :meth:`TableCollection.sort` method ensures that sites are sorted
621624according to these criteria.
622625
623626.. _sec_mutation_requirements:
@@ -646,7 +649,7 @@ For simplicity and algorithmic efficiency, mutations must also:
646649 ``parent`` with ID :math:`y`, then we must have :math:`y < x`).
647650
648651Violations of these sorting requirements are detected at load time.
649- The :meth:`.TableCollection.sort` method ensures that mutations are sorted
652+ The :meth:`TableCollection.sort` method ensures that mutations are sorted
650653according to site ID, but does not at present enforce that mutations occur
651654after their parent mutations.
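
Since sorting does not enforce the parent-before-child ordering, it can be
useful to check it separately. The sketch below is a plain-Python
illustration of the :math:`y < x` requirement, assuming the ``parent`` column
is available as a list of integer IDs and that ``-1`` (the value of
``tskit.NULL``) marks a mutation with no parent. The function name
``parents_precede_children`` is hypothetical and is not part of the tskit API.

```python
NULL = -1  # assumed "no parent" marker, matching the value of tskit.NULL


def parents_precede_children(mutation_parent):
    """Check that each mutation's parent has a smaller ID than the mutation.

    ``mutation_parent[x]`` is the parent ID of mutation ``x``, or NULL.
    Hypothetical helper for illustration only; not a tskit function.
    """
    return all(
        parent == NULL or 0 <= parent < mutation_id
        for mutation_id, parent in enumerate(mutation_parent)
    )


# Made-up parent columns: the first is valid, the second lists
# mutation 0 as a child of the later mutation 1.
valid = parents_precede_children([NULL, 0, 1, NULL])
invalid = parents_precede_children([1, NULL])
print(valid, invalid)
```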
652655
@@ -655,7 +658,7 @@ change of state. For example, if we have a site with ancestral state
655658of "A" and a single mutation with derived state "A", then this
656659mutation does not result in any change of state. This error is
657660raised at run-time when we reconstruct sample genotypes, for example
658- in the :meth:`.TreeSequence.variants` iterator.
661+ in the :meth:`TreeSequence.variants` iterator.
659662
660663.. _sec_migration_requirements:
661664
@@ -708,7 +711,7 @@ Schema section (TODO).
708711Table transformation methods
709712============================
710713
711- In general, table methods operate *in place* on a :class:`.TableCollection`,
714+ In general, table methods operate *in place* on a :class:`TableCollection`,
712715directly altering the data stored within its constituent tables.
713716
714717In some applications, tables may most naturally be produced in a way that is
@@ -718,8 +721,8 @@ below (while also having other uses), can be used to make such a set of tables
718721valid, and thus ready to be loaded into a tree sequence.
719722
720723Some of the other methods described in this section also have an equivalent
721- :class:`.TreeSequence` version: an important distinction is that unlike the
722- methods here, :class:`.TreeSequence` methods do *not* operate in place, but
724+ :class:`TreeSequence` version: an important distinction is that unlike the
725+ methods here, :class:`TreeSequence` methods do *not* operate in place, but
723726rather act in a functional way, returning a new tree sequence while leaving
724727the original one unchanged.
725728
@@ -732,8 +735,8 @@ Simplification
732735--------------
733736
734737Simplification of a tree sequence is in fact a transformation method applied
735- to the underlying tables: the method :meth:`.TreeSequence.simplify` calls
736- :meth:`.TableCollection.simplify` on the tables, and loads a new tree sequence.
738+ to the underlying tables: the method :meth:`TreeSequence.simplify` calls
739+ :meth:`TableCollection.simplify` on the tables, and loads a new tree sequence.
737740The main purpose of this method is to remove redundant information,
738741only retaining the minimal tree sequence necessary to describe the genealogical
739742history of the ``samples`` provided.
@@ -743,7 +746,7 @@ Furthermore, ``simplify`` is guaranteed to:
743746- preserve relative ordering of any rows in the Site and Mutation tables
744747 that are not discarded.
745748
746- The :meth:`.TableCollection.simplify` method can be applied to a collection of
749+ The :meth:`TableCollection.simplify` method can be applied to a collection of
747750tables that does not have the ``mutations.parent`` entries filled in, as long
748751as all other validity requirements are satisfied.
749752
@@ -754,7 +757,7 @@ Sorting
754757
755758The validity requirements for a set of tables to be loaded into a tree sequence
756759listed in :ref:`sec_table_definitions` are of two sorts: logical consistency,
757- and sortedness. The :meth:`.TableCollection.sort` method can be used to make
760+ and sortedness. The :meth:`TableCollection.sort` method can be used to make
758761completely valid a set of tables that satisfies all requirements other than
759762sortedness.
760763
@@ -765,7 +768,7 @@ be sorted. The method has two additional properties:
765768- it preserves relative ordering between sites at the same position, and
766769- it preserves relative ordering between mutations at the same site.
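
The two ordering guarantees listed above are the defining behaviour of a
*stable* sort. As a rough illustration only, the sketch below uses Python's
built-in ``sorted`` (which is stable) on made-up ``(position, label)`` pairs;
it is not how tskit sorts tables, and the data and variable names are
hypothetical.

```python
# Made-up (position, label) pairs; "a" and "b" share position 5.0.
sites = [(5.0, "a"), (2.0, "c"), (5.0, "b")]

# Sort by position only. Ties keep their original relative order,
# so "a" still comes before "b" after sorting.
sorted_sites = sorted(sites, key=lambda site: site[0])
print(sorted_sites)
```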
767770
768- :meth:`.TableCollection.sort` does not check the validity of the `parent`
771+ :meth:`TableCollection.sort` does not check the validity of the `parent`
769772property of the mutation table. However, because the method preserves mutation
770773order among mutations at the same site, if mutations are already sorted so that
771774each mutation comes after its parent (e.g., if they are ordered by time of
@@ -786,7 +789,7 @@ the tables must be indexed.
786789Removing duplicate sites
787790------------------------
788791
789- The :meth:`.TableCollection.deduplicate_sites` method can be used to save a tree
792+ The :meth:`TableCollection.deduplicate_sites` method can be used to save a tree
790793sequence recording method the bother of checking to see if a given site already
791794exists in the site table. If there is more than one site with the same
792795position, all but the first is removed, and all mutations referring to the
@@ -802,7 +805,7 @@ of the mutation table would be easily inferred from the tree at that mutation's
802805site. If mutations are entered into the mutation table ordered by time of
803806appearance, then this sortedness allows us to infer the parent of each mutation
804807even for mutations occurring on the same branch. The
805- :meth:`.TableCollection.compute_mutation_parents` method will take advantage
808+ :meth:`TableCollection.compute_mutation_parents` method will take advantage
806809of this fact to compute the ``parent`` column of a mutation table, if all
807810other information is valid.
808811
@@ -982,8 +985,8 @@ Consider the following example:
982985
983986In this tree, node 4 is isolated, and therefore for any sites that are
984987on this tree, the state that it is assigned is a special value
985- ``tskit.MISSING_DATA``, or ``-1``. See the :meth:`.TreeSequence.variants`
986- method and :class:`.Variant` class for more information on how missing
988+ ``tskit.MISSING_DATA``, or ``-1``. See the :meth:`TreeSequence.variants`
989+ method and :class:`Variant` class for more information on how missing
987990data is represented.
988991
989992.. _sec_text_file_format: