| at 300 generations ago, there were three extant genomes | ||
| from which the samples inherited, and the inherited segments are | ||
| as listed here. | ||
| Note that this does not mean that "node 2 was laive 300 generations ago" |
| (clearly, as node 2 represetns an extant, sampled genome), | ||
| but rather that there are no other ancestral genomes recorded explicitly | ||
| in the tree sequence that lie on the path along | ||
| which node 2 has inherited it's genome. |
|
Nice. I like this. Note that there are a few examples of iterating up and down the graph at https://tskit.dev/tutorials/args.html#graph-traversal, but I don't actually do anything with the traversals, so your examples are better. Also note that some of this stuff might link in to tskit-dev/tskit#2869, and there are some suggestions of things you might want to calculate there. One thing that is much easier to do this way than with the tree-by-tree approach is to find all the descendant samples of a particular ancestral node (or alternatively, all the internal nodes that are ancestors of a particular sample).
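To make the last suggestion concrete, here is a minimal sketch of finding the descendant samples of an ancestral node directly from the edges. It is not taken from the PR: the toy `EDGES` list, the `SAMPLES` set, and the `descendant_samples` helper are all hypothetical, and plain tuples stand in for rows of a tskit edge table so the snippet runs without tskit installed.

```python
from collections import defaultdict

# Hypothetical toy data: (left, right, parent, child), mirroring the columns
# of a tskit edge table. Nodes 0-2 are the samples.
EDGES = [
    (0, 5, 3, 0),
    (0, 10, 3, 1),
    (5, 10, 4, 2),
    (5, 10, 4, 3),
]
SAMPLES = {0, 1, 2}


def descendant_samples(edges, samples, ancestor, sequence_length):
    """Return the samples that inherit some genome from `ancestor`.

    The interval carried down each path is intersected with every edge it
    passes through, so a path whose intervals do not overlap is dropped.
    """
    children = defaultdict(list)
    for left, right, parent, child in edges:
        children[parent].append((left, right, child))

    found = set()
    stack = [(ancestor, 0, sequence_length)]
    while stack:
        node, lo, hi = stack.pop()
        if node in samples:
            found.add(node)
        for left, right, child in children[node]:
            new_lo, new_hi = max(lo, left), min(hi, right)
            if new_lo < new_hi:  # keep only paths that still carry genome
                stack.append((child, new_lo, new_hi))
    return found


print(descendant_samples(EDGES, SAMPLES, ancestor=4, sequence_length=10))
```

In this toy example node 4 is connected to sample 0 through node 3, but only over intervals that do not overlap, so sample 0 is not reported. The traversal may revisit a node with different intervals, which is fine for a small illustration but would need care on a large graph.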
jeromekelleher left a comment
I think it's great, a really helpful addition. I've a few minor take-or-leave implementation suggestions.
|
|
||
| ```{code-cell} ipython3 | ||
| for e in ts.edges(): | ||
| t = ts.node(e.parent).time |
Maybe use ts.nodes_time[e.parent] so that this is more easily translatable to numba?
| from which the samples inherited, and the inherited segments are | ||
| as listed here. | ||
| Note that this does not mean that "node 2 was laive 300 generations ago" | ||
| (clearly, as node 2 represetns an extant, sampled genome), |
| (clearly, as node 2 represetns an extant, sampled genome), | |
| (clearly, as node 2 represents an extant, sampled genome), |
| Here is a data structure for a list of segments with labels: | ||
|
|
||
| ```{code-cell} ipython3 | ||
| class LabelSegmentList: |
It might be worth considering inheriting from collections.abc.MutableSequence here, as this would give you all the dunder methods that you're implementing. I think these might be a bit scary to non-Python people, and are a bit of a distraction from the main point.
I guess you'd need to inherit from list then to get the actual storage. Maybe that's OK?
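As a rough illustration of the trade-off discussed in these two comments, the sketch below subclasses `collections.abc.MutableSequence` and keeps the storage in an internal list. Only the class name comes from the PR; the `(left, right, label)` tuple layout and the `_segments` attribute are assumptions made for the example, not the tutorial's actual implementation.

```python
from collections.abc import MutableSequence


class LabelSegmentList(MutableSequence):
    """Hypothetical stand-in: a sequence of (left, right, label) segments."""

    def __init__(self, segments=()):
        # MutableSequence provides no storage, so keep a plain list.
        self._segments = list(segments)

    # The five abstract methods that MutableSequence requires:
    def __getitem__(self, index):
        return self._segments[index]

    def __setitem__(self, index, value):
        self._segments[index] = value

    def __delitem__(self, index):
        del self._segments[index]

    def __len__(self):
        return len(self._segments)

    def insert(self, index, value):
        self._segments.insert(index, value)

    def __repr__(self):
        return f"LabelSegmentList({self._segments!r})"


segs = LabelSegmentList([(0, 5, "a")])
segs.append((5, 10, "b"))  # append, extend, pop, __iter__, ... are mixins
print(segs, len(segs))
```

Note that five methods still have to be written by hand; what the base class supplies is everything else (`append`, `extend`, `pop`, `__contains__`, `__iter__`, and so on). Subclassing `list` directly would remove even those five, at the cost of exposing every list method on the class.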
| Now, edges in the EdgeTable are sorted by parent time, | ||
| so if we iterate through the edges in order, we move back in time. | ||
| So, we can use this to see the state of the process at, say, | ||
| 500 generations in the past: |
|
Is this changed at all by the new ARG interface in Tskit 1.0? |
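For reference, the edge-order idea in the quoted passage can be sketched without tskit. This is an illustration under stated assumptions rather than the tutorial's code: `NODES_TIME` and `EDGES` are hypothetical toy stand-ins for `ts.nodes_time` and the edge table, `lineages_at` is a made-up helper, and the tree sequence is assumed to be simplified so that every edge is ancestral to some sample.

```python
# Hypothetical toy data: node times indexed by node ID, and edges given as
# (left, right, parent, child), already sorted by parent time.
NODES_TIME = [0.0, 0.0, 0.0, 200.0, 800.0]
EDGES = [
    (0, 10, 3, 0),
    (0, 10, 3, 1),
    (0, 10, 4, 2),
    (0, 10, 4, 3),
]


def lineages_at(edges, nodes_time, t):
    """Map node ID -> list of (left, right) intervals for lineages extant at time t.

    A lineage is labelled by the child node of an edge that spans t, that is,
    an edge with nodes_time[child] <= t < nodes_time[parent].
    """
    state = {}
    for left, right, parent, child in edges:
        if nodes_time[parent] <= t:
            # Moving back in time, this edge has already ended by time t.
            continue
        if nodes_time[child] <= t:
            state.setdefault(child, []).append((left, right))
    return state


print(lineages_at(EDGES, NODES_TIME, 500))
```

Here node 2 labels a lineage at 500 generations ago even though the sampled genome itself lived at time 0, which is the same reading of node labels that the tutorial text describes. The intervals for a node are collected in edge order and are not merged.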
agladstein left a comment
I happened upon this old PR when I was looking around for some docs/examples of something and thought I'd leave some minor cleanup comments.
It wasn't an example of what I was looking for, but it seems worth merging. One more general comment is that I think it could use a short wrap-up paragraph, reiterating what we accomplished in this example and giving more hints as to what traversing backwards can be useful for.
| A tree sequence provides an encoding of how segments of genome are inherited. | ||
| For some purposes, it is most helpful to iterate along the genome, looking | ||
| sequentially at each of the genealogical trees implied by this pattern of inheritance. | ||
| However, the data structure itself was not really designed for this purpose: |
Nit: The wording of this sentence confuses me. What do "this purpose" and "it" refer to: looking sequentially at trees along the genome, or looking back through time?
| However, the data structure itself was not really designed for this purpose: | ||
| it naturally arose from the perspective of looking back through time, to see | ||
| how genomes were inherited from each other (in other words, the *coalescent* perspective). | ||
| This tutorial demonstrates how to use the information in the tree sequence to see |
Nit: I think this sentence would be helpful as the first sentence of the tutorial.
| how these inherited segments of ancestry change as one moves through time. | ||
|
|
||
| To do this, it will be helpful define a simple class to represent a collection | ||
| of non-overlapping intervals. Each ancestral lineage will have an associated |
Nit: "non-overlapping intervals [along the genome]"?
Non-overlapping intervals over what? Along the genome, or over time?
| ts.draw_svg(size=(400, 200), y_axis=True, time_scale='rank') | ||
| ``` | ||
|
|
||
| What we will do is to keep track of ancestrally inherited segments |
Nit: It was a little confusing as written
| What we will do is to keep track of ancestrally inherited segments | |
| We will keep track of ancestrally inherited segments |
| at a particular point in time. | ||
| An edge represents a sequence of ancestral genomes along which | ||
| a given segment was inherited. | ||
| (Anoter interpretation would be that an edge represents |
Typo
| (Anoter interpretation would be that an edge represents | |
| (Another interpretation would be that an edge represents |
While working out an algorithm that moves back through time, it seemed helpful to write a simple explainer on iterating back in time, i.e., taking the haplotype view instead of the tree-by-tree view.
This is a draft of that. Suggestions for nicer Python or fun examples welcome! So far it's not demonstrating anything that you couldn't do tree-by-tree.