Backwards #275

Open
petrelharp wants to merge 2 commits into tskit-dev:main from petrelharp:backwards
@petrelharp (Contributor)

In working out an algorithm that wants to move back through time, it seemed helpful to do a simple explainer on iterating back in time - i.e., taking the haplotype view instead of the tree-by-tree view.

This is a draft of that. Suggestions for nicer python or fun examples welcome! So far it's not demonstrating anything that you couldn't do tree-by-tree.

at 300 generations ago, there were three extant genomes
from which the samples inherited, and the inherited segments are
as listed here.
Note that this does not mean that "node 2 was laive 300 generations ago"
Member:

typo "alive"

(clearly, as node 2 represetns an extant, sampled genome),
but rather that there are no other ancestral genomes recorded explicitly
in the tree sequence that lie on the path along
which node 2 has inherited it's genome.
Member:

Typo "it's"

@hyanwong (Member) commented Jul 1, 2024

Nice. I like this. Note that there are a few examples of iterating up and down the graph at https://tskit.dev/tutorials/args.html#graph-traversal, but I don't actually do anything with the traversals, so your examples are better.

Also note that some of this might also link in to tskit-dev/tskit#2869, and there are some suggestions there of things you might want to calculate. One thing that is much easier to do than with the tree-by-tree approach is to find all the descendant samples of a particular ancestral node (or alternatively, all the internal nodes that are ancestors of a particular sample).
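A minimal sketch of the descendant-samples idea, under stated assumptions: the edges are plain `(left, right, parent, child)` tuples sorted by increasing parent time (the order tskit requires), the toy data is invented for illustration, and `descendant_samples` is a hypothetical helper rather than a tskit function. It ignores the genomic intervals, so it reports samples reachable through the graph, not samples that necessarily inherit genetic material from the focal node.

```python
# Toy edge list (hypothetical data): (left, right, parent, child),
# sorted by increasing parent time, as in a tskit edge table.
edges = [
    (0.0, 10.0, 3, 0),
    (0.0, 10.0, 3, 1),
    (0.0, 5.0, 4, 2),
    (0.0, 5.0, 4, 3),
    (5.0, 10.0, 5, 2),
    (5.0, 10.0, 5, 3),
]
samples = [0, 1, 2]


def descendant_samples(focal, samples, edges):
    """Return the samples reachable by following edges downwards from focal."""
    reached = {focal}
    # Reversing the edge order visits the oldest parents first, so a node is
    # always marked as reached before the edges below it are examined.
    for _left, _right, parent, child in reversed(edges):
        if parent in reached:
            reached.add(child)
    return sorted(reached & set(samples))


below_3 = descendant_samples(3, samples, edges)
below_4 = descendant_samples(4, samples, edges)
```

A single pass suffices because every child is strictly younger than its parent, so the edges below a newly reached node always come later in the reversed order. Tracking the intervals as well would be needed to restrict the result to genetic descendants.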

@jeromekelleher (Member) left a comment:

I think it's great, a really helpful addition. I've a few minor take-or-leave implementation suggestions.


```{code-cell} ipython3
for e in ts.edges():
    t = ts.node(e.parent).time
```
Member:

Maybe use ts.nodes_time[e.parent] so that this is more easily translatable to numba?
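To illustrate the array-indexing style suggested here, the following sketch uses plain lists as stand-ins for the tskit arrays; the data and the helper name `edge_parent_times` are hypothetical. With a real tree sequence the inputs would be `ts.nodes_time` and `ts.edges_parent`.

```python
# Stand-in data (hypothetical): in tskit these would be the arrays
# ts.nodes_time and ts.edges_parent.
nodes_time = [0.0, 0.0, 0.0, 100.0, 250.0]
edges_parent = [3, 3, 4, 4]


def edge_parent_times(nodes_time, edges_parent):
    """Return the time of each edge's parent node using only array indexing."""
    times = []
    for j in range(len(edges_parent)):
        # Same value as ts.node(e.parent).time, but looked up directly in a
        # flat array instead of building a node object for every edge.
        times.append(nodes_time[edges_parent[j]])
    return times


parent_times = edge_parent_times(nodes_time, edges_parent)
```

An index loop over flat arrays like this is presumably the form that carries over to numba with the fewest changes, since it avoids Python objects inside the loop.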

from which the samples inherited, and the inherited segments are
as listed here.
Note that this does not mean that "node 2 was laive 300 generations ago"
(clearly, as node 2 represetns an extant, sampled genome),
Member:

Suggested change:
- (clearly, as node 2 represetns an extant, sampled genome),
+ (clearly, as node 2 represents an extant, sampled genome),

Here is a data structure for a list of segments with labels:

```{code-cell} ipython3
class LabelSegmentList:
```
Member:

It might be worth considering inheriting from collections.abc.MutableSequence here, as this would give you all the dunder methods that you're implementing. I think these might be a bit scary to non-python people, and are a bit of a distraction from the main point.

Member:

I guess you'd need to inherit from list then to get the actual storage. Maybe that's OK?
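For reference, a sketch of what the `collections.abc.MutableSequence` route could look like, assuming the storage is a plain internal list rather than inheriting from `list`. The class name `SegmentList` and the `(left, right, label)` tuple layout are hypothetical and are not taken from the PR's `LabelSegmentList`.

```python
from collections.abc import MutableSequence


class SegmentList(MutableSequence):
    """Hypothetical container of (left, right, label) segments."""

    def __init__(self, segments=()):
        # The storage is an ordinary list held as an attribute.
        self._segments = list(segments)

    # MutableSequence needs only these five methods to be defined.
    def __getitem__(self, index):
        return self._segments[index]

    def __setitem__(self, index, value):
        self._segments[index] = value

    def __delitem__(self, index):
        del self._segments[index]

    def __len__(self):
        return len(self._segments)

    def insert(self, index, value):
        self._segments.insert(index, value)


segs = SegmentList([(0.0, 5.0, 2)])
segs.append((5.0, 10.0, 3))  # append() is supplied by the mixin
```

The mixin then provides `append`, `extend`, `pop`, `remove`, `reverse`, `__iter__`, `__contains__` and `__iadd__` for free, so five methods still have to be written by hand. Inheriting from `list` instead, as the follow-up comment suggests, would remove even those.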

Now, edges in the EdgeTable are sorted by parent time,
so if we iterate through the edges in order, we move back in time.
So, we can use this to see the state of the process at, say,
500 generations in the past:
Member:

Typo, 500 should be 300
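The quoted passage relies on edges being sorted by parent time, so a single pass through them that stops at a chosen time recovers the state at that time. Below is a self-contained sketch of that idea on invented toy data. The function `lineages_at`, the tuple layout and the dict-of-lists state are assumptions made for illustration, not the tutorial's implementation.

```python
# Toy data (hypothetical): node times, and edges as
# (left, right, parent, child) sorted by increasing parent time.
nodes_time = [0.0, 0.0, 0.0, 100.0, 400.0, 500.0]
samples = [0, 1, 2]
edges = [
    (0.0, 10.0, 3, 0),
    (0.0, 10.0, 3, 1),
    (0.0, 5.0, 4, 2),
    (0.0, 5.0, 4, 3),
    (5.0, 10.0, 5, 2),
    (5.0, 10.0, 5, 3),
]


def lineages_at(time, sequence_length, nodes_time, samples, edges):
    """Map each node to the (left, right, sample) segments it carries at `time`."""
    # At time zero every sample carries its own whole genome.
    state = {u: [(0.0, sequence_length, u)] for u in samples}
    for left, right, parent, child in edges:
        if nodes_time[parent] > time:
            break  # every remaining edge has a parent at least this old
        kept = []
        for a, b, label in state.pop(child, []):
            lo, hi = max(a, left), min(b, right)
            if lo < hi:
                # The overlapping piece has passed from child to parent.
                state.setdefault(parent, []).append((lo, hi, label))
                if a < lo:
                    kept.append((a, lo, label))
                if hi < b:
                    kept.append((hi, b, label))
            else:
                kept.append((a, b, label))
        if kept:
            state[child] = kept
    return state


state_300 = lineages_at(300.0, 10.0, nodes_time, samples, edges)
```

In this toy example, at time 300 node 2 still carries its own whole genome because both of its parents are older than 300, while node 3 carries the segments inherited by samples 0 and 1. As in the tutorial text, a sample node appearing in the state does not mean that genome was alive at that time, only that no older recorded ancestor has been reached yet.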

@hyanwong (Member)

Is this changed at all by the new ARG interface in Tskit 1.0?

@agladstein (Member) left a comment:

I happened upon this old PR when I was looking around for docs or examples of something, and thought I'd leave some minor clean-up comments.
It wasn't an example of what I was looking for, but it seems worth merging. One more general comment: I think it could use a short wrap-up paragraph, reiterating what we accomplished in this example and giving more hints as to what traversing backwards can be useful for.

A tree sequence provides an encoding of how segments of genome are inherited.
For some purposes, it is most helpful to iterate along the genome, looking
sequentially at each of the genealogical trees implied by this pattern of inheritance.
However, the data structure itself was not really designed for this purpose:
Member:

Nit: The wording of this sentence confuses me. What are "this purpose" and "it" referring to: looking sequentially at trees along the genome, or looking back through time?

However, the data structure itself was not really designed for this purpose:
it naturally arose from the perspective of looking back through time, to see
how genomes were inherited from each other (in other words, the *coalescent* perspective).
This tutorial demonstrates how to use the information in the tree sequence to see
Member:

Nit: I think this sentence would be helpful as the first sentence of the tutorial.

how these inherited segments of ancestry change as one moves through time.

To do this, it will be helpful define a simple class to represent a collection
of non-overlapping intervals. Each ancestral lineage will have an associated
Member:

Nit: "non-overlapping intervals [along the genome]"?
Non-overlapping intervals over what: along the genome, or over time?

```{code-cell} ipython3
ts.draw_svg(size=(400, 200), y_axis=True, time_scale='rank')
```

What we will do is to keep track of ancestrally inherited segments
Member:

Nit: It was a little confusing as written

Suggested change:
- What we will do is to keep track of ancestrally inherited segments
+ We will keep track of ancestrally inherited segments

at a particular point in time.
An edge represents a sequence of ancestral genomes along which
a given segment was inherited.
(Anoter interpretation would be that an edge represents
Member:

Typo

Suggested change:
- (Anoter interpretation would be that an edge represents
+ (Another interpretation would be that an edge represents
