Commit a12ef78

jeromekelleher authored and mergify[bot] committed
Documentation for IBD segments
Closes #2013 Closes #1716 Closes #1682 Closes #1657
1 parent dd413af commit a12ef78

6 files changed

+380
-76
lines changed

docs/_toc.yml

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -13,6 +13,7 @@ parts:
1313
chapters:
1414
- file: stats
1515
- file: topological-analysis
16+
- file: ibd
1617
- caption: Interfaces
1718
chapters:
1819
- file: python-api

docs/ibd.md

Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
1+
---
2+
jupytext:
3+
text_representation:
4+
extension: .md
5+
format_name: myst
6+
format_version: 0.12
7+
jupytext_version: 1.9.1
8+
kernelspec:
9+
display_name: Python 3
10+
language: python
11+
name: python3
12+
---
13+
14+
```{currentmodule} tskit
15+
```
16+
17+
18+
(sec_identity)=
19+
20+
# Identity by descent
21+
22+
The {meth}`.TreeSequence.ibd_segments` method allows us to compute
23+
segments of identity by descent.
24+
25+
:::{note}
26+
This documentation page is preliminary.
27+
:::
28+
29+
## Examples
30+
31+
Let's take a simple tree sequence to illustrate the {meth}`.TreeSequence.ibd_segments`
32+
method and associated {ref}`sec_python_api_reference_identity`:
33+
34+
```{code-cell}
35+
:tags: [hide-input]
36+
37+
import tskit
38+
import io
39+
from IPython.display import SVG
40+
41+
nodes = io.StringIO(
42+
"""\
43+
id is_sample time
44+
0 1 0
45+
1 1 0
46+
2 1 0
47+
3 0 1
48+
4 0 2
49+
5 0 3
50+
"""
51+
)
52+
edges = io.StringIO(
53+
"""\
54+
left right parent child
55+
2 10 3 0
56+
2 10 3 2
57+
0 10 4 1
58+
0 2 4 2
59+
2 10 4 3
60+
0 2 5 0
61+
0 2 5 4
62+
"""
63+
)
64+
ts = tskit.load_text(nodes=nodes, edges=edges, strict=False)
65+
66+
SVG(ts.draw_svg())
67+
```
68+
69+
### Definition
70+
71+
A pair of nodes ``(u, v)`` has an IBD segment with a left and right
72+
coordinate ``[left, right)`` and ancestral node ``a`` iff the most
73+
recent common ancestor of the segment ``[left, right)`` in nodes ``u``
74+
and ``v`` is ``a``, and the segment has been inherited along the same
75+
genealogical path (i.e., it has not been broken by recombination). The
76+
segments returned are the longest possible ones.
77+
78+
Consider the IBD segments that we get from our example tree sequence:
79+
80+
```{code-cell}
81+
segs = ts.ibd_segments(store_segments=True)
82+
for pair, segment_list in segs.items():
83+
print(pair, list(segment_list))
84+
```
85+
86+
Each of the sample pairs (0, 1), (0, 2) and (1, 2) is associated with
87+
two IBD segments, representing the different paths from these sample
88+
pairs to their common ancestor. Note in particular that (1, 2) has
89+
**two** IBD segments rather than one: even though the MRCA is
90+
4 in both cases, the paths from the samples to the MRCA are different
91+
in the left and right trees.
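
To make the (1, 2) case concrete, here is a minimal pure-Python sketch (it does not use tskit) that walks up the example's edge table from a sample to an ancestor at a given genome position. The `EDGES` list is copied from the tables above; the `path_to_ancestor` helper is a hypothetical name introduced only for this illustration.

```python
# Edge table copied from the example above: (left, right, parent, child).
EDGES = [
    (2, 10, 3, 0),
    (2, 10, 3, 2),
    (0, 10, 4, 1),
    (0, 2, 4, 2),
    (2, 10, 4, 3),
    (0, 2, 5, 0),
    (0, 2, 5, 4),
]


def path_to_ancestor(node, ancestor, position):
    """Return the nodes visited walking up from `node` to `ancestor` at `position`."""
    path = [node]
    while node != ancestor:
        # A node has at most one parent edge covering any given position.
        parents = [
            parent
            for (left, right, parent, child) in EDGES
            if child == node and left <= position < right
        ]
        if not parents:
            raise ValueError(
                f"{ancestor} is not an ancestor of {path[0]} at position {position}"
            )
        node = parents[0]
        path.append(node)
    return path


# Sample 2 reaches node 4 directly in the left tree, but via node 3 in the right tree.
left_path = path_to_ancestor(2, 4, position=1)
right_path = path_to_ancestor(2, 4, position=5)
print(left_path, right_path)  # [2, 4] [2, 3, 4]
```

The MRCA of samples 1 and 2 is node 4 at both positions, but the path from sample 2 changes at coordinate 2, which is why the pair has two segments rather than one spanning ``[0, 10)``.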
92+
93+
94+
### Data structures
95+
96+
The result of calling {meth}`.TreeSequence.ibd_segments` is an
97+
{class}`.IdentitySegments` instance:
98+
99+
```{code-cell}
100+
segs = ts.ibd_segments()
101+
print(segs)
102+
```
103+
104+
By default this class only stores the high-level summaries of the
105+
IBD segments discovered. As we can see in this example, we have a
106+
total of six segments and
107+
the total span (i.e., the sum of the lengths of the genomic intervals spanned
108+
by IBD segments) is 30.
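
As a sanity check on these two numbers, the sketch below recomputes them in plain Python from segments worked out by hand from the two trees of the example. The `Segment` tuple and the `segments_by_pair` dictionary are illustrative stand-ins, not the tskit classes or their output format.

```python
from collections import namedtuple

# Hypothetical stand-in for an IBD segment; this is not the tskit class.
Segment = namedtuple("Segment", ["left", "right", "node"])

# Segments derived by hand from the example: the left tree covers [0, 2)
# and the right tree covers [2, 10).
segments_by_pair = {
    (0, 1): [Segment(0, 2, 5), Segment(2, 10, 4)],
    (0, 2): [Segment(0, 2, 5), Segment(2, 10, 3)],
    (1, 2): [Segment(0, 2, 4), Segment(2, 10, 4)],
}

# Number of segments summed over all sample pairs.
num_segments = sum(len(segs) for segs in segments_by_pair.values())

# Total span: sum of (right - left) over every segment of every pair.
total_span = sum(
    seg.right - seg.left for segs in segments_by_pair.values() for seg in segs
)
print(num_segments, total_span)  # 6 30
```

Each of the three pairs is IBD across the whole sequence of length 10, which gives the total span of 30.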
109+
110+
If required, we can get more detailed information about particular
111+
sample pairs and the actual segments using the ``store_pairs``
112+
and ``store_segments`` arguments.
113+
114+
:::{warning}
115+
Only use the ``store_pairs`` and ``store_segments`` arguments if you
116+
really need this information! The number of IBD segments can be
117+
very large and storing them all requires a lot of memory. It is
118+
also much faster to just compute the overall summaries, without
119+
needing to store the actual lists.
120+
:::
121+
122+
123+
```{code-cell}
124+
segs = ts.ibd_segments(store_pairs=True)
125+
for pair, value in segs.items():
126+
print(pair, "::", value)
127+
```
128+
129+
Now we can see the more detailed breakdown of how the identity segments
130+
are distributed among the sample pairs. The {class}`.IdentitySegments`
131+
class behaves like a dictionary, such that ``segs[(a, b)]`` will return
132+
the {class}`.IdentitySegmentList` instance for that pair of samples:
133+
134+
```{code-cell}
135+
seglist = segs[(0, 1)]
136+
print(seglist)
137+
```
138+
139+
If we want to access the detailed information about the actual
140+
identity segments, we must use the ``store_segments`` argument:
141+
142+
```{code-cell}
143+
segs = ts.ibd_segments(store_pairs=True, store_segments=True)
144+
segs[(0, 1)]
145+
```
146+
147+
The {class}`.IdentitySegmentList` behaves like a Python list,
148+
where each element is an instance of {class}`.IdentitySegment`.
149+
150+
:::{warning}
151+
The order of segments in an {class}`.IdentitySegmentList`
152+
is arbitrary, and may change in future versions.
153+
:::
154+
155+
156+
```{eval-rst}
157+
.. todo:: More examples using the other bits of the IdentitySegments
158+
API here
159+
```
160+
161+
### Controlling the sample sets
162+
163+
By default we get the IBD segments between all pairs of
164+
{ref}`sample<sec_data_model_definitions_samples>` nodes.
165+
166+
#### IBD within a sample set
167+
We can reduce this to pairs within a specific set using the
168+
``within`` argument:
169+
170+
171+
```{eval-rst}
172+
.. todo:: More detail and better examples here.
173+
```
174+
175+
```{code-cell}
176+
segs = ts.ibd_segments(within=[0, 2], store_pairs=True)
177+
print(list(segs.keys()))
178+
```
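
Conceptually, ``within`` restricts the computation to unordered pairs drawn from a single set. The following plain-Python sketch enumerates those pairs with `itertools.combinations`; it illustrates the idea only and makes no claim about the order or form of the keys that tskit reports.

```python
from itertools import combinations

# The sample set passed as ``within`` in the example above.
within = [0, 2]

# Every unordered pair of distinct samples from the set.
pairs = list(combinations(sorted(within), 2))
print(pairs)  # [(0, 2)]
```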
179+
180+
#### IBD between sample sets
181+
182+
We can also compute IBD **between** sample sets:
183+
184+
```{code-cell}
185+
segs = ts.ibd_segments(between=[[0, 1], [2]], store_pairs=True)
186+
print(list(segs.keys()))
187+
```
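
With ``between``, the pairs of interest are those whose two samples come from different sets. The sketch below enumerates them in plain Python under that reading; it is an illustration of the idea and does not reproduce how tskit orders or reports its keys.

```python
from itertools import combinations, product

# The sample sets passed as ``between`` in the example above.
between = [[0, 1], [2]]

# For every two distinct sets, pair each sample of one with each sample of the other.
pairs = sorted(
    tuple(sorted(pair))
    for set_a, set_b in combinations(between, 2)
    for pair in product(set_a, set_b)
)
print(pairs)  # [(0, 2), (1, 2)]
```

The pair (0, 1) is absent because both samples belong to the same set.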
188+
189+
:::{seealso}
190+
See the {meth}`.TreeSequence.ibd_segments` documentation for
191+
more details.
192+
:::
193+
194+
### Constraints on the segments
195+
196+
The ``max_time`` and ``min_length`` arguments allow us to constrain the
197+
segments that we consider.
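
The sketch below illustrates the kind of filtering these arguments describe, using hand-derived segments from the example and plain Python rather than tskit. It assumes that ``max_time`` bounds the time of the ancestral node and that ``min_length`` bounds the segment length. Whether each bound is inclusive is not stated on this page, so the comparisons in the hypothetical `constrain` helper are an arbitrary choice and the thresholds used avoid the boundary cases.

```python
from collections import namedtuple

# Hypothetical stand-in for an IBD segment; this is not the tskit class.
Segment = namedtuple("Segment", ["pair", "left", "right", "node"])

# Times of the internal nodes, copied from the example node table.
NODE_TIME = {3: 1.0, 4: 2.0, 5: 3.0}

# Segments derived by hand from the two trees of the example.
SEGMENTS = [
    Segment((0, 1), 0, 2, 5),
    Segment((0, 1), 2, 10, 4),
    Segment((0, 2), 0, 2, 5),
    Segment((0, 2), 2, 10, 3),
    Segment((1, 2), 0, 2, 4),
    Segment((1, 2), 2, 10, 4),
]


def constrain(segments, max_time=float("inf"), min_length=0.0):
    """Keep segments with a recent enough ancestor and a long enough span.

    Assumption: the inclusive/exclusive choices here are illustrative only.
    """
    return [
        seg
        for seg in segments
        if NODE_TIME[seg.node] <= max_time and seg.right - seg.left > min_length
    ]


# Dropping ancestors older than time 2.5 removes the two segments under node 5.
recent = constrain(SEGMENTS, max_time=2.5)
# Requiring a length above 5 keeps only the three segments covering [2, 10).
long_only = constrain(SEGMENTS, min_length=5)
print(len(recent), len(long_only))  # 4 3
```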
198+
199+
```{eval-rst}
200+
.. todo:: Add examples for these arguments.
201+
```

docs/python-api.md

Lines changed: 44 additions & 4 deletions
@@ -220,6 +220,20 @@ which perform the same actions but modify the {class}`TableCollection` in place.
220220
TreeSequence.trim
221221
```
222222

223+
(sec_python_api_tree_sequences_ibd)=
224+
225+
#### Identity by descent
226+
227+
The {meth}`.TreeSequence.ibd_segments` method allows us to compute
228+
identity relationships between pairs of samples. See the
229+
{ref}`sec_identity` section for more details and examples
230+
and the {ref}`sec_python_api_reference_identity` section for
231+
API documentation on the associated classes.
232+
233+
```{eval-rst}
234+
.. autosummary::
235+
TreeSequence.ibd_segments
236+
```
223237

224238
(sec_python_api_tree_sequences_tables)=
225239

@@ -471,7 +485,7 @@ high performance interface which can be used in conjunction with the equivalent
471485

472486
Moving around within a tree usually involves visiting the tree nodes in some sort of
473487
order. Often, given a particular order, it is convenient to iterate over each node
474-
using the :meth:`Tree.nodes` method. However, for high performance algorithms, it
488+
using the {meth}`Tree.nodes` method. However, for high performance algorithms, it
475489
may be more convenient to access the node indices for a particular order as
476490
an array, and use this, for example, to index into one of the node arrays (see
477491
{ref}`sec_topological_analysis_traversal`).
@@ -646,7 +660,7 @@ Other properties
646660
These methods act in-place to transform the contents of a {class}`TableCollection`,
647661
either by modifying the underlying tables (removing, editing, or adding to them) or
648662
by adjusting the table collection so that it meets the
649-
{ref}`sec_valid_tree_sequence_requirements.
663+
{ref}`sec_valid_tree_sequence_requirements`.
650664

651665

652666
(sec_tables_api_modification)=
@@ -707,14 +721,12 @@ Indexing
707721
TableCollection.drop_index
708722
```
709723

710-
711724
#### Miscellaneous methods
712725

713726
```{eval-rst}
714727
.. autosummary::
715728
TableCollection.copy
716729
TableCollection.equals
717-
TableCollection.ibd_segments
718730
TableCollection.link_ancestors
719731
```
720732

@@ -1413,6 +1425,34 @@ basic class, where each attribute matches an identically named attribute in the
14131425
:inherited-members:
14141426
```
14151427

1428+
(sec_python_api_reference_identity)=
1429+
1430+
### Identity classes
1431+
1432+
The classes documented in this section are associated with summarising
1433+
identity relationships between pairs of samples. See the {ref}`sec_identity`
1434+
section for more details and examples.
1435+
1436+
#### The {class}`IdentitySegments` class
1437+
1438+
```{eval-rst}
1439+
.. autoclass:: IdentitySegments()
1440+
:members:
1441+
```
1442+
1443+
#### The {class}`IdentitySegmentList` class
1444+
1445+
```{eval-rst}
1446+
.. autoclass:: IdentitySegmentList()
1447+
:members:
1448+
```
1449+
1450+
#### The {class}`IdentitySegment` class
1451+
1452+
```{eval-rst}
1453+
.. autoclass:: IdentitySegment()
1454+
:members:
1455+
```
14161456

14171457
### Miscellaneous classes
14181458

python/CHANGELOG.rst

Lines changed: 5 additions & 0 deletions
@@ -55,6 +55,10 @@
5555

5656
**Features**
5757

58+
- Add the ``ibd_segments`` method and associated classes to compute, summarise
59+
and store segments of identity by descent from a tree sequence
60+
(:user:`gtsambos`, :user:`jeromekelleher`).
61+
5862
- Allow skipping of site and mutation tables in ``TableCollection.sort``
5963
(:user:`benjeffery`, :issue:`1475`, :pr:`1826`).
6064

@@ -114,6 +118,7 @@
114118

115119
- tskit now supports python 3.10 (:user:`benjeffery`, :issue:`1895`, :pr:`1949`)
116120

121+
117122
**Fixes**
118123

119124
- `dump_tables` omitted individual parents. (:user:`benjeffery`, :issue:`1828`, :pr:`1884`)
