|
| 1 | +--- |
| 2 | +jupytext: |
| 3 | + text_representation: |
| 4 | + extension: .md |
| 5 | + format_name: myst |
| 6 | + format_version: 0.12 |
| 7 | + jupytext_version: 1.9.1 |
| 8 | +kernelspec: |
| 9 | + display_name: Python 3 |
| 10 | + language: python |
| 11 | + name: python3 |
| 12 | +--- |
| 13 | + |
| 14 | +```{currentmodule} tskit |
| 15 | +``` |
| 16 | + |
| 17 | + |
| 18 | +(sec_identity)= |
| 19 | + |
| 20 | +# Identity by descent |
| 21 | + |
| 22 | +The {meth}`.TreeSequence.ibd_segments` allows us to compute |
| 23 | +segments of identity by descent. |
| 24 | + |
| 25 | +:::{note} |
| 26 | +This documentation page is preliminary |
| 27 | +::: |
| 28 | + |
| 29 | +## Examples |
| 30 | + |
| 31 | +Let's take a simple tree sequence to illustrate the {meth}`.TreeSequence.ibd_segments` |
| 32 | +method and associated {ref}`sec_python_api_reference_identity`: |
| 33 | + |
| 34 | +```{code-cell} |
| 35 | +:tags: [hide-input] |
| 36 | +
|
| 37 | +import tskit |
| 38 | +import io |
| 39 | +from IPython.display import SVG |
| 40 | +
|
| 41 | +nodes = io.StringIO( |
| 42 | + """\ |
| 43 | + id is_sample time |
| 44 | + 0 1 0 |
| 45 | + 1 1 0 |
| 46 | + 2 1 0 |
| 47 | + 3 0 1 |
| 48 | + 4 0 2 |
| 49 | + 5 0 3 |
| 50 | + """ |
| 51 | +) |
| 52 | +edges = io.StringIO( |
| 53 | + """\ |
| 54 | + left right parent child |
| 55 | + 2 10 3 0 |
| 56 | + 2 10 3 2 |
| 57 | + 0 10 4 1 |
| 58 | + 0 2 4 2 |
| 59 | + 2 10 4 3 |
| 60 | + 0 2 5 0 |
| 61 | + 0 2 5 4 |
| 62 | + """ |
| 63 | +) |
| 64 | +ts = tskit.load_text(nodes=nodes, edges=edges, strict=False) |
| 65 | +
|
| 66 | +SVG(ts.draw_svg()) |
| 67 | +``` |
| 68 | + |
| 69 | +### Definition |
| 70 | + |
| 71 | +A pair of nodes ``(u, v)`` has an IBD segment with a left and right |
| 72 | +coordinate ``[left, right)`` and ancestral node ``a`` iff the most |
| 73 | +recent common ancestor of the segment ``[left, right)`` in nodes ``u`` |
| 74 | +and ``v`` is ``a``, and the segment has been inherited along the same |
| 75 | +genealogical path (ie. it has not been broken by recombination). The |
| 76 | +segments returned are the longest possible ones. |
| 77 | + |
| 78 | +Consider the IBD segments that we get from our example tree sequence: |
| 79 | + |
| 80 | +```{code-cell} |
| 81 | +segs = ts.ibd_segments(store_segments=True) |
| 82 | +for pair, segment_list in segs.items(): |
| 83 | + print(pair, list(segment_list)) |
| 84 | +``` |
| 85 | + |
| 86 | +Each of the sample pairs (0, 1), (0, 2) and (1, 2) is associated with |
| 87 | +two IBD segments, representing the different paths from these sample |
| 88 | +pairs to their common ancestor. Note in particular that (1, 2) has |
| 89 | +**two** IBD segments rather than one: even though the MRCA is |
| 90 | +4 in both cases, the paths from the samples to the MRCA are different |
| 91 | +in the left and right trees. |
| 92 | + |
| 93 | + |
| 94 | +### Data structures |
| 95 | + |
| 96 | +The result of calling {meth}`.TreeSequence.ibd_segments` is an |
| 97 | +{class}`.IdentitySegments` class: |
| 98 | + |
| 99 | +```{code-cell} |
| 100 | +segs = ts.ibd_segments() |
| 101 | +print(segs) |
| 102 | +``` |
| 103 | + |
| 104 | +By default this class only stores the high-level summaries of the |
| 105 | +IBD segments discovered. As we can see in this example, we have a |
| 106 | +total of six segments and |
| 107 | +the total span (i.e., the sum lengths of the genomic intervals spanned |
| 108 | +by IBD segments) is 30. |
| 109 | + |
| 110 | +If required, we can get more detailed information about particular |
| 111 | +segment pairs and the actual segments using the ``store_pairs`` |
| 112 | +and ``store_segments`` arguments. |
| 113 | + |
| 114 | +:::{warning} |
| 115 | +Only use the ``store_pairs`` and ``store_segments`` arguments if you |
| 116 | +really need this information! The number of IBD segments can be |
| 117 | +very large and storing them all requires a lot of memory. It is |
| 118 | +also much faster to just compute the overall summaries, without |
| 119 | +needing to store the actual lists. |
| 120 | +::: |
| 121 | + |
| 122 | + |
| 123 | +```{code-cell} |
| 124 | +segs = ts.ibd_segments(store_pairs=True) |
| 125 | +for pair, value in segs.items(): |
| 126 | + print(pair, "::", value) |
| 127 | +``` |
| 128 | + |
| 129 | +Now we can see the more detailed breakdown of how the identity segments |
| 130 | +are distributed among the sample pairs. The {class}`.IdentitySegments` |
| 131 | +class behaves like a dictionary, such that ``segs[(a, b)]`` will return |
| 132 | +the {class}`.IdentitySegmentList` instance for that pair of samples: |
| 133 | + |
| 134 | +```{code-cell} |
| 135 | +seglist = segs[(0, 1)] |
| 136 | +print(seglist) |
| 137 | +``` |
| 138 | + |
| 139 | +If we want to access the detailed information about the actual |
| 140 | +identity segments, we must use the ``store_segments`` argument: |
| 141 | + |
| 142 | +```{code-cell} |
| 143 | +segs = ts.ibd_segments(store_pairs=True, store_segments=True) |
| 144 | +segs[(0, 1)] |
| 145 | +``` |
| 146 | + |
| 147 | +The {class}`.IdentitySegmentList` behaves like a Python list, |
| 148 | +where each element is an instance of {class}`.IdentitySegment`. |
| 149 | + |
| 150 | +:::{warning} |
| 151 | +The order of segments in an {class}`.IdentitySegmentList` |
| 152 | +is arbitrary, and may change in future versions. |
| 153 | +::: |
| 154 | + |
| 155 | + |
| 156 | +```{eval-rst} |
| 157 | +.. todo:: More examples using the other bits of the IdentitySegments |
| 158 | + API here |
| 159 | +``` |
| 160 | + |
| 161 | +### Controlling the sample sets |
| 162 | + |
| 163 | +By default we get the IBD segments between all pairs of |
| 164 | +:ref:`sample<sec_data_model_definitions_samples>` nodes. |
| 165 | + |
| 166 | +#### IBD within a sample set |
| 167 | +We can reduce this to pairs within a specific set using the |
| 168 | +``within`` argument:: |
| 169 | + |
| 170 | + |
| 171 | +```{eval-rst} |
| 172 | +.. todo:: More detail and better examples here. |
| 173 | +``` |
| 174 | + |
| 175 | +```{code-cell} |
| 176 | +segs = ts.ibd_segments(within=[0, 2], store_pairs=True) |
| 177 | +print(list(segs.keys())) |
| 178 | +``` |
| 179 | + |
| 180 | +#### IBD between sample sets |
| 181 | + |
| 182 | +We can also compute IBD **between** sample sets: |
| 183 | + |
| 184 | +```{code-cell} |
| 185 | +segs = ts.ibd_segments(between=[[0,1], [2]], store_pairs=True) |
| 186 | +print(list(segs.keys())) |
| 187 | +``` |
| 188 | + |
| 189 | +:::{seealso} |
| 190 | +See the {meth}`.TreeSequence.ibd_segments` documentation for |
| 191 | +more details. |
| 192 | +::: |
| 193 | + |
| 194 | +### Constraints on the segments |
| 195 | + |
| 196 | +The ``max_time`` and ``min_length`` arguments allow us to constrain the |
| 197 | +segments that we consider. |
| 198 | + |
| 199 | +```{eval-rst} |
| 200 | +.. todo:: Add examples for these arguments. |
| 201 | +``` |
0 commit comments