9494: Coalescent with recombination ("hudson")
9595
9696{class}` .SmcKApproxCoalescent `
97- : General Sequentially Markov Coalescent
97+ : General Sequentially Markov Coalescent
9898
9999{class}` .DiscreteTimeWrightFisher `
100100: Generation-by-generation Wright-Fisher
@@ -2230,7 +2230,7 @@ in units of 4N generations.
22302230
22312231### SMC approximations
22322232
2233- The ** SMC** and ** SMC′**
2233+ The ** SMC** and ** SMC′**
22342234are approximations of the continuous time
22352235{ref}` Hudson coalescent<sec_ancestry_models_hudson> ` model. These were originally
22362236motivated largely by the need to simulate coalescent processes more efficiently
@@ -2240,19 +2240,19 @@ mean that such approximations are now unnecessary for many simulations.
22402240
22412241The ** SMC** and ** SMC'** are, however, very important for inference, as the approximations
22422242have made many analytical advances possible. Moreover, using these approximations,
2243- we are able to simulate regimes which we couldn't simulate otherwise: for example,
2243+ we are able to simulate regimes which we couldn't simulate otherwise: for example,
22442244** Drosophila** and ** Drosophila-like** simulations with very high scaled recombination rates.
22452245
22462246
2247- The {class}` SMC(k) <.SmcKApproxCoalescent> ` model is a general simulations model that can simulate various ** SMC** approximations
2248- (e.g., ** SMC** and ** SMC′** ). It accepts a ``` hull_offset ``` parameter, which defines the extent of
2249- ** SMC** approximations in the simulation. The ``` hull_offset ``` represents the maximum allowed
2250- distance between two genomic segments that can share a common ancestor. Setting the
2251- ``` hull_offset ``` to ** 0** means only overlapping genomic segments can share a common ancestor,
2252- corresponding to the backward-in-time definition of the ** SMC** model. Similarly, setting
2253- the ``` hull_offset ``` to ** 1** allows adjacent genomic segments, as well as overlapping ones, to
2254- share a common ancestor, which defines the ** SMC′** model. Simulating under the Hudson
2255- coalescent model is equivalent to setting the ``` hull_offset ``` to the sequence length. The
2247+ The {class}` SMC(k) <.SmcKApproxCoalescent> ` model is a general simulations model that can simulate various ** SMC** approximations
2248+ (e.g., ** SMC** and ** SMC′** ). It accepts a ``` hull_offset ``` parameter, which defines the extent of
2249+ ** SMC** approximations in the simulation. The ``` hull_offset ``` represents the maximum allowed
2250+ distance between two genomic segments that can share a common ancestor. Setting the
2251+ ``` hull_offset ``` to ** 0** means only overlapping genomic segments can share a common ancestor,
2252+ corresponding to the backward-in-time definition of the ** SMC** model. Similarly, setting
2253+ the ``` hull_offset ``` to ** 1** allows adjacent genomic segments, as well as overlapping ones, to
2254+ share a common ancestor, which defines the ** SMC′** model. Simulating under the Hudson
2255+ coalescent model is equivalent to setting the ``` hull_offset ``` to the sequence length. The
22562256hull_offset can take any value between ** 0** and the sequence length.
22572257
22582258In this example, we use the {class}` SMC(k) <.SmcKApproxCoalescent> ` model to run ** SMC'**
@@ -2265,7 +2265,7 @@ SVG(ts.draw_svg(y_axis=True, time_scale="log_time"))
22652265```
22662266:::{Note}
22672267Since the ** SMC** models are approximations of the {ref}` Hudson coalescent<sec_ancestry_models_hudson> ` ,
2268- and since the {ref}` Hudson coalescent<sec_ancestry_models_hudson> ` model is well optimised for
2268+ and since the {ref}` Hudson coalescent<sec_ancestry_models_hudson> ` model is well optimised for
22692269regimes with moderate scaled recombination rates (including full human chromosome simulations),
22702270we recommend using the {ref}` Hudson coalescent<sec_ancestry_models_hudson> ` whenever possible.
22712271:::
@@ -2843,37 +2843,38 @@ combined with other {ref}`ancestry models<sec_ancestry_models>`.
28432843#### Tracing ancestry through a pedigree
28442844
28452845The previous simulation generated trees that are embedded within the pedigree,
2846- but it doesn't tell us directly the exact path that each sample's genetic
2846+ but it doesn't tell us directly the path that each sample's genetic
28472847material took through its ancestors.
28482848
28492849To trace these paths exactly, we can
28502850use a combination of `` additional_nodes `` ,
28512851`` coalescing_segments_only ``
28522852(see {ref}` sec_ancestry_additional_nodes ` ),
2853- and stop_at_local_mrca (see {ref}` sec_ancestry_stop_at_local_mrca ` ). In this
2854- example, we show how to trace the exact path that each sample's genetic material
2855- took through its ancestors.
2853+ and stop_at_local_mrca (see {ref}` sec_ancestry_stop_at_local_mrca ` ).
2854+ For this example, we create another contrived pedigree with a single
2855+ proband, and 3 generations of diploid ancestors:
28562856
28572857``` {code-cell}
2858-
2859- import io
2860- import tskit
28612858ped_txt = """\
28622859# id parent0 parent1 time is_sample
2863- 0 2 3 0.0 1
2864- 1 2 2 0 .0 1
2865- 2 4 5 1.0 0
2866- 3 5 5 1 .0 0
2867- 4 . . 2.0 0
2868- 5 6 . 2 .0 0
2860+ 0 1 2 0.0 1
2861+ 1 3 4 1 .0 0
2862+ 2 3 4 1.0 0
2863+ 3 5 6 2 .0 0
2864+ 4 5 6 2.0 0
2865+ 5 . . 3 .0 0
286928666 . . 3.0 0
28702867"""
28712868pedigree = msprime.parse_pedigree(io.StringIO(ped_txt), sequence_length=100)
2872-
2869+ draw_pedigree(pedigree.tree_sequence())
2870+ ```
2871+ We can then run a simuation through this pedigree. For simplicity here,
2872+ we don't have any recombination:
2873+ ``` {code-cell}
28732874ped_ts = msprime.sim_ancestry(
28742875 initial_state=pedigree,
28752876 model="fixed_pedigree",
2876- random_seed=3 ,
2877+ random_seed=998 ,
28772878 coalescing_segments_only=False,
28782879 additional_nodes=(
28792880 msprime.NodeType.PASS_THROUGH),
@@ -2882,13 +2883,10 @@ ped_ts = msprime.sim_ancestry(
28822883node_labels = {node.id: f"{node.individual}({node.id})" for node in ped_ts.nodes()}
28832884SVG(ped_ts.draw_svg(y_axis=True, node_labels=node_labels, size=(600,200)))
28842885```
2885- Here, we simulated a simple pedigree of two diploid individuals across three generations.
2886-
2887- In this figure, we can see that the ancestor of node 5 is node 6. We can also see that part of the genetic
2888- material of node 0 was inherited from node 5 through node 3. Both observations would
2889- have had to be inferred otherwise.
2890-
2891-
2886+ Now we can see that the two genomes of individual 0 (nodes 0 and 1)
2887+ reach their common ancestor node 9 of individual 4. Because we
2888+ have set `` stop_at_local_mrca=False `` , we can now also see how
2889+ this ancestral material goes further back in the pedigree.
28922890
28932891#### Censoring pedigree information
28942892
0 commit comments