Skip to content

Commit 12b3951

Browse files
Change example to single sample
1 parent 3ec8a19 commit 12b3951

File tree

1 file changed

+33
-35
lines changed

1 file changed

+33
-35
lines changed

docs/ancestry.md

Lines changed: 33 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -94,7 +94,7 @@ this.
9494
: Coalescent with recombination ("hudson")
9595

9696
{class}`.SmcKApproxCoalescent`
97-
: General Sequentially Markov Coalescent
97+
: General Sequentially Markov Coalescent
9898

9999
{class}`.DiscreteTimeWrightFisher`
100100
: Generation-by-generation Wright-Fisher
@@ -2230,7 +2230,7 @@ in units of 4N generations.
22302230

22312231
### SMC approximations
22322232

2233-
The **SMC** and **SMC′**
2233+
The **SMC** and **SMC′**
22342234
are approximations of the continuous time
22352235
{ref}`Hudson coalescent<sec_ancestry_models_hudson>` model. These were originally
22362236
motivated largely by the need to simulate coalescent processes more efficiently
@@ -2240,19 +2240,19 @@ mean that such approximations are now unnecessary for many simulations.
22402240

22412241
The **SMC** and **SMC'** are, however, very important for inference, as the approximations
22422242
have made many analytical advances possible. Moreover, using these approximations,
2243-
we are able to simulate regimes which we couldn't simulate otherwise: for example,
2243+
we are able to simulate regimes which we couldn't simulate otherwise: for example,
22442244
**Drosophila** and **Drosophila-like** simulations with very high scaled recombination rates.
22452245

22462246

2247-
The {class}`SMC(k) <.SmcKApproxCoalescent>` model is a general simulations model that can simulate various **SMC** approximations
2248-
(e.g., **SMC** and **SMC′**). It accepts a ```hull_offset``` parameter, which defines the extent of
2249-
**SMC** approximations in the simulation. The ```hull_offset``` represents the maximum allowed
2250-
distance between two genomic segments that can share a common ancestor. Setting the
2251-
```hull_offset``` to **0** means only overlapping genomic segments can share a common ancestor,
2252-
corresponding to the backward-in-time definition of the **SMC** model. Similarly, setting
2253-
the ```hull_offset``` to **1** allows adjacent genomic segments, as well as overlapping ones, to
2254-
share a common ancestor, which defines the **SMC′** model. Simulating under the Hudson
2255-
coalescent model is equivalent to setting the ```hull_offset``` to the sequence length. The
2247+
The {class}`SMC(k) <.SmcKApproxCoalescent>` model is a general simulations model that can simulate various **SMC** approximations
2248+
(e.g., **SMC** and **SMC′**). It accepts a ```hull_offset``` parameter, which defines the extent of
2249+
**SMC** approximations in the simulation. The ```hull_offset``` represents the maximum allowed
2250+
distance between two genomic segments that can share a common ancestor. Setting the
2251+
```hull_offset``` to **0** means only overlapping genomic segments can share a common ancestor,
2252+
corresponding to the backward-in-time definition of the **SMC** model. Similarly, setting
2253+
the ```hull_offset``` to **1** allows adjacent genomic segments, as well as overlapping ones, to
2254+
share a common ancestor, which defines the **SMC′** model. Simulating under the Hudson
2255+
coalescent model is equivalent to setting the ```hull_offset``` to the sequence length. The
22562256
hull_offset can take any value between **0** and the sequence length.
22572257

22582258
In this example, we use the {class}`SMC(k) <.SmcKApproxCoalescent>` model to run **SMC'**
@@ -2265,7 +2265,7 @@ SVG(ts.draw_svg(y_axis=True, time_scale="log_time"))
22652265
```
22662266
:::{Note}
22672267
Since the **SMC** models are approximations of the {ref}`Hudson coalescent<sec_ancestry_models_hudson>`,
2268-
and since the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` model is well optimised for
2268+
and since the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` model is well optimised for
22692269
regimes with moderate scaled recombination rates (including full human chromosome simulations),
22702270
we recommend using the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` whenever possible.
22712271
:::
@@ -2843,37 +2843,38 @@ combined with other {ref}`ancestry models<sec_ancestry_models>`.
28432843
#### Tracing ancestry through a pedigree
28442844

28452845
The previous simulation generated trees that are embedded within the pedigree,
2846-
but it doesn't tell us directly the exact path that each sample's genetic
2846+
but it doesn't tell us directly the path that each sample's genetic
28472847
material took through its ancestors.
28482848

28492849
To trace these paths exactly, we can
28502850
use a combination of ``additional_nodes``,
28512851
``coalescing_segments_only``
28522852
(see {ref}`sec_ancestry_additional_nodes`),
2853-
and stop_at_local_mrca (see {ref}`sec_ancestry_stop_at_local_mrca`). In this
2854-
example, we show how to trace the exact path that each sample's genetic material
2855-
took through its ancestors.
2853+
and stop_at_local_mrca (see {ref}`sec_ancestry_stop_at_local_mrca`).
2854+
For this example, we create another contrived pedigree with a single
2855+
proband, and 3 generations of diploid ancestors:
28562856

28572857
```{code-cell}
2858-
2859-
import io
2860-
import tskit
28612858
ped_txt = """\
28622859
# id parent0 parent1 time is_sample
2863-
0 2 3 0.0 1
2864-
1 2 2 0.0 1
2865-
2 4 5 1.0 0
2866-
3 5 5 1.0 0
2867-
4 . . 2.0 0
2868-
5 6 . 2.0 0
2860+
0 1 2 0.0 1
2861+
1 3 4 1.0 0
2862+
2 3 4 1.0 0
2863+
3 5 6 2.0 0
2864+
4 5 6 2.0 0
2865+
5 . . 3.0 0
28692866
6 . . 3.0 0
28702867
"""
28712868
pedigree = msprime.parse_pedigree(io.StringIO(ped_txt), sequence_length=100)
2872-
2869+
draw_pedigree(pedigree.tree_sequence())
2870+
```
2871+
We can then run a simuation through this pedigree. For simplicity here,
2872+
we don't have any recombination:
2873+
```{code-cell}
28732874
ped_ts = msprime.sim_ancestry(
28742875
initial_state=pedigree,
28752876
model="fixed_pedigree",
2876-
random_seed=3,
2877+
random_seed=998,
28772878
coalescing_segments_only=False,
28782879
additional_nodes=(
28792880
msprime.NodeType.PASS_THROUGH),
@@ -2882,13 +2883,10 @@ ped_ts = msprime.sim_ancestry(
28822883
node_labels = {node.id: f"{node.individual}({node.id})" for node in ped_ts.nodes()}
28832884
SVG(ped_ts.draw_svg(y_axis=True, node_labels=node_labels, size=(600,200)))
28842885
```
2885-
Here, we simulated a simple pedigree of two diploid individuals across three generations.
2886-
2887-
In this figure, we can see that the ancestor of node 5 is node 6. We can also see that part of the genetic
2888-
material of node 0 was inherited from node 5 through node 3. Both observations would
2889-
have had to be inferred otherwise.
2890-
2891-
2886+
Now we can see that the two genomes of individual 0 (nodes 0 and 1)
2887+
reach their common ancestor node 9 of individual 4. Because we
2888+
have set ``stop_at_local_mrca=False``, we can now also see how
2889+
this ancestral material goes further back in the pedigree.
28922890

28932891
#### Censoring pedigree information
28942892

0 commit comments

Comments
 (0)