Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 41 additions & 18 deletions docs/ancestry.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,11 +93,8 @@ this.
{class}`.StandardCoalescent`
: Coalescent with recombination ("hudson")

{class}`.SmcApproxCoalescent`
: Sequentially Markov Coalescent ("smc")

{class}`.SmcPrimeApproxCoalescent`
: SMC'("smc_prime")
{class}`.SmcKApproxCoalescent`
: General Sequentially Markov Coalescent

{class}`.DiscreteTimeWrightFisher`
: Generation-by-generation Wright-Fisher
Expand Down Expand Up @@ -1975,15 +1972,16 @@ ancestry model. By default, we run simulations under the
{class}`.StandardCoalescent` model. If we wish to run
under a different model, we use the ``model`` argument to
{func}`.sim_ancestry`. For example, here we use the
{class}`SMC<.SmcApproxCoalescent>` model instead of the
{class}`dtwf<.DiscreteTimeWrightFisher>` model instead of the
standard coalescent:

```{code-cell}
ts1 = msprime.sim_ancestry(
10,
sequence_length=10,
population_size=100,
recombination_rate=0.1,
model=msprime.SmcApproxCoalescent(),
model=msprime.DiscreteTimeWrightFisher(),
random_seed=1234)
```

Expand All @@ -1996,7 +1994,8 @@ ts2 = msprime.sim_ancestry(
10,
sequence_length=10,
recombination_rate=0.1,
model="smc",
population_size=100,
model="dtwf",
random_seed=1234)
assert ts1.equals(ts2, ignore_provenance=True)
```
Expand Down Expand Up @@ -2231,21 +2230,45 @@ in units of 4N generations.

### SMC approximations

The {class}`SMC <.SmcApproxCoalescent>` and {class}`SMC' <.SmcPrimeApproxCoalescent>`
The **SMC** and **SMC′**
are approximations of the continuous time
{ref}`Hudson coalescent<sec_ancestry_models_hudson>` model. These were originally
motivated largely by the need to simulate coalescent processes more efficiently
than was possible using the software available at the time; however,
[improved algorithms](https://doi.org/10.1371/journal.pcbi.1004842)
mean that such approximations are now mostly unnecessary for simulations.

The SMC and SMC' are however very important for inference, as the approximations
have made many analytical advances possible.

Since the SMC approximations are not required for simulation efficiency, these
models are implemented using a naive rejection sampling approach in msprime.
The implementation is intended to facilitate the study of the
SMC approximations, rather than to be used in a general-purpose way.
mean that such approximations are now unnecessary for many simulations.

The **SMC** and **SMC'** are, however, very important for inference, as the approximations
have made many analytical advances possible. Moreover, using these approximations,
we are able to simulate regimes which we couldn't simulate otherwise: for example,
**Drosophila** and **Drosophila-like** simulations with very high scaled recombination rates.


The {class}`SMC(k) <.SmcKApproxCoalescent>` model is a general simulations model that can simulate various **SMC** approximations
(e.g., **SMC** and **SMC′**). It accepts a ```hull_offset``` parameter, which defines the extent of
**SMC** approximations in the simulation. The ```hull_offset``` represents the maximum allowed
distance between two genomic segments that can share a common ancestor. Setting the
```hull_offset``` to **0** means only overlapping genomic segments can share a common ancestor,
corresponding to the backward-in-time definition of the **SMC** model. Similarly, setting
the ```hull_offset``` to **1** allows adjacent genomic segments, as well as overlapping ones, to
share a common ancestor, which defines the **SMC′** model. Simulating under the Hudson
coalescent model is equivalent to setting the ```hull_offset``` to the sequence length. The
hull_offset can take any value between **0** and the sequence length.

In this example, we use the {class}`SMC(k) <.SmcKApproxCoalescent>` model to run **SMC'**
simulations:
```{code-cell}
ts = msprime.sim_ancestry(4, population_size=10,
model=msprime.SmcKApproxCoalescent(hull_offset=1),
random_seed=1)
SVG(ts.draw_svg(y_axis=True, time_scale="log_time"))
```
:::{Note}
Since the **SMC** models are approximations of the {ref}`Hudson coalescent<sec_ancestry_models_hudson>`,
and since the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` model is well optimised for
regimes with moderate scaled recombination rates (including full human chromosome simulations),
we recommend using the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` whenever possible.
:::

(sec_ancestry_models_dtwf)=

Expand Down
4 changes: 4 additions & 0 deletions docs/api.md
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,10 @@ for discussion and examples of individual features.
.. autoclass:: msprime.StandardCoalescent
```

```{eval-rst}
.. autoclass:: msprime.SmcKApproxCoalescent
```

```{eval-rst}
.. autoclass:: msprime.SmcApproxCoalescent
```
Expand Down
28 changes: 27 additions & 1 deletion msprime/ancestry.py
Original file line number Diff line number Diff line change
Expand Up @@ -1785,6 +1785,9 @@ class StandardCoalescent(AncestryModel):

class SmcApproxCoalescent(AncestryModel):
"""
Legacy implementation of the SMC model. Please use :class:`SmcKApproxCoalescent`
instead.

The Sequentially Markov Coalescent (SMC) model defined by
`McVean and Cardin (2005) <https://dx.doi.org/10.1098%2Frstb.2005.1673>`_.
In the SMC, only common ancestor events that result in marginal coalescences
Expand All @@ -1795,6 +1798,7 @@ class SmcApproxCoalescent(AncestryModel):
This model is implemented using a naive rejection sampling approach
and so it may not be any more efficient to simulate than the
standard Hudson model.
We recommend using the ``SmcKApproxCoalescent(hull_offset=0)`` instead

The string ``"smc"`` can be used to refer to this model.
"""
Expand All @@ -1804,6 +1808,9 @@ class SmcApproxCoalescent(AncestryModel):

class SmcPrimeApproxCoalescent(AncestryModel):
"""
Legacy implementation of the SMC' model. Please use :class:`SmcKApproxCoalescent`
instead.

The SMC' model defined by
`Marjoram and Wall (2006) <https://doi.org/10.1186/1471-2156-7-16>`_
as a refinement of the :class:`SMC<SmcApproxCoalescent>`. The SMC'
Expand All @@ -1814,7 +1821,8 @@ class SmcPrimeApproxCoalescent(AncestryModel):
.. note::
This model is implemented using a naive rejection sampling approach
and so it may not be any more efficient to simulate than the
standard Hudson model.
standard Hudson model. We recommend using the
``SmcKApproxCoalescent(hull_offset=1)`` instead.

The string ``"smc_prime"`` can be used to refer to this model.
"""
Expand All @@ -1830,6 +1838,24 @@ class ParametricAncestryModel(AncestryModel):

@dataclasses.dataclass
class SmcKApproxCoalescent(ParametricAncestryModel):
"""
A general Sequentially Markov Coalescent (SMC) model. This model accepts a
``hull_offset`` parameter (defaults to 0) that defines the allowed distances
between the genomic tracts of ancestral material in a common ancestor event.

Specifically, if the hull_offset is set to 0, then only overlapping genomic
tracts can be joined by a common ancestor event (this is equivalent to the
SMC model). If the hull_offset is set to 1, then overlapping or adjacent
genomic tracts can be joined by a common ancestor (this is equivalent to the
SMC' model). If the hull_offset is set to full the sequence length, then any
segments can share a common ancestor, which is equivalent to the standard Hudson
coalescent.

:param float hull_offset: Determines the maximum distance between genomic tracts
of ancestral material that can be joined by a common ancestor event.
Defaults to 0 (equivalent to the SMC model).
"""

name = "smc_k"

hull_offset: float
Expand Down