
Commit 1526582

hossam26644 authored and jeromekelleher committed
including the smck class def in docs
change smc examples in ancestry.md
update SMC text in ancestry.md
Drosophila in bold
1 parent e639d01 commit 1526582

3 files changed: +72 -19 lines changed

docs/ancestry.md

Lines changed: 41 additions & 18 deletions
@@ -93,11 +93,8 @@ this.
 {class}`.StandardCoalescent`
 : Coalescent with recombination ("hudson")

-{class}`.SmcApproxCoalescent`
-: Sequentially Markov Coalescent ("smc")
-
-{class}`.SmcPrimeApproxCoalescent`
-: SMC'("smc_prime")
+{class}`.SmcKApproxCoalescent`
+: General Sequentially Markov Coalescent

 {class}`.DiscreteTimeWrightFisher`
 : Generation-by-generation Wright-Fisher
@@ -1975,15 +1972,16 @@ ancestry model. By default, we run simulations under the
 {class}`.StandardCoalescent` model. If we wish to run
 under a different model, we use the ``model`` argument to
 {func}`.sim_ancestry`. For example, here we use the
-{class}`SMC<.SmcApproxCoalescent>` model instead of the
+{class}`dtwf<.DiscreteTimeWrightFisher>` model instead of the
 standard coalescent:

 ```{code-cell}
 ts1 = msprime.sim_ancestry(
     10,
     sequence_length=10,
+    population_size=100,
     recombination_rate=0.1,
-    model=msprime.SmcApproxCoalescent(),
+    model=msprime.DiscreteTimeWrightFisher(),
     random_seed=1234)
 ```

@@ -1996,7 +1994,8 @@ ts2 = msprime.sim_ancestry(
     10,
     sequence_length=10,
     recombination_rate=0.1,
-    model="smc",
+    population_size=100,
+    model="dtwf",
     random_seed=1234)
 assert ts1.equals(ts2, ignore_provenance=True)
 ```
@@ -2231,21 +2230,45 @@ in units of 4N generations.

 ### SMC approximations

-The {class}`SMC <.SmcApproxCoalescent>` and {class}`SMC' <.SmcPrimeApproxCoalescent>`
+The **SMC** and **SMC'**
 are approximations of the continuous time
 {ref}`Hudson coalescent<sec_ancestry_models_hudson>` model. These were originally
 motivated largely by the need to simulate coalescent processes more efficiently
 than was possible using the software available at the time; however,
 [improved algorithms](https://doi.org/10.1371/journal.pcbi.1004842)
-mean that such approximations are now mostly unnecessary for simulations.
-
-The SMC and SMC' are however very important for inference, as the approximations
-have made many analytical advances possible.
-
-Since the SMC approximations are not required for simulation efficiency, these
-models are implemented using a naive rejection sampling approach in msprime.
-The implementation is intended to facilitate the study of the
-SMC approximations, rather than to be used in a general-purpose way.
+mean that such approximations are now unnecessary for many simulations.
+
+The **SMC** and **SMC'** are, however, very important for inference, as the approximations
+have made many analytical advances possible. Moreover, using these approximations,
+we are able to simulate regimes which we couldn't simulate otherwise: for example,
+**Drosophila** and **Drosophila-like** simulations with very high scaled recombination rates.
+
+
+The {class}`SMC(k) <.SmcKApproxCoalescent>` model is a general simulation model that can simulate various **SMC** approximations
+(e.g., **SMC** and **SMC′**). It accepts a ```hull_offset``` parameter, which defines the extent of
+**SMC** approximations in the simulation. The ```hull_offset``` represents the maximum allowed
+distance between two genomic segments that can share a common ancestor. Setting the
+```hull_offset``` to **0** means only overlapping genomic segments can share a common ancestor,
+corresponding to the backward-in-time definition of the **SMC** model. Similarly, setting
+the ```hull_offset``` to **1** allows adjacent genomic segments, as well as overlapping ones, to
+share a common ancestor, which defines the **SMC′** model. Simulating under the Hudson
+coalescent model is equivalent to setting the ```hull_offset``` to the sequence length. The
+```hull_offset``` can take any value between **0** and the sequence length.
+
+In this example, we use the {class}`SMC(k) <.SmcKApproxCoalescent>` model to run **SMC'**
+simulations:
+```{code-cell}
+ts = msprime.sim_ancestry(4, population_size=10,
+    model=msprime.SmcKApproxCoalescent(hull_offset=1),
+    random_seed=1)
+SVG(ts.draw_svg(y_axis=True, time_scale="log_time"))
+```
+:::{Note}
+Since the **SMC** models are approximations of the {ref}`Hudson coalescent<sec_ancestry_models_hudson>`,
+and since the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` model is well optimised for
+regimes with moderate scaled recombination rates (including full human chromosome simulations),
+we recommend using the {ref}`Hudson coalescent<sec_ancestry_models_hudson>` whenever possible.
+:::

 (sec_ancestry_models_dtwf)=
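
The `hull_offset` rule described in the added documentation can be illustrated with a small standalone sketch. This is not msprime's implementation. The function name `can_share_ancestor`, the half-open `(left, right)` segment tuples, and the strict-inequality convention are all assumptions, chosen so that offsets of 0, 1, and the sequence length reproduce the SMC, SMC', and Hudson behaviour stated in the text.

```python
def can_share_ancestor(seg_a, seg_b, hull_offset):
    """Hypothetical helper, not part of msprime.

    Segments are half-open (left, right) intervals on a discrete genome.
    Each segment is extended to the right by hull_offset, and a common
    ancestor event is allowed only if the two extended intervals overlap.
    """
    a_left, a_right = seg_a
    b_left, b_right = seg_b
    return a_left < b_right + hull_offset and b_left < a_right + hull_offset


sequence_length = 10
pairs = {
    "overlapping": ((0, 5), (3, 8)),
    "adjacent": ((0, 5), (5, 10)),
    "separated": ((0, 2), (8, 10)),
}

# Compare SMC (0), SMC' (1) and Hudson-equivalent (sequence length) offsets.
for label, (a, b) in pairs.items():
    allowed = {k: can_share_ancestor(a, b, k) for k in (0, 1, sequence_length)}
    print(label, allowed)
```

Under these assumptions, overlapping segments are allowed at every offset, adjacent segments only from an offset of 1 upwards, and widely separated segments only when the offset reaches the sequence length.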

docs/api.md

Lines changed: 4 additions & 0 deletions
@@ -121,6 +121,10 @@ for discussion and examples of individual features.
 .. autoclass:: msprime.StandardCoalescent
 ```

+```{eval-rst}
+.. autoclass:: msprime.SmcKApproxCoalescent
+```
+
 ```{eval-rst}
 .. autoclass:: msprime.SmcApproxCoalescent
 ```

msprime/ancestry.py

Lines changed: 27 additions & 1 deletion
@@ -1785,6 +1785,9 @@ class StandardCoalescent(AncestryModel):

 class SmcApproxCoalescent(AncestryModel):
     """
+    Legacy implementation of the SMC model. Please use :class:`SmcKApproxCoalescent`
+    instead.
+
     The Sequentially Markov Coalescent (SMC) model defined by
     `McVean and Cardin (2005) <https://dx.doi.org/10.1098%2Frstb.2005.1673>`_.
     In the SMC, only common ancestor events that result in marginal coalescences
@@ -1795,6 +1798,7 @@ class SmcApproxCoalescent(AncestryModel):
         This model is implemented using a naive rejection sampling approach
         and so it may not be any more efficient to simulate than the
         standard Hudson model.
+        We recommend using the ``SmcKApproxCoalescent(hull_offset=0)`` instead.

     The string ``"smc"`` can be used to refer to this model.
     """
@@ -1804,6 +1808,9 @@ class SmcApproxCoalescent(AncestryModel):

 class SmcPrimeApproxCoalescent(AncestryModel):
     """
+    Legacy implementation of the SMC' model. Please use :class:`SmcKApproxCoalescent`
+    instead.
+
     The SMC' model defined by
     `Marjoram and Wall (2006) <https://doi.org/10.1186/1471-2156-7-16>`_
     as a refinement of the :class:`SMC<SmcApproxCoalescent>`. The SMC'
@@ -1814,7 +1821,8 @@ class SmcPrimeApproxCoalescent(AncestryModel):
     .. note::
         This model is implemented using a naive rejection sampling approach
         and so it may not be any more efficient to simulate than the
-        standard Hudson model.
+        standard Hudson model. We recommend using the
+        ``SmcKApproxCoalescent(hull_offset=1)`` instead.

     The string ``"smc_prime"`` can be used to refer to this model.
     """
@@ -1830,6 +1838,24 @@ class ParametricAncestryModel(AncestryModel):

 @dataclasses.dataclass
 class SmcKApproxCoalescent(ParametricAncestryModel):
+    """
+    A general Sequentially Markov Coalescent (SMC) model. This model accepts a
+    ``hull_offset`` parameter (defaults to 0) that defines the allowed distances
+    between the genomic tracts of ancestral material in a common ancestor event.
+
+    Specifically, if the hull_offset is set to 0, then only overlapping genomic
+    tracts can be joined by a common ancestor event (this is equivalent to the
+    SMC model). If the hull_offset is set to 1, then overlapping or adjacent
+    genomic tracts can be joined by a common ancestor (this is equivalent to the
+    SMC' model). If the hull_offset is set to the full sequence length, then any
+    segments can share a common ancestor, which is equivalent to the standard Hudson
+    coalescent.
+
+    :param float hull_offset: Determines the maximum distance between genomic tracts
+        of ancestral material that can be joined by a common ancestor event.
+        Defaults to 0 (equivalent to the SMC model).
+    """
+
     name = "smc_k"

     hull_offset: float
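
The docstrings in this diff direct users of the legacy classes to `SmcKApproxCoalescent`. The mapping they describe can be sketched as a lookup. `hull_offset_for` is a hypothetical helper, not part of msprime, and it encodes only the equivalences stated in the docstring above (SMC is 0, SMC' is 1, Hudson is the sequence length).

```python
def hull_offset_for(model_name, sequence_length):
    """Hypothetical helper: the hull_offset equivalent to a named model."""
    offsets = {
        "smc": 0.0,                        # only overlapping tracts
        "smc_prime": 1.0,                  # overlapping or adjacent tracts
        "hudson": float(sequence_length),  # any pair of tracts
    }
    try:
        return offsets[model_name]
    except KeyError:
        raise ValueError(
            f"no hull_offset equivalent for model {model_name!r}"
        ) from None


for name in ("smc", "smc_prime", "hudson"):
    print(name, hull_offset_for(name, sequence_length=100))
```

The returned value would then be passed as `msprime.SmcKApproxCoalescent(hull_offset=...)`, the constructor call used in the commit's own documentation example.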
