Commit 1060bda

Merge pull request #224 from petrelharp/test_annotations
Lots more updates!
2 parents 47552fa + be63103 commit 1060bda

35 files changed, with 1091 additions and 226 deletions

CHANGELOG.rst

Lines changed: 8 additions & 0 deletions
@@ -2,6 +2,14 @@
 [UPCOMING.X.X] - XXXX-XX-XX
 ***************************

+**Breaking changes**:
+
+-
+
+**New features**:
+
+- Added `pyslim.population_size( )` to compute an array giving numbers of
+  individuals across a grid of space and time bins. ({user}giliapatterson)

 ********************
 [0.600] - 2021-02-24

docs/_static/pedigree01.png

0 Bytes

docs/metadata.md

Lines changed: 16 additions & 13 deletions
@@ -155,7 +155,7 @@ to SLiM time as follows:

 ```{code-cell}
 def slim_time(ts, time, stage):
-    slim_time = ts.slim_generation - time
+    slim_time = ts.metadata["SLiM"]["generation"] - time
     if ts.metadata['SLiM']['model_type'] == "WF":
         if (ts.metadata['SLiM']['stage'] == "early"
             and stage == "late"):
@@ -211,21 +211,24 @@ could be used to set spatial bounds on an annotated msprime simulation, for inst
 To modify the metadata that ``pyslim`` has introduced into
 the tree sequence produced by a coalescent simulation,
 or the metadata in a SLiM-produced tree sequence,
-what we do is (a) extract the metadata (as a list of dicts),
-(b) modify them, and then (c) write them back into the tables.
+we need to edit the TableCollection that forms the editable data behind the tree sequence.
 For instance, to set the ages of the individuals in the tree sequence to random numbers between 1 and 4,
-and write out the resulting tree sequence:
+we will extract a copy of the underlying tables, clear the individual table,
+and then iterate over the individuals in the tree sequence,
+re-inserting each one into the tables
+after replacing its metadata with a modified version:

 ```{code-cell}
-tables = ts.tables
-ind_md = [ind.metadata for ind in tables.individuals]
-for md in ind_md:
-    md["age"] = random.choice([1,2,3,4])
+tables = ts.dump_tables()
+tables.individuals.clear()
+for ind in ts.individuals():
+    md = ind.metadata
+    md["age"] = random.choice([1,2,3,4])
+    _ = tables.individuals.append(
+        ind.replace(metadata=md)
+    )

-ims = tables.individuals.metadata_schema
-tables.individuals.packset_metadata(
-    [ims.validate_and_encode_row(md) for md in ind_md])
-mod_ts = pyslim.load_tables(tables)
+mod_ts = tables.tree_sequence()

 # check that it worked:
 print("First ten ages:", [mod_ts.individual(i).metadata["age"] for i in range(10)])
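Note on the hunk above: the new code follows a "clear the table, then re-append each row with replaced metadata" pattern. The sketch below mimics that pattern with the standard library only, so it runs without tskit or pyslim. The `Row` class and the `reage` helper are hypothetical stand-ins invented for illustration; they are not part of tskit or pyslim, and the real tables carry more columns than shown here.

```python
import dataclasses
import random


@dataclasses.dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for an immutable table row (not a tskit class)."""
    flags: int
    metadata: dict


def reage(rows, rng):
    """Return new rows whose metadata carries a random age between 1 and 4."""
    new_rows = []  # plays the role of the freshly cleared table
    for row in rows:
        md = dict(row.metadata)  # copy, so the source row is left untouched
        md["age"] = rng.choice([1, 2, 3, 4])
        # build a new row that differs from the old one only in its metadata
        new_rows.append(dataclasses.replace(row, metadata=md))
    return new_rows


rows = [Row(flags=0, metadata={"age": 0, "sex": i % 2}) for i in range(10)]
new_rows = reage(rows, random.Random(1))
```

As in the diff, every other field of each row is carried over unchanged, and the number of rows is the same before and after.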
@@ -342,7 +345,7 @@ These methods would set the metadata column of a table -
 for instance, if ``metadata`` is a list of NodeMetadata objects, then
 ``annotate_node_metadata(tables, metadata)`` would modify ``tables.nodes`` in place
 to contain the (encoded) metadata in the list ``metadata``.
-Now, this would be done as follows (where now ``metadata`` is a list of metadata dicts):
+Now, this could be done as follows (where now ``metadata`` is a list of metadata dicts):

 ```{code-cell}
 metadata = [ {'slim_id': k, 'is_null': False, 'genome_type': 0}

docs/phylo_bgs.slim

Lines changed: 12 additions & 5 deletions
@@ -14,16 +14,23 @@ initialize()
     initializeRecombinationRate(1e-9);
 }

-1 early() {
+1 late() {
     // if no input tree sequence is provided, then start a subpopulation
     if (infile == "") {
-        sim.addSubpop("p1", popsize);
+        p = sim.addSubpop("p1", popsize);
     } else {
+        // reloading must happen in late()
         sim.readFromPopulationFile(infile);
-        p1.setSubpopulationSize(popsize);
+        parent = sim.subpopulations[0];
+        p = sim.addSubpopSplit(max(sim.subpopulations.id) + 1, popsize, parent);
+        parent.setSubpopulationSize(0);
     }
-    finalgen = num_gens + sim.generation - 1;
-    // scheduling the end of the simulation
+    p.name = popname;
+}
+
+// schedule the end of the simulation
+1 late() {
+    finalgen = num_gens + sim.generation;
     sim.rescheduleScriptBlock(s0, generations=finalgen);
 }

docs/python_api.md

Lines changed: 13 additions & 2 deletions
@@ -34,9 +34,9 @@ kernelspec:
 This page provides detailed documentation for the methods and classes
 available in pyslim.

-## Methods
+## Editing or adding to tree sequences

-pyslim provides tools for transforming tree sequences:
+``pyslim`` provides tools for transforming tree sequences:


 ```{eval-rst}
@@ -45,8 +45,19 @@ pyslim provides tools for transforming tree sequences:
    recapitate
    convert_alleles
    generate_nucleotides
+   annotate_defaults
+   update_tables
 ```

+## Summarizing tree sequences
+
+Additionally, ``pyslim`` contains the following summary methods:
+
+```{eval-rst}
+.. autosummary::
+
+   population_size
+```


 ## Additions to the tree sequence

docs/tutorial.md

Lines changed: 44 additions & 22 deletions
@@ -108,11 +108,13 @@ and so our simulation would have less genetic variation than it should have
 Doing this is as simple as:

 ```{code-cell}
-orig_ts = pyslim.load("example_sim.trees")
-rts = orig_ts.recapitate(
+orig_ts = tskit.load("example_sim.trees")
+rts = pyslim.recapitate(orig_ts,
             recombination_rate=1e-8,
-            Ne=200, random_seed=5)
+            ancestral_Ne=200, random_seed=5)
 ```
+The warning is harmless; it is reminding us to think about generation time
+when recapitating a nonWF simulation (a topic we'll deal with later).

 We can check that this worked as expected, by verifying that after recapitation
 all trees have only one root:
@@ -124,14 +126,14 @@ print(f"Maximum number of roots before recapitation: {orig_max_roots}\n"
       f"After recapitation: {recap_max_roots}")
 ```

-The {meth}`.SlimTreeSequence.recapitate` method
+The {func}`.recapitate` function
 is just a thin wrapper around {func}`msprime.sim_ancestry`,
 and you need to set up demography explicitly - for instance, in the example above
 we've simulated from an ancestral population of ``Ne=200`` diploids.
 If you have more than one population,
 you must set migration rates or else coalescence will never happen
 (see {ref}`sec_recapitate_with_migration` for an example,
-and {meth}`.SlimTreeSequence.recapitate` for more).
+and {func}`.recapitate` for more).


 #### Recapitation with a nonuniform recombination map
@@ -223,9 +225,9 @@ positions[-1] += 1
 assert positions[-1] == orig_ts.sequence_length

 recomb_map = msprime.RateMap(position=positions, rate=rates)
-rts = orig_ts.recapitate(
-                recombination_map=recomb_map,
-                Ne=200, random_seed=7)
+rts = pyslim.recapitate(orig_ts,
+                recombination_rate=recomb_map,
+                ancestral_Ne=200, random_seed=7)
 assert(max([t.num_roots for t in rts.trees()]) == 1)
 ```
 (As before, you should *not* usually explicitly set
@@ -301,7 +303,7 @@ which would be inconsistent with the SLiM simulation.

 After recapitation,
 simplification to the history of 100 individuals alive today
-can be done with the {meth}`.SlimTreeSequence.simplify` method:
+can be done with the {meth}`tskit.TreeSequence.simplify` method:

 ```{code-cell}
 import numpy as np
@@ -423,7 +425,9 @@ and write their SNPs to a VCF is:
 ```{code-cell}
 np.random.seed(1)
 keep_indivs = np.random.choice(alive, 100, replace=False)
-ts = pyslim.SlimTreeSequence(msprime.mutate(orig_ts, rate=1e-8, random_seed=1))
+ts = pyslim.SlimTreeSequence(
+    msprime.sim_mutations(orig_ts, rate=1e-8, random_seed=1)
+)
 with open("example_snps.vcf", "w") as vcffile:
     ts.write_vcf(vcffile, individuals=keep_indivs)
 ```
@@ -452,7 +456,9 @@ keep_nodes = []
 for i in keep_indivs:
     keep_nodes.extend(orig_ts.individual(i).nodes)
 sts = rts.simplify(keep_nodes)
-ts = pyslim.SlimTreeSequence(msprime.mutate(sts, rate=1e-8, random_seed=1))
+ts = pyslim.SlimTreeSequence(
+    msprime.sim_mutations(sts, rate=1e-8, random_seed=1)
+)
 ```
 Individuals are retained by simplify if any of their nodes are,
 so we would get an alive individual without sample nodes if, for instance,
@@ -533,27 +539,43 @@ so there is an empty "population 0" in a SLiM-produced tree sequence.

 (sec_recapitate_with_migration)=

-## Recapitation with more than one population
+## Recapitation with migration between more than one population

 Following on the last example,
 let's recapitate and mutate the tree sequence.
-Recapitation takes a bit more thought, because we have to specify a migration matrix
-(or else it will run forever, unable to coalesce).
+Recall that this recipe had two populations, ``p1`` and ``p2``,
+each of size 1000.
+Recapitation takes a bit more thought, because if the two populations stay separate,
+it will run forever, unable to coalesce.
+By default, {func}`.recapitate` *merges* the two populations into a single
+one of size ``ancestral_Ne``.
+But if we'd like them to stay separate, we need to include migration between them.
+Here's how we set up the demography using msprime's tools:

 ```{code-cell}
-pop_configs = [msprime.PopulationConfiguration(initial_size=1000)
-               for _ in range(orig_ts.num_populations)]
-rts = orig_ts.recapitate(population_configurations=pop_configs,
-            migration_matrix=[[0.0, 0.0, 0.0],
-                              [0.0, 0.0, 0.1],
-                              [0.0, 0.1, 0.0]],
+demography = msprime.Demography.from_tree_sequence(orig_ts)
+for pop in demography.populations:
+    # must set their effective population sizes
+    pop.initial_size = 1000
+
+demography.add_migration_rate_change(
+    time=orig_ts.metadata['SLiM']['generation'],
+    rate=0.1, source="p1", dest="p2",
+)
+demography.add_migration_rate_change(
+    time=orig_ts.metadata['SLiM']['generation'],
+    rate=0.1, source="p2", dest="p1",
+)
+rts = pyslim.recapitate(orig_ts, demography=demography,
             recombination_rate=1e-8,
-            random_seed=4)
+            random_seed=4
+)
 ts = pyslim.SlimTreeSequence(
         msprime.sim_mutations(
                 rts, rate=1e-8,
                 model=msprime.SLiMMutationModel(type=0),
-                random_seed=7))
+                random_seed=7)
+)
 ```

 Again, there are *three* populations because SLiM starts counting at 1;
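Note on the hunk above: the two `add_migration_rate_change` calls appear to describe the same symmetric migration that the removed `migration_matrix` argument spelled out by index. The standard-library sketch below makes that correspondence explicit. It assumes that entry `[source][dest]` of the matrix holds the rate from `source` to `dest`, and that the empty population 0 is called `"pop_0"`; both the helper `migration_matrix` and that name are hypothetical, chosen only for illustration.

```python
def migration_matrix(pop_names, rate_changes):
    """Build a square matrix with matrix[source][dest] = rate."""
    index = {name: j for j, name in enumerate(pop_names)}
    n = len(pop_names)
    matrix = [[0.0] * n for _ in range(n)]
    for source, dest, rate in rate_changes:
        matrix[index[source]][index[dest]] = rate
    return matrix


# "pop_0" is an assumed name for the unused population 0;
# p1 and p2 are the two SLiM subpopulations from the recipe.
names = ["pop_0", "p1", "p2"]
changes = [("p1", "p2", 0.1), ("p2", "p1", 0.1)]
matrix = migration_matrix(names, changes)
```

Under those assumptions, `matrix` has the same entries as the 3x3 matrix removed by this hunk, with nonzero rates only between `p1` and `p2`.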

docs/vignette_coalescent_diversity.md

Lines changed: 3 additions & 7 deletions
@@ -201,13 +201,9 @@ for m in ots.mutations():
         if sid not in mut_map:
             mut_map[sid] = np.random.exponential(scale=0.04)
         md["selection_coeff"] = mut_map[sid]
-    _ = tables.mutations.add_row(
-        site=m.site,
-        node=m.node,
-        time=m.time,
-        derived_state=m.derived_state,
-        parent=m.parent,
-        metadata={"mutation_list": md_list})
+    _ = tables.mutations.append(
+        m.replace(metadata={"mutation_list": md_list})
+    )

 # check we didn't mess anything up
 assert tables.mutations.num_rows == ots.num_mutations
