@@ -118,8 +118,8 @@ toy example. However, for real data we will not prepare our data and infer the t
in one go; rather, we will usually split the process into at least two distinct steps.

The first step in any inference is to prepare your data and import it into a :ref:`sample data
- <sec_file_formats_samples>` file. For simplicity here we'll simulate some data under the
- coalescent with recombination using `msprime
+ <sec_file_formats_samples>` file. For simplicity here we'll use Python to simulate some
+ data under the coalescent with recombination, using `msprime
<https://msprime.readthedocs.io/en/stable/api.html#msprime.simulate>`_:

.. code-block:: python
@@ -180,7 +180,7 @@ import the data for this simulation into ``tsinfer``'s sample data format.
    tsinfer.SampleData.from_tree_sequence(
        ts, path="simulation.samples", num_flush_threads=2, use_times=False)

- Examining the files, we then see the following::
+ Examining the files on the command line, we then see the following::

    $ ls -lh simulation*
    -rw-r--r-- 1 jk jk 22M May 12 11:06 simulation.samples
@@ -224,7 +224,9 @@ actual data) requires about 390MB uncompressed. The ``tsinfer`` sample data form
achieving a roughly 20X compression in this case. In practice this means we can keep such files
lying around without taking up too much space.

- Once we have our ``.samples`` file created, running the inference is straightforward::
+ Once we have our ``.samples`` file created, running the inference is straightforward.
+ We can do so within Python (as we did in the toy example above), or use ``tsinfer`` on
+ the command line, which is useful when inference is expected to take a long time::

    $ tsinfer infer simulation.samples -p -t 4
    ga-add (1/6): 100%|███████████████████████| 35.2K/35.2K [00:02, 15.3Kit/s]
@@ -252,7 +254,8 @@ Looking at our output files, we see::

Therefore our output tree sequence file that we have just inferred in less than five minutes is
*even smaller* than the original ``msprime`` simulated tree sequence! Because the output file is
- also a :class:`tskit.TreeSequence`, we can use the same API to work with both.
+ also a :class:`tskit.TreeSequence`, we can use the same API to work with both. For example,
+ within Python we can do:

.. code-block:: python

@@ -273,7 +276,7 @@ also a :class:`tskit.TreeSequence`, we can use the same API to work with both.
    print("Inferred tree: interval=", tree.interval)
    print(tree.draw(format="unicode"))

- Here we first load up our source and inferred tree sequences from their corresponding
+ This code first loads our source and inferred tree sequences from their corresponding
``.trees`` files. Each of the trees in these tree sequences has 10 thousand samples,
which is much too large to easily visualise. Therefore, to make things simple here
we subset both tree sequences down to their minimal representations for six