Skip to content

Commit 6767bc0

Browse files
committed
Merge branch 'devel'
2 parents 9b41e7b + ef2cd8c commit 6767bc0

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

53 files changed

+1184
-691
lines changed

.travis.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,8 @@ install:
1414
- conda config --set always_yes yes --set changeps1 no
1515
- conda update -y conda python
1616
- conda info -a
17-
- conda install conda-build anaconda-client git ruamel_yaml pip
18-
- pip install git+https://github.com/conda/constructor@6cf619d47
17+
- conda install conda-build anaconda-client git pip constructor
18+
# - pip install git+https://github.com/conda/constructor@6cf619d47
1919
- conda build -c terhorst -c bioconda conda
2020
- conda build -c terhorst -c bioconda conda --output | tee .packages
2121
- conda/template.py conda/construct.tpl $VERSION > conda/construct.yaml

README.rst

Lines changed: 24 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -20,9 +20,9 @@ Quick start guide
2020
should run this once for each independent contig in your dataset,
2121
producing one SMC++ output file per contig.
2222

23-
3. Fit the model using estimate_::
23+
3. Fit the model using cv_::
2424

25-
$ smc++ estimate -o analysis/ 1.25e-8 out/example.chr*.smc.gz
25+
$ smc++ cv -o analysis/ 1.25e-8 out/example.chr*.smc.gz
2626

2727
  The first mandatory argument, ``1.25e-8``, is the per-generation
2828
mutation rate. The remaining arguments are the data files generated
@@ -250,6 +250,9 @@ is::
250250

251251
$ smc++ estimate <mutation rate> <data file> [<data file> ...]
252252

253+
*Please note that, in contrast to earlier versions, the recommended way to
254+
estimate size history is now via the cv_ command*.
255+
253256
Required arguments
254257
^^^^^^^^^^^^^^^^^^
255258

@@ -275,6 +278,20 @@ Optional arguments
275278
A number of other arguments concerning technical aspects of the fitting
276279
procedure exist. To see them, pass the ``-h`` option to ``estimate``.
277280

281+
cv
282+
--
283+
284+
This command is similar to estimate_, with the difference that it uses
285+
cross-validation to obtain sensible model parameters for use during estimation.
286+
*As of version 1.15, this is the recommended way to run SMC++*. The syntax and
287+
options for this command are nearly identical to estimate_:
288+
289+
$ smc++ cv <mutation rate> <data file> [<data file> ...]
290+
291+
The optional `--folds` parameter can be used to specify the number of folds
292+
used for performing `k`-fold cross validation. The default is `2` and should be
293+
set higher in cases where you have more data.
294+
278295
plot
279296
----
280297

@@ -400,13 +417,12 @@ good estimates*.
400417
may need to experiment a bit.
401418

402419
- ``--timepoints``: This command specifies the starting and ending time points
403-
of the model. It accepts either a comma-separated list of two numbers
404-
`t1,tK` specifying the starting and ending time points of the model (in
405-
generations), or the default setting `h`. If `h` is specified, SMC++ will use
406-
an experimental heuristic to calculate the model time points points
420+
of the model. It accepts two numbers `t1 tK` specifying the starting and
421+
ending time points of the model (in generations). If not specified, SMC++
422+
will use an heuristic to calculate the model time points points
407423
automatically.
408424

409-
- ``--regularization-penalty``: This parameter penalizes curvature in
425+
- ``--regularization-penalty``, ``-rp``: This parameter penalizes curvature in
410426
the estimated size history. The default value of this parameter is
411427
``6.0``. Lower values of the penalty shrink the estimated
412428
size history towards a line. If your estimates exhibit too much
@@ -423,7 +439,7 @@ good estimates*.
423439

424440
- ``--knots``: This parameter specifies the number of spline knots
425441
used in the underlying representation of the size history. The default
426-
value is ``32``. Using fewer knots can lead to smoother fits, however
442+
value is ``8``. Using fewer knots can lead to smoother fits, however
427443
underspecifying this parameter may smooth out interesting features of
428444
the size history.
429445

bench/bench.py

Lines changed: 0 additions & 90 deletions
This file was deleted.

bench/large.txt.gz

-219 KB
Binary file not shown.

bench/small.txt.gz

-9.82 KB
Binary file not shown.

bench/tiny.txt.gz

-52 Bytes
Binary file not shown.

conda/meta.yaml

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ build:
77

88
source:
99
git_url: ../
10+
git_tag: devel
1011

1112
requirements:
1213
build:
@@ -20,7 +21,7 @@ requirements:
2021
- numpy >=1.10
2122
- gmp >=6.1.0
2223
- gsl >=2.2.1
23-
- mpfr >=3.1.5
24+
- mpfr >=4.0.0
2425
- setuptools_scm >=1.15
2526
run:
2627
- python
@@ -45,6 +46,7 @@ requirements:
4546
- ad >=1.3.2
4647
- readline >=6.2
4748
- attrs >=17.2.0
49+
- msprime >=0.5.0
4850

4951
about:
5052
home: https://github.com/popgenmethods/smcpp

conda/run_test.sh

Lines changed: 42 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,31 +1,47 @@
11
#!/bin/bash -x
22
SMC=$(which smc++)
3+
TMP=$(mktemp -d)
34
set -e
4-
$SMC vcf2smc -v example/example.vcf.gz /tmp/example.1.smc.gz 1 msp1:msp_0
5-
$SMC vcf2smc -d msp_0 msp_0 example/example.vcf.gz /tmp/example.2.smc.gz 1 msp2:msp_0,msp_3,msp_4
6-
$SMC vcf2smc -d msp_1 msp_1 example/example.vcf.gz /tmp/example.12.smc.gz 1 msp1:msp_1,msp_2 msp2:msp_3,msp_4,msp_0
7-
$SMC estimate -o /tmp/out/1 --unfold --knots 5 --em-iterations 1 1.25e-8 /tmp/example.1.smc.gz
8-
$SMC estimate -o /tmp/out/1 --unfold --knots 5 --timepoints 33,1000 --em-iterations 1 1.25e-8 /tmp/example.1.smc.gz
9-
$SMC estimate -p 0.01 -r 1e-8 -o /tmp/out/2 --knots 5 --em-iterations 1 1.25e-8 /tmp/example.2.smc.gz
10-
$SMC split -o /tmp/out/split --em-iterations 1 \
11-
/tmp/out/1/model.final.json \
12-
/tmp/out/2/model.final.json \
13-
/tmp/example.*.smc.gz
14-
$SMC split --polarization-error .02 -o /tmp/out/split --em-iterations 1 \
15-
/tmp/out/1/model.final.json \
16-
/tmp/out/2/model.final.json \
17-
/tmp/example.*.smc.gz
18-
$SMC posterior /tmp/out/1/model.final.json \
19-
/tmp/matrix.npz /tmp/example.1.smc.gz /tmp/example.1.smc.gz
5+
$SMC vcf2smc -v example/example.vcf.gz $TMP/example.1.smc.gz 1 msp1:msp_0
6+
$SMC vcf2smc -v example/example.vcf.gz $TMP/example.11.smc.gz 1 msp1:msp_1
7+
$SMC vcf2smc -d msp_0 msp_0 example/example.vcf.gz $TMP/example.2.smc.gz 1 msp2:msp_0,msp_3,msp_4
8+
$SMC vcf2smc -d msp_1 msp_1 example/example.vcf.gz $TMP/example.12.smc.gz 1 msp1:msp_1,msp_2 msp2:msp_3,msp_4,msp_0
9+
$SMC chunk 10 200000 $TMP/chunk.1. $TMP/example.1.smc.gz
10+
$SMC chunk 10 200000 $TMP/chunk.2. $TMP/example.2.smc.gz
11+
$SMC chunk 10 200000 $TMP/chunk.12. $TMP/example.12.smc.gz
12+
$SMC estimate --em-iterations 1 -o $TMP/out/1 --unfold --knots 5 --em-iterations 1 1.25e-8 $TMP/example.1.smc.gz
13+
$SMC estimate --em-iterations 1 -o $TMP/out/1 --unfold --knots 5 --timepoints 33 1000 --em-iterations 1 1.25e-8 $TMP/example.1.smc.gz
14+
$SMC estimate --em-iterations 1 -p 0.01 -r 1e-8 -o $TMP/out/2 --knots 5 --em-iterations 1 1.25e-8 $TMP/example.2.smc.gz
15+
$SMC cv --em-iterations 1 --folds 2 -o $TMP/out/cv --fold 0 1e-8 $TMP/example.1.smc.gz $TMP/example.11.smc.gz
16+
$SMC cv --em-iterations 1 --folds 2 -o $TMP/out/cv --fold 1 1e-8 $TMP/example.1.smc.gz $TMP/example.11.smc.gz
17+
$SMC cv --em-iterations 1 --folds 2 -o $TMP/out/cv 1e-8 $TMP/example.1.smc.gz $TMP/example.11.smc.gz
18+
$SMC cv --em-iterations 1 --folds 2 -o $TMP/out/cv1 1e-8 $TMP/example.1.smc.gz $TMP/example.11.smc.gz
19+
$SMC split -o $TMP/out/split --em-iterations 1 \
20+
$TMP/out/1/model.final.json \
21+
$TMP/out/2/model.final.json \
22+
$TMP/example.*.smc.gz
23+
$SMC split -o $TMP/out/split --em-iterations 1 \
24+
$TMP/out/1/model.final.json \
25+
$TMP/out/2/model.final.json \
26+
$TMP/chunk*
27+
$SMC split --polarization-error .02 -o $TMP/out/split --em-iterations 1 \
28+
$TMP/out/1/model.final.json \
29+
$TMP/out/2/model.final.json \
30+
$TMP/example.*.smc.gz
31+
$SMC posterior $TMP/out/1/model.final.json \
32+
$TMP/matrix.npz $TMP/example.1.smc.gz $TMP/example.1.smc.gz
33+
$SMC simulate $TMP/out/1/model.final.json 2 .01 $TMP/out/1/sim.vcf
34+
$SMC simulate $TMP/out/split/model.final.json 2 .01 $TMP/out/split/sim.vcf
2035
$SMC posterior --colorbar -v \
21-
--heatmap /tmp/plot.png \
22-
/tmp/out/1/model.final.json \
23-
/tmp/matrix.npz /tmp/example.1.smc.gz
36+
--heatmap $TMP/plot.png \
37+
$TMP/out/1/model.final.json \
38+
$TMP/matrix.npz $TMP/example.1.smc.gz
2439
$SMC posterior -v \
25-
/tmp/out/split/model.final.json \
26-
/tmp/matrix.npz \
27-
/tmp/example.12.smc.gz
28-
$SMC plot -c -g 29 /tmp/1.png /tmp/out/1/model.final.json
29-
$SMC plot /tmp/2.pdf /tmp/out/2/model.final.json
30-
$SMC plot -c /tmp/12.png /tmp/out/split/model.final.json
31-
$SMC plot -c /tmp/all.pdf /tmp/out/*/model.final.json
40+
$TMP/out/split/model.final.json \
41+
$TMP/matrix.npz \
42+
$TMP/example.12.smc.gz
43+
$SMC plot -c -g 29 $TMP/1.png $TMP/out/1/model.final.json
44+
$SMC plot $TMP/2.pdf $TMP/out/2/model.final.json
45+
$SMC plot -c $TMP/12.png $TMP/out/split/model.final.json
46+
$SMC plot -c $TMP/all.pdf $TMP/out/*/model.final.json
47+

include/bin_key.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,7 @@ std::set<block_key> bin_key<1>::run(
5353
else
5454
{
5555
ret.emplace(key);
56+
/*
5657
if (true) // or undistinguish)
5758
for (int aa = 0; aa <= na(0); ++aa)
5859
{
@@ -63,7 +64,6 @@ std::set<block_key> bin_key<1>::run(
6364
(bb <= key(2)))
6465
ret.emplace(tmp);
6566
}
66-
/*
6767
if (nb > 0 and ((double)b / (double)nb > cutoff))
6868
for (int bb = (int)(cutoff * nb); bb <= nb; ++bb)
6969
{

setup.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -62,7 +62,8 @@
6262
"matplotlib>=1.5",
6363
"pysam>=0.9",
6464
"attrs>=17.2.0",
65-
"pandas"],
65+
"pandas",
66+
"msprime>=0.5.0"],
6667
entry_points = {
6768
'console_scripts': ['smc++ = smcpp.frontend.console:main'],
6869
}

0 commit comments

Comments
 (0)