examples/stochastic_volatility.py (20 additions & 18 deletions)
@@ -9,11 +9,11 @@
 from pymc.distributions.timeseries import *
 
 from scipy.sparse import csc_matrix
-from scipy import optimize
+from scipy import optimize
 
 # <markdowncell>
 
-# Asset prices have time-varying volatility (variance of day over day `returns`). In some periods, returns are highly vaiable, and in others very stable. Stochastic volatility models model this with a latent volatility variable, modeled as a stochastic process. The following model is similar to the one described in the No-U-Turn Sampler paper, Hoffman (2011) p21.
+# Asset prices have time-varying volatility (variance of day over day `returns`). In some periods, returns are highly variable, while in others very stable. Stochastic volatility models model this with a latent volatility variable, modeled as a stochastic process. The following model is similar to the one described in the No-U-Turn Sampler paper, Hoffman (2011) p21.
 #
 # $$ \sigma \sim Exponential(50) $$
 #
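Only the first line of the model specification is visible in this hunk. As a hedged sketch of how the full model described above might be written against this early API (the constructor names `TransformedVar`, `logtransform`, `GaussianRandomWalk` and `T`, and the values for `nu` and `s`, are assumptions taken from the NUTS-paper formulation, not copied from the file):

with Model() as sv_model:
    # sigma ~ Exponential(50), sampled on the log scale (see the TransformedVar note later in this diff)
    sigma, log_sigma = sv_model.TransformedVar('sigma', Exponential.dist(50.), logtransform)
    nu = Exponential('nu', .1)                          # degrees of freedom of the returns distribution (assumed rate)
    s = GaussianRandomWalk('s', sigma ** -2, shape=n)   # latent log-volatility, a random walk with innovation scale sigma
    r = T('r', nu, lam=exp(-2 * s), observed=returns)   # Student-t likelihood for the observed returns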
@@ -31,28 +31,27 @@
 
 # <codecell>
 
-model = Model()
+# Load 400 returns from the S&P 500.
+n = 400
+returns = np.genfromtxt("data/SP500.csv")[-n:]
 
 # <markdowncell>
 
 # Specifying the model in pymc mirrors its statistical specification.
 #
-# However, it is easier to sample the scale of the volatility process innovations, $\sigma$, on a log scale, so we create it using `TransformedVar` and use `logtransform`. `TransformedVar` creates one variable in the transformed space and one in the normal space. The one in the transformed space (here $log(\sigma) $) is the one over which sampling will occur, and the one in the normal space is the one to use throughout the rest of the model.
+# However, it is easier to sample the scale of the volatility process innovations, $\sigma$, on a log scale, so we create it using `TransformedVar` and use `logtransform`. `TransformedVar` creates one variable in the transformed space and one in the normal space. The one in the transformed space (here $\text{log}(\sigma)$) is the one over which sampling will occur, and the one in the normal space is the one to use throughout the rest of the model.
 #
 # It takes a variable name, a distribution and a transformation to use.
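In code, the call pattern being described looks roughly like this (the distribution and its argument are placeholders matching the $\sigma \sim Exponential(50)$ line above; the file's actual values are not visible in this hunk):

# One call, three arguments (name, distribution, transformation); two variables returned.
sigma, log_sigma = model.TransformedVar('sigma', Exponential.dist(50.), logtransform)
# sampling happens on log_sigma; the rest of the model uses sigma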
-# To get a decent scaling matrix for the hamiltonaian sampler, we find the hessian at a point. The method `Model.d2logpc` gives us a Theano compiled function that returns the matrix of 2nd derivatives.
+# To get a decent scaling matrix for the Hamiltonian sampler, we find the Hessian at a point. The method `Model.d2logpc` gives us a `Theano` compiled function that returns the matrix of 2nd derivatives.
 #
-# However, the 2nd derivatives for the degrees of freedom parameter, `nu`, are negative and thus not very informative and make the matrix non-positive definite, so we replace that entry with a reasonable guess at the scale. The interactions between `log_sigma`/`nu` and `s` are also not very useful, so we set them to zero.
+# However, the 2nd derivatives for the degrees of freedom parameter, `nu`, are negative and thus not very informative and make the matrix non-positive definite, so we replace that entry with a reasonable guess at the scale. The interactions between `log_sigma`/`nu` and `s` are also not very useful, so we set them to zero.
 #
-# The hessian matrix is also very sparse, so we make it a sparse matrix for faster sampling.
+# The Hessian matrix is also very sparse, so we make it a sparse matrix for faster sampling.
-# We do a short initial run to get near the right area, then start again using a new hessian at the new starting point to get faster sampling due to better scaling.
+# We do a short initial run to get near the right area, then start again using a new Hessian at the new starting point to get faster sampling due to better scaling.
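The code implementing this recipe is not shown in the diff. A rough sketch of the idea, assuming `Model.d2logpc` behaves as described above and that the variables are ordered `log_sigma`, `nu`, `s` (both assumptions, and the guessed scale for `nu` is arbitrary):

H = model.d2logpc()(start)       # dense matrix of 2nd derivatives at the starting point (assumed call pattern)
H[1, 1] = 25.                    # hypothetical: replace the uninformative nu entry with a guessed scale
H[0, 2:] = H[2:, 0] = 0.         # zero the log_sigma / s interactions
H[1, 2:] = H[2:, 1] = 0.         # zero the nu / s interactions
H = csc_matrix(H)                # sparse Hessian for faster sampling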
+# Now we want to do inference assuming the following model:
+#
+# $$ x \sim \textrm{Normal}(0,1) $$
+# $$ y \sim \textrm{Normal}(\textrm{exp}(x),2)$$
+# $$ z \sim \textrm{Normal}(x + y,0.75)$$
+#
+# The aim here is to get posteriors over $x$ and $y$ given the data we have about $z$ (`zdata`).
+#
 # We create a new `Model` object, and do operations within its context. The `with` lets PyMC know this model is the current model of interest.
 #
 # We construct new random variables using the constructor for their prior distribution, such as `Normal`, while within a model context (inside the `with`). When you make a random variable it is automatically added to the model. The constructor returns a Theano variable.
 y = Normal('y', mu=exp(x), tau=2.**-2, shape=(ndims, 1))  # here, shape is telling us it's a vector rather than a scalar.
+z = Normal('z', mu=x + y, tau=.75**-2, observed=zdata)  # shape is inferred from zdata
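Only `y` and `z` are visible in this hunk. A self-contained sketch of the whole block being described, with an assumed data-generation step for `ndims`, `xtrue`, `ytrue` and `zdata` (none of which appear in the diff):

import numpy as np
from pymc import Model, Normal
from theano.tensor import exp

# Assumed setup: simulate the "true" values and the observed data.
ndims, nobs = 2, 20
xtrue = np.random.normal(scale=2., size=1)
ytrue = np.random.normal(loc=np.exp(xtrue), scale=1., size=(ndims, 1))
zdata = np.random.normal(loc=xtrue + ytrue, scale=.75, size=(ndims, nobs))

with Model() as model:
    x = Normal('x', mu=0., tau=1.)                              # x ~ Normal(0, 1)
    y = Normal('y', mu=exp(x), tau=2.**-2, shape=(ndims, 1))    # y ~ Normal(exp(x), 2)
    z = Normal('z', mu=x + y, tau=.75**-2, observed=zdata)      # z ~ Normal(x + y, 0.75), observed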
+
+# <markdowncell>
+
+# A parenthetical note on the parameters for the normal. Variance is encoded as `tau`, indicating precision, which is simply inverse variance (so $\tau=\sigma^{-2}$). This is used because the gamma distribution is the conjugate prior for precision, and must be inverted to get variance. Encoding in terms of precision saves the inversion step in cases where variance is actually modeled using gamma as a prior.
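A quick worked instance of that parameterization (illustrative only, using the value already passed for `z` above):

sigma = 0.75
tau = sigma ** -2   # precision = 1 / sigma**2 ≈ 1.78, i.e. the .75**-2 given to Normal for z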
 
 # <markdowncell>
 
 # Fit Model
 # ---------
-# We need a starting point for our sampling. The `find_MAP` function finds the maximum a posteriori point (MAP), which is often a good choice for starting point. `find_MAP` uses an optimization algorithm to find the local maximum of the log posterior.
+# We need a starting point for our sampling. The `find_MAP` function finds the maximum a posteriori point (MAP), which is often a good choice of starting point. `find_MAP` uses an optimization algorithm (`scipy.optimize.fmin_l_bfgs_b`, or [BFGS](http://en.wikipedia.org/wiki/BFGS_method), by default) to find the local maximum of the log posterior.
+#
+# Note that this `with` construction is used again. Functions like `find_MAP` and `HamiltonianMC` need to have a model in their context. `with` activates the context of a particular model within its block.
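A minimal sketch of the step being described, assuming only the `find_MAP` name from the text and the `model` defined earlier:

from pymc import find_MAP

with model:              # the `with` puts `model` in context, as the note above explains
    start = find_MAP()   # maximum a posteriori point, used below as the sampler's starting point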
 
 # <codecell>
 
@@ -59,7 +84,13 @@
 
 # <codecell>
 
-start
+print "MAP found:"
+print "x:", start['x']
+print "y:", start['y']
+
+print "Compare with true values:"
+print "ytrue", ytrue
+print "xtrue", xtrue
 
 # <markdowncell>
 
@@ -109,7 +140,7 @@
 
 # <codecell>
 
-traceplot(trace)
+traceplot(trace);
 
 # <markdowncell>
 
@@ -130,3 +161,10 @@
 # * Without a name argument, it simply constructs a distribution object and returns it. It won't construct a random variable. This object has properties like `logp` (density function) and `expectation`.
 # * With a name argument, it constructs a random variable using the distribution object as the prior distribution and inserts this random variable into the current model. Then the constructor returns the random variable.
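A small illustration of the two behaviours in the bullets above (the variable names are made up; only the with/without-name distinction comes from the text):

dist = Normal(mu=0., tau=1.)        # no name: just a distribution object, with properties like logp
with model:
    w = Normal('w', mu=0., tau=1.)  # name given: a random variable, added to the current model and returned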