Commit 5197d5c

Merge branch 'master' into gp-module
2 parents e8fb09a + 25bd58a commit 5197d5c

30 files changed: +1933 -575 lines changed

docs/source/api.rst

Lines changed: 2 additions & 1 deletion
@@ -8,6 +8,7 @@ API Reference
    :maxdepth: 2

    api/distributions
+   api/bounds
    api/inference
    api/glm
    api/gp
@@ -16,4 +17,4 @@ API Reference
    api/diagnostics
    api/backends
    api/math
-   api/data
+   api/data

docs/source/api/bounds.rst

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
=================
Bounded Variables
=================

PyMC3 includes the construct ``Bound`` for placing constraints on existing
probability distributions. It modifies a given distribution to take values
only within a specified interval.

Some types of variables require constraints. For instance, it doesn't make
sense for a standard deviation to have a negative value, so something like a
Normal prior on a parameter that represents a standard deviation would be
inappropriate. PyMC3 includes distributions that have positive support, such
as ``Gamma`` or ``Exponential``. PyMC3 also includes several bounded
distributions, such as ``Uniform``, ``HalfNormal``, and ``HalfCauchy``, that
are restricted to a specific domain.

All univariate distributions in PyMC3 can be given bounds. The distribution of
a continuous variable that has been bounded is automatically transformed into
an unnormalized distribution whose domain is unconstrained. The transformation
improves the efficiency of sampling and variational inference algorithms.

Usage
#####

For example, one may have prior information that suggests that the value of a
parameter representing a standard deviation is near one. One could use a
Normal distribution while constraining the support to be positive. The
specification of a bounded distribution should go within the model block::

    import pymc3 as pm

    with pm.Model() as model:
        BoundedNormal = pm.Bound(pm.Normal, lower=0.0)
        x = BoundedNormal('x', mu=1.0, sd=3.0)
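
``Bound`` also accepts both endpoints at once, restricting the distribution to
an interval; a minimal sketch, with illustrative values::

    with model:
        BoundedNormal2 = pm.Bound(pm.Normal, lower=-1.0, upper=1.0)
        y = BoundedNormal2('y', mu=0.0, sd=3.0)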

If the bound will be applied to a single variable in the model, it may be
cleaner notationally to define both the bound and the variable together::

    with model:
        x = pm.Bound(pm.Normal, lower=0.0)('x', mu=1.0, sd=3.0)

Bounds can also be applied to a vector of random variables. With the same
``BoundedNormal`` object we created previously, we can write::

    with model:
        x_vector = BoundedNormal('x_vector', mu=1.0, sd=3.0, shape=3)

Caveats
#######

* Bounds cannot be given to variables that are ``observed``. To model
  truncated data, use a ``Potential`` in combination with a cumulative
  probability function, as in the sketch following this list. See also `this
  example <https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/censored_data.py>`_.

* The automatic transformation applied to continuous distributions results in
  an unnormalized probability distribution. This doesn't affect inference
  algorithms but may complicate some model comparison procedures.
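
To make the first caveat concrete, here is a minimal sketch of handling
right-censored data with a ``Potential``; the Normal likelihood, cutoff, and
counts are illustrative assumptions. Each censored point contributes
``log P(X > cutoff)`` to the joint log-probability::

    import numpy as np
    import theano.tensor as tt
    import pymc3 as pm

    # Hypothetical data: four observed values plus two measurements known
    # only to exceed the cutoff (right-censored).
    cutoff = 3.0
    observed = np.array([0.5, 1.2, 2.7, 1.9])
    n_censored = 2

    with pm.Model() as censored_model:
        mu = pm.Normal('mu', mu=0.0, sd=10.0)
        sigma = pm.HalfNormal('sigma', sd=10.0)

        # Ordinary likelihood for the fully observed points
        pm.Normal('obs', mu=mu, sd=sigma, observed=observed)

        # log P(X > cutoff) under Normal(mu, sigma), written with the
        # complementary error function
        log_sf = tt.log(0.5 * tt.erfc((cutoff - mu) / (sigma * tt.sqrt(2.0))))

        # Each of the censored points adds this term to the model's logp
        pm.Potential('censored', n_censored * log_sf)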

docs/source/conf.py

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,8 @@
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = [
+    'matplotlib.sphinxext.only_directives',
+    'matplotlib.sphinxext.plot_directive',
     'sphinx.ext.autodoc',
     'sphinx.ext.autosummary',
     'sphinx.ext.doctest',
docs/source/images/forestplot.png

13.1 KB

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ Contents:
 .. toctree::
    :maxdepth: 3

+   intro
    getting_started
    prob_dists
    examples

docs/source/intro.rst

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
.. _intro:

************
Introduction
************


Purpose
=======

PyMC3 is a probabilistic programming module for Python that allows users to fit Bayesian models using a variety of numerical methods, most notably Markov chain Monte Carlo (MCMC) and variational inference (VI). Its flexibility and extensibility make it applicable to a large suite of problems. Along with core model specification and fitting functionality, PyMC3 includes functionality for summarizing output and for model diagnostics.


Features
========

PyMC3 strives to make Bayesian modeling as simple and painless as possible, allowing users to focus on their scientific problem rather than on the methods used to solve it. Here is a partial list of its features:

* Modern methods for fitting Bayesian models, including MCMC and VI.

* A large suite of well-documented statistical distributions.

* Theano as the computational backend, allowing for fast expression evaluation, automatic gradient calculation, and GPU computing.

* Built-in support for Gaussian process modeling.

* Model summarization and plotting.

* Model checking and convergence detection.

* Extensibility: custom step methods and unusual probability distributions are easily incorporated (see the sketch after this list).

* Bayesian models can be embedded in larger programs, and results can be analyzed with the full power of Python.
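
As a taste of that extensibility, here is a minimal sketch of a custom
distribution supplied through ``DensityDist``; the hand-written Laplace
log-density and the toy data are illustrative assumptions::

    import numpy as np
    import theano.tensor as tt
    import pymc3 as pm

    data = np.array([-1.3, 0.2, 0.8, 2.1, -0.5])  # toy observations

    with pm.Model() as custom_model:
        mu = pm.Normal('mu', mu=0.0, sd=10.0)
        b = pm.HalfNormal('b', sd=10.0)

        # Hand-written log-density of a Laplace(mu, b) distribution
        def laplace_logp(value):
            return -tt.log(2.0 * b) - tt.abs_(value - mu) / b

        # DensityDist turns any log-probability function into a distribution
        obs = pm.DensityDist('obs', laplace_logp, observed=data)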

What's new in version 3
=======================

The third major version of PyMC has benefited from being rewritten from scratch, which has yielded substantial improvements in the user interface and performance. While PyMC2 relied on Fortran extensions (via f2py) for most of the computational heavy lifting, PyMC3 leverages Theano, a library from the Montréal Institute for Learning Algorithms (MILA), to perform array-based expression evaluation. What this provides, above all else, is fast automatic differentiation, which is at the heart of the gradient-based sampling and optimization methods currently providing inference for probabilistic programming.

Major changes from previous versions:

* New flexible object model and syntax (not backward-compatible with PyMC2).

* Gradient-based MCMC methods, including Hamiltonian Monte Carlo (HMC), the No-U-Turn Sampler (NUTS), and Stein Variational Gradient Descent.

* Variational inference methods, including automatic differentiation variational inference (ADVI) and operator variational inference (OPVI).

* An interface for easy formula-based specification of generalized linear models (GLM); a sketch follows this list.

* Elliptical slice sampling.

* Specialized distributions for representing time series.

* A library of Jupyter notebooks that provide case studies and fully developed usage examples.

* Much more!
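
Here is a minimal sketch of the formula interface for a simple linear
regression, assuming the ``GLM.from_formula`` constructor; the toy DataFrame
is an illustrative assumption::

    import pandas as pd
    import pymc3 as pm

    # Toy data for a simple linear regression
    df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0],
                       'y': [2.1, 3.9, 6.2, 7.8]})

    with pm.Model() as glm_model:
        # Builds intercept, slope, and noise priors from the formula
        pm.glm.GLM.from_formula('y ~ x', df)
        trace = pm.sample(1000)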

While the addition of Theano adds a level of complexity to the development of PyMC, fundamentally altering how the underlying computation is performed, we have worked hard to maintain the elegant simplicity of the original PyMC model specification syntax.


History
=======

PyMC began development in 2003, as an effort to generalize the process of
building Metropolis-Hastings samplers, with the aim of making Markov chain
Monte Carlo (MCMC) more accessible to applied scientists. The choice to
develop PyMC as a Python module, rather than a standalone application,
allowed the use of MCMC methods in a larger modeling framework. By 2005,
PyMC was reliable enough for version 1.0 to be released to the public. A
small group of regular users, most associated with the University of
Georgia, provided much of the feedback necessary for the refinement of PyMC
to a usable state.

In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the
development team for PyMC 2.0. This iteration of the software strove for
more flexibility, better performance, and a better end-user experience than
any previous version of PyMC. PyMC 2.2 was released in April 2012. It
contained numerous bugfixes and optimizations, as well as a few new
features, including improved output plotting, CSV table output, improved
imputation syntax, and posterior predictive check plots. PyMC 2.3 was
released on October 31, 2013. It included Python 3 compatibility, improved
summary plots, and some important bug fixes.

In 2011, John Salvatier began thinking about implementing gradient-based
MCMC samplers, and developed the ``mcex`` package to experiment with his
ideas. The following year, John was invited by the team to re-engineer PyMC
to accommodate Hamiltonian Monte Carlo sampling. This led to the adoption of
Theano as the computational back end, and marked the beginning of PyMC3's
development. The first alpha version of PyMC3 was released in June 2015.
Over the following two years, the core development team grew to 12 members,
and the first stable release, PyMC3 3.0, was launched in January 2017.


Usage Overview
==============

For a detailed overview of building models in PyMC3, please read the appropriate sections in the rest of the documentation. For a flavor of what PyMC3 models look like, here is a quick example.

First, import the PyMC3 functions and classes you will need for building your model. You can import the entire module via ``import pymc3 as pm``, or just bring in what you need::

    from pymc3 import Model, Normal, invlogit, Binomial, sample, forestplot
    import numpy as np

Models are defined using a context manager (``with`` statement). The model is specified declaratively inside the context manager, instantiating model variables and transforming them as necessary. Here is an example of a model for a bioassay experiment::

    # Data
    n = np.ones(4) * 5
    y = np.array([0, 1, 3, 5])
    dose = np.array([-.86, -.3, -.05, .73])

    with Model() as bioassay_model:

        # Prior distributions for latent variables
        alpha = Normal('alpha', 0, sd=100)
        beta = Normal('beta', 0, sd=100)

        # Linear combination of parameters
        theta = invlogit(alpha + beta * dose)

        # Model likelihood
        deaths = Binomial('deaths', n=n, p=theta, observed=y)

Save this file, then from a Python shell (or another file in the same directory), call::

    with bioassay_model:

        # Draw samples
        trace = sample(1000, njobs=2)
        # Plot two parameters
        forestplot(trace, varnames=['alpha', 'beta'])

This example will generate 1000 posterior samples on each of two cores using the NUTS algorithm, preceded by 500 tuning samples (the default number). The sampler is also initialized using variational inference::

    Auto-assigning NUTS sampler...
    Initializing NUTS using ADVI...
    Average Loss = 12.562:   6%|▌         | 11412/200000 [00:00<00:14, 12815.82it/s]
    Convergence archived at 11900
    Interrupted at 11,900 [5%]: Average Loss = 15.168
    100%|██████████████████████████████████████| 1500/1500 [00:01<00:00, 787.56it/s]

The samples are returned as arrays inside a ``MultiTrace`` object, which is then passed to the plotting function. The resulting graphic shows a forest plot of the random variables in the model, along with a convergence diagnostic (R-hat) that indicates our model has converged.

.. image:: ./images/forestplot.png
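
The trace can also be inspected or summarized directly; a minimal sketch,
assuming the ``trace`` object from above (``summary`` is PyMC3's text-summary
utility)::

    from pymc3 import summary

    # Posterior draws for a variable are plain NumPy arrays
    alpha_samples = trace['alpha']

    # Tabulate means, standard deviations, intervals, and R-hat values
    summary(trace, varnames=['alpha', 'beta'])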
