---
title: Introduction to Gaussian Processes using JuliaGPs and Turing.jl
permalink: /tutorials/:name/
redirect_from: tutorials/15-gaussian-processes/
weave_options:
  error: false
---

[JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) packages integrate well with Turing.jl because they implement the Distributions.jl interface.
You should be able to follow this tutorial if you know what a Gaussian process (GP) is.
For a more in-depth look at the JuliaGPs functionality used here, please consult the
[JuliaGPs docs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps).

In this tutorial, we will model the putting dataset discussed in chapter 21 of
[Bayesian Data Analysis](http://www.stat.columbia.edu/%7Egelman/book/).
The dataset records how often golfers successfully get the ball in the hole as a function of
their distance from it.
The goal of inference is to estimate the probability of any given shot being successful at a
given distance.

Let's download the data and take a look at it:

```julia
using CSV, DataDeps, DataFrames

ENV["DATADEPS_ALWAYS_ACCEPT"] = true
register(
    DataDep(
        "putting",
        "Putting data from BDA",
        "http://www.stat.columbia.edu/~gelman/book/data/golf.dat",
        "fc28d83896af7094d765789714524d5a389532279b64902866574079c1a977cc",
    ),
)

fname = joinpath(datadep"putting", "golf.dat")
df = CSV.read(fname, DataFrame; delim=' ', ignorerepeated=true)
df[1:5, :]
```
We've printed the first 5 rows of the dataset (which comprises only 19 rows in total).
Observe that it has three columns:

 1. `distance` -- how far from the hole the shots were taken. I'll refer to `distance` as `d` throughout the rest of this tutorial
 2. `n` -- how many shots were taken from a given distance
 3. `y` -- how many of those shots were successful

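Before doing any modelling, it can be helpful to look at the raw success proportions. The
snippet below is just an assumption-free summary of the table; the `prop` column name is
ours, not part of the dataset:

```julia
# Empirical success proportion at each distance (no model involved yet).
df.prop = df.y ./ df.n
first(df, 5)
```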
We will use a binomial model for the data, whose success probability is parametrised by a
transformation of a GP -- something along the lines of:
$$
f \sim \operatorname{GP}(0, k) \\
y_j \mid f(d_j) \sim \operatorname{Binomial}(n_j, g(f(d_j))) \\
g(x) := \frac{1}{1 + e^{-x}}
$$
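To make the observation model concrete, here is a small numerical illustration of the
likelihood for a single distance. The latent value, number of attempts, and observed count
below are made-up numbers, purely for illustration:

```julia
using Distributions, LogExpFunctions

f_d = 1.5                    # hypothetical latent function value at some distance
n_shots = 100                # hypothetical number of attempts from that distance
p_success = logistic(f_d)    # g(f_d) = 1 / (1 + exp(-f_d)) ≈ 0.82
logpdf(Binomial(n_shots, p_success), 80)    # log-likelihood of observing 80 successes
```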
To do this, let's define our Turing.jl model:

```julia
using AbstractGPs, LogExpFunctions, Turing

@model function putting_model(d, n; jitter=1e-4)
    v ~ Gamma(2, 1)    # prior over the kernel variance
    l ~ Gamma(4, 1)    # prior over the kernel length-scale
    f = GP(v * with_lengthscale(SEKernel(), l))
    f_latent ~ f(d, jitter)    # latent function values at the inputs d
    y ~ product_distribution(Binomial.(n, logistic.(f_latent)))
    return (fx=f(d, jitter), f_latent=f_latent, y=y)
end
```
We first define an `AbstractGPs.GP`, which represents a distribution over functions, and
is entirely separate from Turing.jl.
We place a prior over its variance `v` and length-scale `l`.
`f(d, jitter)` constructs the multivariate Gaussian comprising the random variables
in `f` whose indices are in `d` (plus a bit of independent Gaussian noise with variance
`jitter` -- see [the docs](https://juliagaussianprocesses.github.io/AbstractGPs.jl/dev/api/#FiniteGP-and-AbstractGP)
for more details).
`f(d, jitter) isa AbstractMvNormal`, and is the bit of AbstractGPs.jl that implements the
Distributions.jl interface, so it's legal to put it on the right-hand side of a `~`.
From this you should deduce that `f_latent` is distributed according to a multivariate
Gaussian.
The remaining lines comprise standard Turing.jl code of the kind encountered in other
tutorials and in the Turing documentation.
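To see this Distributions.jl interface in action outside of a Turing model, here is a
minimal sketch (with arbitrary example inputs) showing that the object returned by
`f(x, jitter)` can be sampled from and scored like any other multivariate normal:

```julia
using AbstractGPs, Distributions

f = GP(with_lengthscale(SEKernel(), 2.0))
x = [1.0, 2.0, 3.0]
fx = f(x, 1e-4)        # a FiniteGP: the GP restricted to the inputs x, plus jitter

y_draw = rand(fx)      # a sample from the implied multivariate Gaussian
logpdf(fx, y_draw)     # and its log density under that Gaussian
```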
Before performing inference, we might want to inspect the prior that our model places over
the data, to see whether there is anything obviously wrong.
These kinds of prior predictive checks are straightforward to perform using Turing.jl, since
sampling from the prior is as simple as calling the model:

```julia
m = putting_model(Float64.(df.distance), df.n)
m().y
```
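As a small aside, each call to `m()` draws fresh hyperparameters and a fresh latent
function, so the implied success probabilities change from call to call; for example:

```julia
# Success probabilities implied by a single prior draw of the latent function.
logistic.(m().f_latent)
```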
We make use of this to see what kinds of datasets we simulate from the prior:

```julia
using Plots

function plot_data(d, n, y, xticks, yticks)
    ylims = (0, round(maximum(n), RoundUp; sigdigits=2))
    margin = -0.5 * Plots.mm
    plt = plot(; xticks=xticks, yticks=yticks, ylims=ylims, margin=margin, grid=false)
    bar!(plt, d, n; color=:red, label="", alpha=0.5)
    bar!(plt, d, y; label="", color=:blue, alpha=0.7)
    return plt
end

# Construct model and run some prior predictive checks.
m = putting_model(Float64.(df.distance), df.n)
hists = map(1:20) do j
    xticks = j > 15 ? :auto : nothing
    yticks = rem(j, 5) == 1 ? :auto : nothing
    return plot_data(df.distance, df.n, m().y, xticks, yticks)
end
plot(hists...; layout=(4, 5))
```
In this case, the only prior knowledge I have is that the proportion of successful shots
ought to decrease monotonically as the distance from the hole increases.
This should show up in the simulated data as the blue bars generally getting shorter as we
move from left to right on each graph.
Unfortunately, there is no simple way to enforce monotonicity in the samples from a GP, and
we can see this in some of the plots above, so we must hope that we have enough data to
ensure that this relationship approximately holds under the posterior.
In any case, you can judge for yourself whether you think this is the most useful
visualisation that we can perform -- if you think there is something better to look at,
please let us know!

Moving on, we generate samples from the posterior using the default `NUTS` sampler.
We'll make use of [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl), as it has
better performance than [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl/) on
this example. See Turing.jl's docs on Automatic Differentiation for more info.

```julia
using Random, ReverseDiff

Turing.setadbackend(:reversediff)
Turing.setrdcache(true)

m_post = m | (y=df.y,)
chn = sample(Xoshiro(123456), m_post, NUTS(), 1_000)
```
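Before looking at the latent function itself, a quick (optional) sanity check on the kernel
hyperparameters is cheap to do; for example:

```julia
using Statistics

# Posterior means of the kernel variance and length-scale.
mean(chn[:v]), mean(chn[:l])
```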
We can use these samples and the `posterior` function from `AbstractGPs` to sample from the
posterior distribution over the probability of success at any distance we choose:

```julia
d_pred = 1:0.2:21
samples = map(generated_quantities(m_post, chn)[1:10:end]) do x
    return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))
end
p = plot()
plot!(d_pred, reduce(hcat, samples); label="", color=:blue, alpha=0.2)
scatter!(df.distance, df.y ./ df.n; label="", color=:red)
```
We can see that the general trend is indeed downwards as the distance from the hole
increases, and that the posterior uncertainty quickly inflates as we move away from the data.
This suggests that the model is probably going to do a reasonable job of interpolating
between the observed data, but a less good job of extrapolating to larger distances.
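If a single summary curve is preferable to a cloud of posterior draws, one simple option
(purely as an illustration, reusing the `samples` computed above) is to overlay their
pointwise average:

```julia
using Statistics

# Pointwise mean of the posterior success-probability samples drawn above.
mean_curve = vec(mean(reduce(hcat, samples); dims=2))
plot!(p, d_pred, mean_curve; color=:blue, linewidth=3, label="posterior mean")
```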