# Inspect results of fitted problem

```@meta
CurrentModule = HybridVariationalInference
```

First load necessary packages.

```julia
using HybridVariationalInference
using StableRNGs
using Statistics # mean, std for summarizing samples
using ComponentArrays: ComponentArrays as CA
using SimpleChains # for reloading the optimized problem
using DistributionFits
using JLD2
using CairoMakie
using PairPlots # scatterplot matrices
```

After redefining the process-based model (currently, JLD2 cannot save functions),
the previously optimized problem can be loaded.

```julia
function f_doubleMM_sites(θc::CA.ComponentMatrix, xPc::CA.ComponentMatrix)
    # extract several covariates from xP
    ST = typeof(CA.getdata(xPc)[1:1,:]) # workaround for non-type-stable Symbol-indexing
    S1 = (CA.getdata(xPc[:S1,:])::ST)
    S2 = (CA.getdata(xPc[:S2,:])::ST)
    #
    # extract the parameters as row-repeated vectors
    n_obs = size(S1, 1)
    VT = typeof(CA.getdata(θc)[:,1]) # workaround for non-type-stable Symbol-indexing
    (r0, r1, K1, K2) = map((:r0, :r1, :K1, :K2)) do par
        p1 = CA.getdata(θc[:, par])::VT
        repeat(p1', n_obs) # matrix: same for each concentration row in S1
    end
    #
    # each variable is a matrix (n_obs x n_site)
    r0 .+ r1 .* S1 ./ (K1 .+ S1) .* S2 ./ (K2 .+ S2)
end
```

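As a quick sanity check of the double Michaelis-Menten response implemented above, the formula can be evaluated for scalar inputs (the parameter values below are made up for illustration):

```julia
# made-up scalar values; each substrate equals its half-saturation constant
r0, r1, K1, K2 = 0.1, 1.0, 0.2, 0.5
S1, S2 = 0.2, 0.5
y = r0 + r1 * S1 / (K1 + S1) * S2 / (K2 + S2)
# both saturation terms are 1/2, so y = 0.1 + 1.0 * 0.5 * 0.5 = 0.35
```

The matrix version in `f_doubleMM_sites` broadcasts exactly this expression over observations and sites.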
```julia
fname = "intermediate/basic_cpu_results.jld2"
print(abspath(fname))
probo, interpreters = load(fname, "probo", "interpreters");
```

## Sample the posterior

A sample of both the posterior and the predictive posterior can be obtained
using the function [`sample_posterior`](@ref).

```julia
using StableRNGs
rng = StableRNG(112)
n_sample_pred = 400
(; θsP, θsMs) = sample_posterior(rng, probo; n_sample_pred)
```

Let's look at the results.

```julia
size(θsP), size(θsMs)
```

    ((1, 400), (800, 2, 400))

The last dimension is the number of samples, the second-to-last dimension is
the respective parameter. `θsMs` has an additional first dimension denoting
the site for which the parameters were sampled.

They are `ComponentArray`s whose parameter axis names can be used for indexing:

```julia
θsMs[1, :r1, :] # sample of r1 at the first site
```

### Corner plots

The relationships between different variables can be inspected well with
scatterplot matrices, also called corner plots or pair plots.
`PairPlots.jl` provides a Makie implementation of them.

Here, we plot the global parameters and the site parameters of the first site.

```julia
i_site = 1
θ1 = vcat(θsP, θsMs[i_site, :, :])
θ1_nt = NamedTuple(k => CA.getdata(θ1[k, :]) for k in keys(θ1[:, 1]))
plt = pairplot(θ1_nt)
```

The plot shows that the parameters of the first site, *K*₁ and *r*₁, are correlated,
but that we did not model a correlation with the global parameter, *K*₂.

Note that this plot shows only the first out of 800 sites.
HVI estimated a 1602-dimensional posterior distribution including
covariances among parameters.

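The correlation visible in the corner plot can also be quantified with `Statistics.cor`. The following self-contained sketch uses synthetic draws with a shared latent factor as a stand-in for the posterior sample (the construction and all numbers are made up for illustration):

```julia
using Random, Statistics

rng_toy = MersenneTwister(42)
n = 400
z = randn(rng_toy, n)                             # shared latent factor inducing correlation
K1s = 0.2 .+ 0.03 .* z .+ 0.005 .* randn(rng_toy, n)
r1s = 0.5 .+ 0.10 .* z .+ 0.020 .* randn(rng_toy, n)
cor(K1s, r1s)                                     # close to 1: strongly correlated
```

With the actual sample, the analogous call would be `cor(CA.getdata(θsMs[i_site, :K1, :]), CA.getdata(θsMs[i_site, :r1, :]))`.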
### Expected values and marginal variances

Let's look at how the estimated uncertainty of a site parameter changes with
its expected value.

```julia
par = :K1
θmean = [mean(θsMs[s, par, :]) for s in axes(θsMs, 1)]
θsd = [std(θsMs[s, par, :]) for s in axes(θsMs, 1)]
fig = Figure(); ax = Axis(fig[1, 1], xlabel = "mean($par)", ylabel = "sd($par)")
scatter!(ax, θmean, θsd)
fig
```

We see that *K*₁ across sites ranges from about 0.18 to 0.25, and that
its estimated uncertainty is about 0.034, slightly decreasing with the
value of the parameter.

## Predictive Posterior

In addition to the uncertainty in parameters, we are also interested in
the uncertainty of predictions, i.e. the predictive posterior.

We can either run the PBM for all the parameter samples that we already obtained,
using [`apply_process_model`](@ref), or use [`predict_hvi`](@ref), which combines
sampling the posterior and the predictive posterior and returns the additional
`NamedTuple` entry `y`.

```julia
(; y, θsP, θsMs) = predict_hvi(rng, probo; n_sample_pred)
```

```julia
size(y)
```

    (8, 800, 400)

Again, the last dimension is the sample.
The other dimensions correspond to the observations we provided for the fitting:
the first dimension is the observation within one site, the second dimension is the site.

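To make this indexing convention concrete, here is a self-contained toy sketch with a random array of the same shape as `y` (the values themselves carry no meaning):

```julia
using Statistics

ytoy = rand(8, 800, 400)                          # (observation within site, site, sample)
m41 = mean(ytoy[4, 1, :])                         # mean of the 4th observation at site 1 over samples
sds = [std(ytoy[4, s, :]) for s in axes(ytoy, 2)] # one standard deviation per site
length(sds)                                       # 800
```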
Let's look at how the uncertainty of the 4th observation scales with its
predicted magnitude across sites.

```julia
i_obs = 4
ymean = [mean(y[i_obs, s, :]) for s in axes(y, 2)]
ysd = [std(y[i_obs, s, :]) for s in axes(y, 2)]
fig = Figure(); ax = Axis(fig[1, 1], xlabel = "mean(y$i_obs)", ylabel = "sd(y$i_obs)")
scatter!(ax, ymean, ysd)
fig
```

We see that the predicted values for the associated substrate concentrations range
from about 0.51 to 0.59, with an estimated standard deviation of around 0.005 that
decreases with the predicted value.