Commit 8b6ae2f: Extra details about rejection vs transformation
(parent: 1016b30)

1 file changed: developers/transforms/bijectors/index.qmd (33 additions, 5 deletions)
@@ -169,7 +169,7 @@ end
 # `x0` : the initial value
 # Returns a vector of samples, plus the number of times we went out of bounds.
 function mh(logp, n_samples, in_bounds; x0=1.0)
-    samples = []
+    samples = [x0]
     x = x0
     n_out_of_bounds = 0
     for _ in 2:n_samples
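
The hunk above shows only the top of `mh`, so here is a minimal sketch of how the fixed function might fit together. Only the lines visible in the hunk are confirmed by this commit; the random-walk proposal and the acceptance rule below are assumptions:

```julia
# Sketch only: the proposal and acceptance steps fall outside the diff
# context and are assumptions, not part of this commit.
using Distributions: Normal

function mh(logp, n_samples, in_bounds; x0=1.0)
    samples = [x0]       # the fix: record the initial value, since the loop
                         # below starts at 2 and pushes one sample per
                         # iteration, giving n_samples entries in total
    x = x0
    n_out_of_bounds = 0
    for _ in 2:n_samples
        x_proposed = x + rand(Normal())        # assumed random-walk proposal
        if !in_bounds(x_proposed)
            n_out_of_bounds += 1               # count out-of-bounds proposals
        elseif log(rand()) < logp(x_proposed) - logp(x)
            x = x_proposed                     # assumed Metropolis acceptance
        end
        push!(samples, x)
    end
    return samples, n_out_of_bounds
end
```

With the old `samples = []`, the returned vector would have only `n_samples - 1` entries (and be a `Vector{Any}`), which is what the one-line fix addresses.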
@@ -231,7 +231,7 @@ function logq(y)
     x = f_inv(y)
     return B.logpdf_with_trans(d, x, true)
 end
-samples_transformed, _ = mh(logq, 10000, x -> true);
+samples_transformed, n_oob_transformed = mh(logq, 10000, x -> true);
 ```
 
 Now, this process gives us samples that have been transformed, so we need to un-transform them to get the samples from the original distribution:
@@ -248,6 +248,12 @@ println("expected mean: $(exp(0 + (1^2/2)))")
 println(" actual mean: $(mean(samples_untransformed))")
 ```
 
+On top of that, we can also verify that we don't ever go out of bounds:
+
+```{julia}
+println("went out of bounds $n_oob_transformed/10000 times")
+```
+
 ### Which one is better?
 
 In the subsections above, we've seen two different methods of sampling from a constrained distribution:
@@ -257,9 +263,16 @@ In the subsections above, we've seen two different methods of sampling from a constrained distribution:
 
 (Note that both of these methods are applicable to other samplers as well, such as Hamiltonian Monte Carlo.)
 
-Of course, the question then becomes which one of these is better.
-We might look at the sample means above to see which one is 'closer' to the expected mean, but that's not a very robust method because the sample mean is itself random.
-What we could do is to perform both methods many times and see how reliable the sample mean is.
+Of course, a natural question to ask is which one of these is better!
+
+One option might be to look at the sample means above to see which one is 'closer' to the expected mean.
+However, that's not a very robust method because the sample mean is itself random, and if we were to use a different random seed we might well reach a different conclusion.
+
+Another possibility is to look at the number of times a sample was rejected.
+Does a lower rejection rate (as in the transformed case) imply that the method is better?
+This might seem like an intuitive conclusion, but it's not necessarily the case: for example, the sampling in unconstrained space could be much less efficient, such that even though we're not _rejecting_ samples, the ones that we do get are overly correlated and thus not representative of the distribution.
+
+A robust comparison would involve performing both methods many times and seeing how _reliable_ the sample mean is.
 
 ```{julia}
 function get_sample_mean(; transform)
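
The hunk above ends just as `get_sample_mean` begins, and the next hunk's context line references `means_with_transformation`. A hypothetical sketch of the study in between, with the function body and the repetition count both assumed rather than shown in this diff, might look like:

```julia
# Hypothetical sketch -- the body of get_sample_mean and the number of
# repetitions are not visible in this diff, so both are assumptions.
using Statistics: mean, var

function get_sample_mean(; transform)
    if transform
        ys, _ = mh(logq, 10000, x -> true)    # sample in unconstrained space
        return mean(f_inv.(ys))               # un-transform before averaging
    else
        xs, _ = mh(logp, 10000, x -> x > 0)   # reject out-of-bounds proposals
        return mean(xs)
    end
end

# Repeat each method many times; a lower variance across runs means a more
# reliable estimator of the mean.
means_with_rejection = [get_sample_mean(; transform=false) for _ in 1:100]
means_with_transformation = [get_sample_mean(; transform=true) for _ in 1:100]
(mean(means_with_rejection), var(means_with_rejection))
(mean(means_with_transformation), var(means_with_transformation))
```

The last line mirrors the `mean(means_with_transformation), var(means_with_transformation)` context line at the top of the next hunk.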
@@ -286,6 +299,21 @@ mean(means_with_transformation), var(means_with_transformation)
 
 We can see from this small study that although both methods give us the correct mean (on average), the method with the transformation is more reliable, in that the variance is much lower!
 
+::: {.callout-note}
+Alternatively, we could also try to directly measure how correlated the samples are.
+One way to do this is to calculate the _effective sample size_ (ESS), which is described in [the Stan documentation](https://mc-stan.org/docs/reference-manual/analysis.html#effective-sample-size.section) and implemented in [MCMCChains.jl](https://github.com/TuringLang/MCMCChains.jl/).
+A larger ESS implies that the samples are less correlated, and thus more representative of the underlying distribution:
+
+```{julia}
+using MCMCChains: Chains, ess
+
+rejection = first(mh(logp, 10000, x -> x > 0))
+transformation = f_inv(first(mh(logq, 10000, x -> true)))
+chn = Chains(hcat(rejection, transformation), [:rejection, :transformation])
+ess(chn)
+```
+:::
+
 ### What happens without the Jacobian?
 
 In the transformation method above, we used `Bijectors.logpdf_with_trans` to calculate the log probability density of the transformed distribution.
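
Since the section that follows discusses dropping the Jacobian, a small numerical check of the correction may help. It assumes `d = LogNormal(0, 1)` and `f_inv = exp`, inferred from the expected-mean line `exp(0 + (1^2/2))` earlier in the document rather than stated in this diff:

```julia
# Assumed setup: d = LogNormal(0, 1), unconstrained y = log(x), f_inv = exp.
import Bijectors as B
using Distributions: LogNormal, logpdf

d = LogNormal(0, 1)
y = 0.5          # an arbitrary point in unconstrained space
x = exp(y)       # f_inv(y)

# With the Jacobian term, the transformed log density picks up
# log|d(f_inv)/dy| = log(exp(y)) = y on top of the original log density.
with_jac = B.logpdf_with_trans(d, x, true)
without_jac = B.logpdf_with_trans(d, x, false)   # just logpdf(d, x)

with_jac ≈ logpdf(d, x) + y    # should hold
without_jac ≈ logpdf(d, x)     # the term the next section drops
```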
