Description
The goal is to add the Bayesian bootstrap as a first-class alternative to leave-one-out cross validation in this package. I believe the Bayesian bootstrap should provide the major advantage of letting users plot a full posterior distribution, rather than just having a point estimate and standard error. Aside from the actual informational difference, I've noticed that plotting posteriors is a good way to help people intuitively understand that the point estimates aren't special. Show someone a point estimate and a standard error and they will usually either ignore the standard error or construct a 95% normal confidence interval.
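To make the baseline concrete, here's a minimal sketch of the plain Bayesian bootstrap (Rubin's Dirichlet-weight scheme) -- the function name and interface are illustrative, not this package's API:

```python
import numpy as np

rng = np.random.default_rng(42)

def bayesian_bootstrap(x, statistic, n_draws=4000, rng=rng):
    """Draw from the Bayesian bootstrap posterior of a statistic.

    Each draw assigns Dirichlet(1, ..., 1) weights to the observations
    and evaluates the statistic under those weights.
    """
    n = len(x)
    weights = rng.dirichlet(np.ones(n), size=n_draws)  # shape (n_draws, n)
    return np.array([statistic(x, w) for w in weights])

# Example: full posterior for the (weighted) mean, which can be plotted
# as a histogram instead of reported as point estimate +/- SE.
x = rng.normal(size=100)
posterior = bayesian_bootstrap(x, lambda x, w: np.sum(w * x))
```

The returned array of draws is exactly what users would plot, rather than collapsing it to a point estimate and standard error.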
Mentioning @topipa because Aki suggested I talk to you about this, and said you've built something similar before. I know that the bootstrap tends to underestimate the bias caused by overfitting, because bootstrap resamples will be more similar to the data than a new sample from the original distribution would be -- did your own implementation use any corrections for this bias?
My own thoughts on how to correct this:
- Iterated bootstrap techniques, and
- Adding random noise to resamples -- instead of assigning a random Dirichlet-distributed probability to every observation `x`, we can draw random observations `x + N`, where `N ~ Normal(0, Σ / n)`, and then assign a random Dirichlet probability to each of these resamples. I've seen Tim Hesterberg suggest this in his textbook, but Aki seemed to suggest it would be a bad idea. Intuitively I'd expect this to reduce the bias caused by resampling, since we'd at least be getting the variance of the underlying distribution correct, but I could be wrong.
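For the second idea, here's a hypothetical sketch of the jittered variant in the 1-D case, where `Σ / n` reduces to the sample variance over `n` (again, names and interface are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_bayesian_bootstrap(x, statistic, n_draws=4000, rng=rng):
    """Jittered Bayesian bootstrap sketch (1-D case).

    Each draw perturbs every observation with N(0, s^2 / n) noise,
    where s^2 is the sample variance (the 1-D analogue of Sigma),
    then assigns Dirichlet(1, ..., 1) weights as in the plain version.
    """
    n = len(x)
    noise_sd = np.sqrt(np.var(x, ddof=1) / n)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        jittered = x + rng.normal(0.0, noise_sd, size=n)
        w = rng.dirichlet(np.ones(n))
        draws[i] = statistic(jittered, w)
    return draws

x = rng.normal(size=100)
posterior = smoothed_bayesian_bootstrap(x, lambda x, w: np.sum(w * x))
```

The extra noise should widen the posterior slightly relative to the plain version, which is the intended effect: resamples stop being exact copies of the observed points, so the resampling distribution recovers some of the underlying variance.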
I have an initial implementation of a basic BB here, although it's not quite working yet -- the estimates seem to be slightly off, but I'm not sure why.