Description
I'm working on a Stan program that includes a generated quantities block that generates log_lik for use with loo and yrep for posterior predictive checks. This looks something like (with some parts abbreviated as ...):
generated quantities {
vector[n] log_lik;
array[n] int yrep;
...
for (i in 1:n) {
...
log_lik[i] = neg_binomial_2_lpmf(y[i] | mu[i], phi[i]);
}
yrep = neg_binomial_2_rng(mu, phi);
}
I'm using cmdstanr to fit the model and run loo with moment-matching:
mod <- cmdstanr::cmdstan_model(...)
fit <- mod$sample(...)
fit_loo <- fit$loo(moment_match = TRUE, cores = 4)
While loo_moment_match() is running, I sometimes get a few error/exception messages that appear to stem from overflow in the *_rng function in the generated quantities block. They all look like:
Error : Exception: neg_binomial_2_rng: Random number that came from gamma distribution is 1.47285e+09, but must be less than 1073741824.000000 (in '/var/folders/2k/c0vy7xwj4kb9x7hbgtpq5m640000gn/T//RtmpE8xV8L/model-6c3130f99d6e.stan', line 83, column 4 to column 39)
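The limit in this message, 1073741824, is 2^30. As a rough sketch of why the negative binomial is prone to this, the draw can be viewed as a gamma-Poisson mixture. Assuming the intermediate gamma variate has shape phi and rate phi / mu (an assumption about the parameterization, not something taken from the Stan source), the chance of exceeding the limit can be computed in R. The helper name p_overflow and the values of mu and phi are made up for illustration.

```r
# Limit quoted in the exception message (2^30).
limit <- 2^30

# Hypothetical helper: probability that a gamma(shape = phi, rate = phi / mu)
# variate exceeds the limit. The shape/rate choice is an assumption.
p_overflow <- function(mu, phi) {
  pgamma(limit, shape = phi, rate = phi / mu, lower.tail = FALSE)
}

p_moderate <- p_overflow(mu = 100, phi = 1)  # illustrative "ordinary" mean
p_extreme  <- p_overflow(mu = 1e9, phi = 1)  # illustrative very large mean
print(c(moderate = p_moderate, extreme = p_extreme))
```

Under these assumptions, an overflow is essentially impossible for moderate mu but becomes likely once mu approaches the limit, which would be consistent with moment matching occasionally moving draws into an extreme region.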
Further, these messages are sometimes (but not usually) followed by an error that causes loo_moment_match() to fail:
Error in mm_list[[ii]]$i : $ operator is invalid for atomic vectors
In addition: Warning message:
In parallel::mclapply(X = I, mc.cores = cores, FUN = function(i) loo_moment_match_i_fun(i)) :
scheduled cores 4, 1, 3 encountered errors in user code, all values of the jobs will be affected
To the best of my understanding, this appears to happen because loo_moment_match_i_fun() is failing for one or more cases. Perhaps mm_list[[ii]] is NA?
Lines 130 to 131 in 6e7001e
mm_list <- parallel::mclapply(X = I, mc.cores = cores,
                              FUN = function(i) loo_moment_match_i_fun(i))
Lines 142 to 143 in 6e7001e
for (ii in seq_along(I)) {
  i <- mm_list[[ii]]$i
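Here is a minimal sketch of how a failed job could produce that message. When run on multiple cores, parallel::mclapply() returns a "try-error" object for a job that throws, and that object is an atomic character vector. The sketch mimics this with lapply() and try() so that it runs on any platform. fake_job() is a hypothetical stand-in for loo_moment_match_i_fun(), not the loo code.

```r
# Hypothetical stand-in for a per-observation job; job 2 throws an error.
fake_job <- function(i) {
  if (i == 2) stop("neg_binomial_2_rng overflow (simulated)")
  list(i = i)
}

# Mimic mclapply(): a failed job comes back as a "try-error" object
# instead of the expected list.
mm_list <- lapply(1:3, function(i) try(fake_job(i), silent = TRUE))

# Identify which jobs failed before touching their elements.
failed <- vapply(mm_list, inherits, logical(1), what = "try-error")
print(which(failed))

# Using `$` on the failed element reproduces the message from the report.
msg <- tryCatch(mm_list[[2]]$i, error = function(e) conditionMessage(e))
print(msg)
```

If this is what happens, the failed element would be a "try-error" character vector rather than NA, and a check like the `failed` vector above would identify the affected cases before the loop indexes into them.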
I get a small number (~1-3) of the error/exception messages pretty consistently, but the error that causes loo_moment_match() to fail is less common. One place where I've been able to produce this error consistently is within a targets pipeline, which suggests to me that it can be influenced by the RNG state. When I did get this error, it was preceded by ~10 of those error/exception messages. I can confirm that this error can also be produced without targets or callr, just less consistently. I'm using cores = 4 here, but the error can still occur with cores = 1. Commenting out the code for yrep and *_rng in the Stan file eliminates the issue entirely, but it is (ever so slightly) inconvenient to have to make this change depending on whether I want to use loo_moment_match() with the fitted model. I haven't encountered this problem when the *_rng function is less likely to overflow than the negative binomial.
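One possible way to avoid toggling the Stan code would be to leave yrep out of generated quantities and draw it in R from the posterior draws instead. The sketch below rests on assumptions: it presumes mu and phi are available as matrices with draws in rows and observations in columns, and it uses simulated stand-ins for them so that it runs on its own. R's rnbinom(mu = , size = ) uses the same mean/dispersion parameterization as neg_binomial_2, with variance mu + mu^2 / phi.

```r
set.seed(1)
n_draws <- 200
n <- 5

# Simulated stand-ins for posterior draws of mu and phi (assumption:
# in a real analysis these would come from the fitted model).
mu_draws  <- matrix(rexp(n_draws * n, rate = 1 / 50), n_draws, n)
phi_draws <- matrix(rexp(n_draws * n, rate = 1 / 2), n_draws, n)

# rnbinom() is vectorised over mu and size. Both matrices are flattened
# column by column and the result is refilled column by column, so
# yrep[s, i] uses mu_draws[s, i] and phi_draws[s, i].
yrep <- matrix(
  rnbinom(n_draws * n, mu = as.vector(mu_draws), size = as.vector(phi_draws)),
  nrow = n_draws, ncol = n
)
print(dim(yrep))
```

With cmdstanr, the real draws could presumably be extracted with something like fit$draws("mu", format = "draws_matrix"), provided mu and phi are saved in the output. That is not the case for quantities declared only as local variables.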
I wanted to report this issue here since it seems to have something to do with loo_moment_match(). It feels like it could be something related to or not entirely covered by #262. If this is expected behavior, I would appreciate any tips on how to better deal with having both log_lik and yrep in the generated quantities block when it comes to using loo_moment_match(). I'm sorry if any of this is off base, as I do not have a good understanding of the inner workings of the moment-matching code.
Some system info:
> packageVersion("loo")
[1] ‘2.8.0.9000’
> packageVersion("cmdstanr")
[1] ‘0.8.1’
> cmdstanr::cmdstan_version()
[1] "2.35.0"
> R.version
_
platform aarch64-apple-darwin20
arch aarch64
os darwin20
system aarch64, darwin20
status
major 4
minor 4.1
year 2024
month 06
day 14
svn rev 86737
language R
version.string R version 4.4.1 (2024-06-14)
nickname Race for Your Life