👻 Treating missing data #78
Replies: 2 comments
-
In the preliminary calibrations of the CWD model, about 40% of the population data is missing, and 80% of the fawn-doe ratio and yearling share data is missing. The only complete series is for test surveillance. My approach so far has been to parameterize the missing values, with a fairly loose standard deviation. I think this is essentially what Stan is doing by separating y_obs and y_mis. This seems easier than populating all the data series with imputed values and associated errors (at least in Vensim), and I don't see why the outcome would be any different.

I wouldn't generally worry very much about this. Many dynamic models are going to have lots of unmeasured states, and it would be kind of silly to have to impute something for everything. I would tackle this case by case, as the need arises. Having an implicit improper prior on sparse missing points that you're simply ignoring really doesn't seem that bad. This would be an interesting case for some SBC experiments.
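To make the comparison concrete, here is a minimal sketch of the two options, assuming a Gaussian measurement model. The function names, the data values, and the use of NaN to mark gaps are all hypothetical and are not taken from the CWD model. Under this assumption, each free y_mis value only adds its own density term, which integrates to 1, so the marginal posterior for the model parameters should match the one you get by skipping the missing points.

```python
import numpy as np

def gaussian_logpdf(y, mu, sigma):
    """Elementwise log-density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

def loglik_skip_missing(y, mu, sigma):
    """Option A: ignore missing points (NaN) and sum over observed ones only."""
    obs = ~np.isnan(y)
    return float(np.sum(gaussian_logpdf(y[obs], mu[obs], sigma)))

def loglik_with_y_mis(y, y_mis, mu, sigma):
    """Option B: fill the gaps with free parameters y_mis and sum over all points."""
    y_full = y.copy()
    y_full[np.isnan(y)] = y_mis
    return float(np.sum(gaussian_logpdf(y_full, mu, sigma)))

# Hypothetical series with gaps (NaN = no survey that year).
y = np.array([10.2, np.nan, 11.1, np.nan, np.nan, 12.3])
mu = np.array([10.0, 10.5, 11.0, 11.5, 12.0, 12.5])  # model trajectory (made up)
sigma = 0.5

missing = np.isnan(y)
ll_skip = loglik_skip_missing(y, mu, sigma)

# With each y_mis placed at the model value, the extra terms do not depend on mu.
ll_full = loglik_with_y_mis(y, mu[missing], mu, sigma)
extra = ll_full - ll_skip
```

Here `extra` is just the normalizing constant of the three added density terms, which is one way to see why the parameterized points carry no information about the trajectory.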
-
The extreme case seems to be the situation in the pred/prey model, where the time step is small (0.0625 time units) but any real-world dataset would be sampled at much coarser intervals (every 1 time unit, for example). Imputing or parameterizing all the missing data points between observation times seems like a complete waste of time.
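A minimal sketch of the alternative, assuming the model is only compared to data at the observation times. The exponential-growth trajectory and all variable names are placeholders, not the actual pred/prey model. With a step of 0.0625 and observations every 1 time unit, only every 16th simulated point needs a data counterpart.

```python
import numpy as np

dt = 0.0625          # simulation time step (from the discussion)
obs_interval = 1.0   # spacing of real-world observations (example value)
final_time = 10.0    # hypothetical simulation horizon

steps_per_obs = int(round(obs_interval / dt))
n_steps = int(round(final_time / dt))
t = np.arange(n_steps + 1) * dt

# Placeholder trajectory: Euler integration of dx/dt = r * x.
r = 0.1
x = np.empty(n_steps + 1)
x[0] = 1.0
for k in range(n_steps):
    x[k + 1] = x[k] + dt * r * x[k]

# Keep only the simulated points that line up with observation times.
obs_idx = np.arange(0, n_steps + 1, steps_per_obs)
t_obs = t[obs_idx]
x_at_obs = x[obs_idx]
```

The likelihood would then be evaluated on `x_at_obs` only, so the steps in between never need imputed or parameterized data.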
-
Celia brought up an important question about missing data during the 879 seminar: which is better, imputing first and then running MCMC, or running MCMC directly?
This document describes how missing data is treated in statistics as a latent variable. In my experiment, with 70% of the data missing, out-of-sample prediction accuracy was better when we imputed first and then ran MCMC.