12. Multilevel Models

As models move from one cluster (individual, group, or location) in the data to the next, estimating parameters for each cluster, they forget everything about the previous clusters.

We want models to learn simultaneously about each cluster while learning about the population of clusters.

Represent the population of clusters (the cafes, say) and learn about that population; in effect, the priors are learned from the data.

Multilevel models remember features of each cluster in the data as they learn about all of the clusters.

Benefits of multilevel models include:

  1. Improved estimates for repeat sampling. When more than one observation arises from the same individual, location, or time, traditional single-level models either maximally underfit or maximally overfit the data.
  2. Improved estimates for imbalance in sampling. When some individuals, locations, or times are sampled more than others, multilevel models automatically cope with the differing uncertainty across these clusters. This prevents over-sampled clusters from unfairly dominating inference.
  3. Estimates of variation. If our research questions include variation among individuals or other groups within the data, then multilevel models are a big help, because they model variation explicitly.
  4. Avoid averaging, retain variation. Frequently, scholars pre-average some data to construct variables. This can be dangerous, because averaging removes variation, and there are typically several different ways to perform the averaging. Averaging therefore manufactures false confidence and introduces arbitrary data transformations. Multilevel models allow us to preserve the uncertainty and avoid such transformations.
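As a toy illustration of point 4 (base R, invented numbers): two subjects can share the same average while differing wildly in variation, and pre-averaging erases that difference.

```r
# Two hypothetical subjects with identical means but very different variation.
s1 <- c(4, 5, 6)     # stable subject
s2 <- c(0, 5, 10)    # volatile subject, same mean
mean(s1) == mean(s2) # TRUE: the pre-averaged "scores" are indistinguishable
c(sd(s1), sd(s2))    # the variation that averaging throws away
```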

Costs of the multilevel approach:

  1. We must define the distributions from which the characteristics of the clusters arise. Conservative, maximum entropy distributions are the usual solution.
  2. There are new estimation challenges, which MCMC estimation addresses.
  3. Multilevel models can be hard to understand, because they make predictions at different levels of the data.

12.1 Example: Multilevel tadpoles

library(rethinking)
data(reedfrogs)
d <- reedfrogs
str(d)

Multiple observations (the tadpoles, in this case) are made within each cluster. A multilevel model, in which we simultaneously estimate both an intercept for each tank and the variation among tanks, is what we want. This will be a varying intercepts model; varying intercepts are the simplest kind of varying effects.

# make the tank the cluster variable
d$tank <- 1:nrow(d)

# fit a model with a separate intercept for each tank
m12.1 <- map2stan(
    alist(
        surv ~ dbinom(density, p),
        logit(p) <- a_tank[tank],
        a_tank[tank] ~ dnorm(0, 5)
    ), data=d
)

precis(m12.1, depth=2)

Hyperparameters are parameters for parameters; their priors are called hyperpriors.

m12.2 <- map2stan(
    alist(
        surv ~ dbinom(density, p),
        logit(p) <- a_tank[tank],
        a_tank[tank] ~ dnorm(a, sigma),
        a ~ dnorm(0,1),
        sigma ~ dcauchy(0,1)
    ), data=d, iter=4000, chains=4)

compare(m12.1, m12.2)

# extract Stan samples
post <- extract.samples(m12.2)

# compute median intercept for each tank
# also transform to probability with logistic
d$propsurv.est <- logistic(apply(post$a_tank, 2, median))

# display raw proportions surviving in each tank
plot(d$propsurv, ylim=c(0,1), pch=16, xaxt='n',
     xlab="tank", ylab="proportion survival", col=rangi2)
axis(1, at=c(1,16,32,48), labels=c(1,16,32,48))

# overlay posterior medians
points(d$propsurv.est)

# mark posterior median probability across tanks
abline(h=logistic(median(post$a)), lty=2)

# draw vertical dividers between tank densities
abline(v=16.5, lwd=0.5)
abline(v=32.5, lwd=0.5)
text(8, 0, "small tanks")
text(16+8, 0, "medium tanks")
text(32+8, 0, "large tanks")

Shrinkage results from regularization: each tank's raw proportion is pulled towards the multilevel model's estimated median survival proportion.

Partial pooling: each tank provides information that can be used to improve the estimates for all the other tanks.

‘Sampling’ from a posterior distribution is not a simulation of empirical sampling. It’s just a convenient way to characterize and work with the uncertainty in the distribution.

# show first 100 populations in the posterior

par(mfrow=c(1,2))

plot(NULL, xlim=c(-3,4), ylim=c(0,0.35),
     xlab="log-odds survive", ylab="Density")
for (i in 1:100)
    curve(dnorm(x, post$a[i], post$sigma[i]), add=TRUE,
          col=col.alpha("black", 0.2))

# sample 8000 imaginary tanks from the posterior distribution
sim_tanks <- rnorm(8000, post$a, post$sigma)

# transform to probability and visualize
dens(logistic(sim_tanks), xlab="probability survive")

12.2. Varying effects and the underfitting/overfitting trade-off

Varying intercepts are just regularized estimates, adaptively regularized by estimating how diverse the clusters are while estimating the features of each cluster. The reason that varying intercepts provide better estimates is that they do a better job of trading off underfitting and overfitting.

  1. Suppose you use the overall mean, α, to make your predictions. It is quite precise, because a lot of data goes into it, but it is unlikely to match the mean of any particular cluster: the total sample mean underfits the data. This is complete pooling, and it assumes that variation among ponds is zero.
  2. Alternatively, make a separate intercept for each cluster. Only the data within each cluster contribute to its estimate, so the estimates are imprecise, especially for smaller clusters. The error of these estimates is high, and they are overfit to the data. These are the no-pooling estimates.
  3. When you estimate varying intercepts, you use partial pooling of information to produce estimates for each cluster that are less underfit than the grand mean and less overfit than the no-pooling estimates. For larger clusters, the shrinkage will be less than for smaller clusters.
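The three strategies above can be sketched in base R with invented data. The partial-pooling weight `w` below is purely illustrative (not a fitted quantity; a real multilevel model learns the amount of pooling from σ), but it shows the characteristic behavior: estimates land between the grand mean and the per-cluster proportions, closer to the raw data for large clusters.

```r
# Invented binomial clusters with imbalanced sampling.
n <- c(5, 10, 100)  # cluster sizes
y <- c(1, 8, 50)    # survivors observed in each cluster

complete_pool <- rep(sum(y) / sum(n), 3)  # one grand estimate for everyone
no_pool <- y / n                          # a separate estimate per cluster

# illustrative weight: big clusters trust their own data more
w <- n / (n + 10)
partial_pool <- w * no_pool + (1 - w) * complete_pool
# partial_pool lies between the other two, closest to no_pool for cluster 3
```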

Learning to simulate data, and to validate models and model fitting against it, is extremely valuable.

12.2.1. The model

To simulate data, we need to assign values to:

  • α, the average log-odds of survival in the entire population of ponds
  • σ, the standard deviation of the distribution of log-odds of survival among ponds
  • αpond, a vector of individual pond intercepts, one for each pond

Assign sample sizes to each cluster.

12.2.2. Assign values to the parameters

a <- 1.4
sigma <- 1.5
nponds <- 60
ni <- as.integer(rep(c(5,10,25,35), each=15))

a_pond <- rnorm(nponds, mean=a, sd=sigma)

dsim <- data.frame(pond=1:nponds, ni=ni, true_a=a_pond)

12.2.3. Simulate Survivors

Each pond $i$ has $n_i$ potential survivors, and nature flips each tadpole's coin with probability of survival $p_i$. This probability is implied by the model definition:

$$p_i = \dfrac{\exp(\alpha_i)}{1 + \exp(\alpha_i)}$$
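A quick base-R check that the logistic (inverse-logit) matches this formula and that the logit recovers the log-odds (hand-rolled here as `logistic_fn` so it runs without the rethinking package):

```r
# logistic (inverse-logit), written out to mirror the formula above
logistic_fn <- function(a) exp(a) / (1 + exp(a))
a <- 1.4
p <- logistic_fn(a)
log(p / (1 - p))  # recovers a: logit and logistic are inverses
```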

dsim$si <- rbinom(nponds, prob=logistic(dsim$true_a), size=dsim$ni)

12.2.4. Compute the no-pooling estimates

dsim$p_nopool <- dsim$si / dsim$ni

This column contains the empirical proportions of survivors in each pond. These are the same no-pooling estimates you'd get by fitting a model with a dummy variable for each pond and flat priors that induce no regularization.

12.2.5. Compute the partial-pooling estimates

m12.3 <- map2stan(
    alist(
        si ~ dbinom(ni, p),
        logit(p) <- a_pond[pond],
        a_pond[pond] ~ dnorm(a, sigma),
        a ~ dnorm(0, 1),
        sigma ~ dcauchy(0, 1)
    ), data=dsim, iter=1e4, warmup=1000)

precis(m12.3, depth=2)
estimated.a_pond <- as.numeric(coef(m12.3)[1:60])
dsim$p_partpool <- logistic(estimated.a_pond)

# true per-pond survival probabilities
dsim$p_true <- logistic(dsim$true_a)

# compute absolute error between the estimates and the true varying effects
nopool_error <- abs(dsim$p_nopool - dsim$p_true)
partpool_error <- abs(dsim$p_partpool - dsim$p_true)

plot(1:60, nopool_error, xlab="pond", ylab="absolute error",
     col=rangi2, pch=16)
points(1:60, partpool_error)

12.3 More than one type of cluster

12.3.1. Multilevel chimpanzees

library(rethinking)
data(chimpanzees)
d <- chimpanzees
d$recipient <- NULL

m12.4 <- map2stan(
    alist(
        pulled_left ~ dbinom(1,p),
        logit(p) <- a + a_actor[actor] + (bp + bpC*condition)*prosoc_left,
        a_actor[actor] ~ dnorm(0, sigma_actor),
        a ~ dnorm(0,10),
        bp~ dnorm(0,10),
        bpC ~ dnorm(0,10),
        sigma_actor ~ dcauchy(0,1)
    ), data=d, warmup=1000, iter=5000, chains=4, cores=4)

post <- extract.samples(m12.4)
total_a_actor <- sapply(1:7, function(actor) post$a + post$a_actor[,actor])
round(apply(total_a_actor, 2, mean), 2)
plot(m12.4)

12.3.2. Two types of cluster

d$block_id <- d$block  # the name 'block' is reserved by Stan

m12.5 <- map2stan(
    alist(
        pulled_left ~ dbinom(1,p),
        logit(p) <- a + a_actor[actor] + a_block[block_id] +
            (bp + bpc*condition) * prosoc_left,
        a_actor[actor] ~ dnorm(0, sigma_actor),
        a_block[block_id] ~ dnorm(0, sigma_block),
        c(a, bp, bpc) ~ dnorm(0, 10),
        sigma_actor ~ dcauchy(0,1),
        sigma_block ~ dcauchy(0,1)
    ), data=d, warmup=1000, iter=6000, chains=4, cores=4)

precis(m12.5, depth=2)
plot(precis(m12.5, depth=2))
post <- extract.samples(m12.5)
dens(post$sigma_block, xlab="sigma", xlim=c(0,4))
dens(post$sigma_actor, col=rangi2, lwd=2, add=TRUE)
text(2, 0.85, "actor", col=rangi2)
text(0.75, 2, "block")

12.4 Multilevel posterior predictions

12.4.1. Posterior predictions for same clusters

chimp <- 2
d.pred <- list(
    prosoc_left = c(0,1,0,1), #right/left/right/left
    condition = c(0,0,1,1),
    actor = rep(chimp,4)
)

link.m12.4 <- link(m12.4, data=d.pred)
pred.p <- apply(link.m12.4, 2, mean)
pred.p.PI <- apply(link.m12.4, 2, PI)
# use samples from m12.4, whose interaction coefficient is named bpC
post <- extract.samples(m12.4)
p.link <- function(prosoc_left, condition, actor){
    logodds <- with(post, a + a_actor[,actor] + (bp + bpC * condition)
                    * prosoc_left)
    return(logistic(logodds))
}

prosoc_left <- c(0,1,0,1)
condition <- c(0,0,1,1)
pred.raw <- sapply(1:4, function(i) p.link(prosoc_left[i], condition[i],2))
pred.p <- apply(pred.raw, 2, mean)
pred.p.PI <- apply(pred.raw, 2, PI)

12.4.2. Posterior prediction for new clusters

d.pred <- list(
    prosoc_left = c(0,1,0,1),
    condition = c(0,0,1,1),
    actor = rep(2,4))

# replace varying intercept samples with zeros
# 1000 samples by 7 actors
a_actor_zeros <- matrix(0, 1000, 7)

# fire up link
# note the use of the replace list
link.m12.4 <- link(m12.4, n=1000, data=d.pred,
                   replace=list(a_actor=a_actor_zeros))

# summarize and plot

pred.p.mean <- apply(link.m12.4, 2, mean)
pred.p.PI <- apply(link.m12.4, 2, PI, prob=0.8)
plot(0, 0, type="n", xlab="prosoc_left/condition",
     ylab="proportion pulled left", ylim=c(0,1), xaxt="n",
     xlim=c(1,4))
axis(1, at=1:4, labels=c("0/0","1/0","0/1","1/1"))
lines(1:4, pred.p.mean)
shade(pred.p.PI, 1:4)

# replace varying intercept samples with simulations
post <- extract.samples(m12.4)
a_actor_sims <- rnorm(7000, 0, post$sigma_actor)
a_actor_sims <- matrix(a_actor_sims, 1000, 7)

link.m12.4 <- link(m12.4, n=1000, data=d.pred,
                   replace=list(a_actor=a_actor_sims))


pred.p.mean <- apply(link.m12.4, 2, mean)
pred.p.PI <- apply(link.m12.4, 2, PI, prob=0.8)
plot(0, 0, type="n", xlab="prosoc_left/condition",
     ylab="proportion pulled left", ylim=c(0,1), xaxt="n",
     xlim=c(1,4))
axis(1, at=1:4, labels=c("0/0","1/0","0/1","1/1"))
lines(1:4, pred.p.mean)
shade(pred.p.PI, 1:4)

The predictions for an average actor help to visualize the impact of treatment. The predictions that are marginal of actor illustrate how variable different actors are, according to the model.

post <- extract.samples(m12.4)
sim.actor <- function(i) {
    sim_a_actor <- rnorm(1, 0, post$sigma_actor[i])
    P <- c(0,1,0,1)
    C <- c(0,0,1,1)
    p <- logistic(post$a[i] +
                  sim_a_actor +
                  (post$bp[i] + post$bpC[i] * C)*P)
    return(p)
}

# empty plot
plot(0, 0, type="n", xlab="prosoc_left/condition",
     ylab="proportion pulled left", ylim=c(0,1), xaxt="n", xlim=c(1,4))
axis(1, at=1:4, labels=c("0/0", "1/0", "0/1", "1/1"))

# plot 100 simulated actors
for (i in 1:100) lines(1:4, sim.actor(i), col=col.alpha("black", 0.5))

Multilevel models contain parameters with different focus. Focus here means which level of the model the parameter makes direct predictions for.

  1. When retrodicting the sample, the parameters that describe the population of clusters do not influence prediction directly. The hyperparameters had their effect during estimation, by shrinking the varying effect parameters towards a common mean.
  2. The same is true when forecasting a new observation for a cluster that was present in the sample.
  3. When instead we wish to forecast for some new cluster that was not present in the sample, such as a new individual or school or year or location, then we need the hyperparameters. They tell us how to forecast a new cluster, by generating a distribution of new per-cluster intercepts.
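A minimal sketch of point 3, using stand-in posterior draws (invented numbers, not fitted output): each imaginary new cluster gets its own intercept drawn from the population distribution that the hyperparameters define, and the spread of those intercepts carries the extra uncertainty about unseen clusters.

```r
set.seed(12)
a_draws     <- rnorm(1000, 1.4, 0.2)       # stand-in posterior draws for a
sigma_draws <- abs(rnorm(1000, 1.5, 0.2))  # stand-in posterior draws for sigma

# one new-cluster intercept per posterior draw, then map to probability
new_logodds <- rnorm(1000, a_draws, sigma_draws)
new_p <- exp(new_logodds) / (1 + exp(new_logodds))
```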

In the case of over-dispersion, we need to simulate intercepts to account for the over-dispersion.

By estimating the distribution of residuals, we get an estimate of the excess variation, relative to the Poisson expectation.

library(rethinking)
data(Kline)
d <- Kline
d$logpop <- log(d$population)
d$society <- 1:10

                                        # fit models
m12.6 <- map2stan(
    alist(
        total_tools ~ dpois(mu),
        log(mu) <- a + a_society[society] + bp*logpop,
        a ~ dnorm(0,10),
        bp ~ dnorm(0,1),
        a_society[society] ~ dnorm(0, sigma_society),
        sigma_society ~ dcauchy(0,1)
    ), data=d, iter=4000, chains=3)

# postcheck() would work here, but its predictions use the varying
# intercepts directly; they do not use the hyper-parameters.

# instead, simulate counterfactual societies using the hyper-parameters

post <- extract.samples(m12.6)
d.pred <- list(
    logpop = seq(from=6, to=14, length.out=30),
    society=rep(1,30))

a_society_sims <- rnorm(20000, 0, post$sigma_society)
a_society_sims <- matrix(a_society_sims, 2000, 10)
link.m12.6 <- link(m12.6, n=2000, data=d.pred,
                   replace=list(a_society=a_society_sims))

# plot raw data
plot(d$logpop, d$total_tools, col=rangi2, pch=16,
     xlab="log population", ylab="total tools")

# plot posterior median
mu.median <- apply(link.m12.6, 2, median)
lines(d.pred$logpop, mu.median)


mu.PI <- apply(link.m12.6, 2, PI, prob=0.97)
shade(mu.PI, d.pred$logpop)

mu.PI <- apply(link.m12.6, 2, PI, prob=0.89)
shade(mu.PI, d.pred$logpop)

mu.PI <- apply(link.m12.6, 2, PI, prob=0.67)
shade(mu.PI, d.pred$logpop)

12.6 Practice

12E1

a. This prior is narrower and thus more regularizing, leading to more shrinkage.

12E2

m12E2 <- map2stan(
    alist(
        y ~ dbinom(1, p),
        logit(p) <- a_group[group] + b*x,
        a_group[group] ~ dnorm(a_2, sigma),
        b ~ dnorm(0, 1),
        a_2 ~ dnorm(0, 1),
        sigma ~ dcauchy(0, 1)
    ), data=d)

12E3

The same as 12E2

12E4

m12e4 <- map2stan(
    alist(
        y ~ dpois(lambda),
        log(lambda) <- a_group[group] + b*x,
        a_group[group] ~ dnorm(a_2, sigma),
        b ~ dnorm(0, 1),
        a_2 ~ dnorm(0, 1),
        sigma ~ dcauchy(0, 1)
    ), data=d)

12E5

m12E5 <- map2stan(
    alist(
        y ~ dpois(lambda),
        log(lambda) <- a_group[group] + a_year[year] + b_1*x_1 + b_2*x_2,
        a_group[group] ~ dnorm(a_2, sigma),
        a_year[year] ~ dnorm(a_1, sigma_2),
        a_2 ~ dnorm(0, 1),
        a_1 ~ dnorm(0, 1),
        b_1 ~ dnorm(0, 1),
        b_2 ~ dnorm(0, 1),
        sigma ~ dcauchy(0, 1),
        sigma_2 ~ dcauchy(0, 1)
    ), data=d)

12M1

library(rethinking)
data(reedfrogs)
d <- reedfrogs

d$tank <- 1:nrow(d)
d$predation <- ifelse(d$pred == 'pred', 1, 0)
d$size_dummy <- ifelse(d$size =='big', 1, 0)


m12m1.predation <- map2stan(
    alist(
        surv ~ dbinom(density, p),
        logit(p) <- a_tank[tank] + b_p * predation,
        a_tank[tank] ~ dnorm(a, sigma),
        a ~ dnorm(0, 1),
        b_p ~ dnorm(0, 1),
        sigma ~ dcauchy(0, 1)
    ), data=d, iter=4000, chains=4)


m12m1.size <- map2stan(
    alist(
        surv ~ dbinom(density, p),
        logit(p) <- a_tank[tank] + b_s * size_dummy,
        a_tank[tank] ~ dnorm(a, sigma),
        a ~ dnorm(0, 1),
        b_s ~ dnorm(0,1),
        sigma ~ dcauchy(0, 1)
    ), data=d, iter=4000, chains=4)


m12m1.predation.size <- map2stan(
    alist(
        surv ~ dbinom(density, p),
        logit(p) <- a_tank[tank] + b_s * size_dummy + b_p * predation,
        a_tank[tank] ~ dnorm(a, sigma),
        a ~ dnorm(0, 1),
        b_s ~ dnorm(0, 1),
        b_p ~ dnorm(0, 1),
        sigma ~ dcauchy(0, 1)
    ), data=d, iter=4000, chains=4)


m12m1.predation.size.int <- map2stan(
    alist(
        surv ~ dbinom(density, p),
        logit(p) <- a_tank[tank] + b_s * size_dummy + b_p * predation + b_p_s * predation * size_dummy,
        a_tank[tank] ~ dnorm(a, sigma),
        a ~ dnorm(0, 1),
        b_s ~ dnorm(0, 1),
        b_p ~ dnorm(0, 1),
        b_p_s ~ dnorm(0, 1),
        sigma ~ dcauchy(0, 1)
    ), data=d, iter=4000, chains=4)


# posterior predictive check
plot_posterior_medians <- function(model, data) {
  post <- extract.samples(model)

  # compute median intercept for each tank
  # also transform to probability with logistic

  data$propsurv.est <- logistic( apply(post$a_tank, 2, median))

  # display raw proportions surviving in each tank
  plot( data$propsurv , ylim=c(0,1) , pch=16 , xaxt="n" ,
        xlab="tank" , ylab="proportion survival" , col=rangi2 )
  axis( 1 , at=c(1,16,32,48) , labels=c(1,16,32,48) )

  # overlay posterior medians
  points( data$propsurv.est )

  # mark posterior median probability across tanks
  abline( h=logistic(median(post$a)) , lty=2 )

  # draw vertical dividers between tank densities
  abline( v=16.5 , lwd=0.5 )
  abline( v=32.5 , lwd=0.5 )
  text( 8 , 0 , "small tanks" )
  text( 16+8 , 0 , "medium tanks" )
  text( 32+8 , 0 , "large tanks" )
}

par(mfrow=c(2,2))


plot_posterior_medians(model = m12m1.predation, data = d)
plot_posterior_medians(model = m12m1.size, data = d)
plot_posterior_medians(model = m12m1.predation.size, data = d)
plot_posterior_medians(model = m12m1.predation.size.int, data = d)

The more predictors we add, the closer the tank predictions move towards the overall median, and the less variance remains between tanks. The variance shrinks because the predictors absorb variation that would otherwise be assigned to the tank intercepts, so the adaptive regularization has less residual variation to work with.

12M2

compare(m12m1.predation, m12m1.size, m12m1.predation.size, m12m1.predation.size.int)
coeftab(m12m1.predation, m12m1.size, m12m1.predation.size, m12m1.predation.size.int)

12M3

m12m3.cauchy <- map2stan(
    alist(
        surv ~ dbinom(density, p),
        logit(p) <- a_tank[tank],
        a_tank[tank] ~ dcauchy(a, sigma),
        a ~ dnorm(0,1),
        sigma ~ dcauchy(0,1)
        ), data=d, iter=4000, chains=4)

m12m3.gaussian <- map2stan(
  alist(
    surv ~ dbinom(density, p) ,
    logit(p) <- a_tank[tank],
    a_tank[tank] ~ dnorm(a, sigma),
    a ~ dnorm(0, 10),
    sigma ~ dcauchy(0, 1)
  ), data=d, iter=4000, chains=4 )
coeftab(m12m3.cauchy, m12m3.gaussian)

post_gaussian <- extract.samples(m12m3.gaussian)
alpha_gaussian <- apply(post_gaussian$a_tank, 2, mean)

post_cauchy <- extract.samples(m12m3.cauchy)
alpha_cauchy <- apply(post_cauchy$a_tank,2,mean)
plot( alpha_gaussian, alpha_cauchy , pch=16 , col=rangi2 ,
      xlab="Gaussian prior" , ylab="Cauchy prior" )
abline(a=0, b=1, lty=2)

The Cauchy prior has much heavier tails than the Gaussian, which is why the most extreme tank intercepts are shrunk far less under the Cauchy model.
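To see just how much heavier the Cauchy tail is, compare the tail mass of the two prior shapes with base R's distribution functions (unit scale used here for illustration):

```r
# probability of an intercept more than 5 units from the center
p_norm   <- 2 * pnorm(-5, mean = 0, sd = 1)
p_cauchy <- 2 * pcauchy(-5, location = 0, scale = 1)
# the Cauchy puts orders of magnitude more mass far from the center,
# so extreme estimates are pulled back much less under a Cauchy prior
c(p_norm, p_cauchy)
```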

12M4

library(rethinking)
data(chimpanzees)
d <- chimpanzees
d$recipient <- NULL
d$block_id <- d$block


m12.4.chapter <- map2stan(
    alist(
        pulled_left ~ dbinom(1, p),
        logit(p) <- a + a_actor[actor] + a_block[block_id] + (bp + bpc*condition) * prosoc_left,
        a_actor[actor] ~ dnorm(0, sigma_actor),
        a_block[block_id] ~ dnorm(0, sigma_block),
        c(a, bp, bpc) ~ dnorm(0, 10),
        c(sigma_actor, sigma_block) ~ dcauchy(0,1)
    ), data=d, warmup=1000, iter=6000, chains=4, cores=4)

precis(m12.4.chapter, depth=2)


m12.4 <- map2stan(
    alist(
        pulled_left ~ dbinom(1, p),
        logit(p) <- a_actor[actor] + a_block[block_id] + (bp + bpc*condition) * prosoc_left,
        a_actor[actor] ~ dnorm(alpha, sigma_actor),
        a_block[block_id] ~ dnorm(gamma, sigma_block),
        c(alpha, gamma, bp, bpc) ~ dnorm(0, 10),
        c(sigma_actor, sigma_block) ~ dcauchy(0,1)
    ), data=d, warmup=1000, iter=6000, chains=4, cores=4)

precis(m12.4)

compare(m12.4.chapter, m12.4)
par(mfrow=c(1,2))

plot(precis(m12.4, depth=2))
plot(precis(m12.4.chapter, depth=2))

The second model samples much more slowly, and Rhat is higher for all coefficients: poor convergence. In the chapter's model, a_actor and a_block are both centered on zero and offset a single grand mean a, with each vector's sigma given a Half-Cauchy prior. Giving each cluster type its own mean (alpha and gamma) leaves the two means poorly identified, because only their sum is constrained by the data.

  1. district: ID number of the administrative district in which each woman resided
  2. use.contraception: an indicator (0/1) of whether the woman was using contraception
  3. urban: an indicator (0/1) of whether the woman lived in a city, as opposed to a rural area

12H1

library(rethinking)
data(bangladesh)
d <- bangladesh

sort(unique(d$district))

d$district_id <- as.integer(as.factor(d$district))
sort(unique(d$district_id))
d$use_contraception <- d$use.contraception
m12h1.fixed <- map2stan(
    alist(
        use_contraception ~ dbinom(1, p),
        logit(p) <- a_district[district_id],
        a_district[district_id] ~ dnorm(0, 10)
        ), data=d, iter=4000, chains=4, cores=4)


m12h1.multi <- map2stan(
    alist(
        use_contraception ~ dbinom(1, p),
        logit(p) <- a_district[district_id],
        a_district[district_id] ~ dnorm(a, sigma),
        a ~ dnorm(0, 10),
        sigma ~ dcauchy(0, 1)
    ), data=d, iter=4000, chains=4, cores=4) 
precis(m12h1.fixed, depth=2)
precis(m12h1.multi, depth=2)
compare(m12h1.fixed, m12h1.multi)

link.fixed <- link(m12h1.fixed, data=data.frame(district_id=1:60))
mu.fixed <- apply(link.fixed, 2, mean)
PI.fixed <- apply(link.fixed, 2, PI)

link.multi <- link(m12h1.multi, data=data.frame(district_id=1:60))
mu.multi <- apply(link.multi, 2, mean)
PI.multi <- apply(link.multi, 2, PI)

library(dplyr)
d_sum <- d %>% as.tbl %>% group_by(district_id) %>% summarize(p_use_contraception=mean(use.contraception))

plot(y=d_sum$p_use_contraception, x=d_sum$district_id, pch=16, xlab="districts", ylab="proportion use_contraception")
abline(h=median(d_sum$p_use_contraception), lty=2)

points(y=mu.fixed, x=1:60, col='red')
points(y=mu.multi, x=(1:60)+0.3, col="green")

legend("topright",legend=c("raw data", 'fixed', 'multi'), fill=c('black', 'red', 'green'))

12H2

library(rethinking)
data(Trolley)
d <- Trolley


m12h2 <- map2stan(
    alist(
        response ~ dordlogit( phi , cutpoints),
        phi <- bA*action + bI*intention + bC*contact,
        c(bA,bI,bC) ~ dnorm(0,10),
        cutpoints ~ dnorm(0, 10)
), 
data=d , iter=4000, chains=4, cores=4, start=list(cutpoints=c(-1.9,-1.2,-0.7,0.2,0.9,1.8)))
d$id_ <- as.integer(d$id)


m12h2.multi <- map2stan(
     alist(
         response ~ dordlogit( phi , cutpoints) ,
         phi <- a_person[id_]+ bA*action + bI*intention + bC*contact,
         c(bA,bI,bC) ~ dnorm(0,10),
         cutpoints ~ dnorm(0,10),
         a_person[id_] ~ dnorm(0, sigma),
         sigma ~ dcauchy(0, 1)
), data=d, iter=4000, chains=4, cores=4, start=list(cutpoints=c(-1.9,-1.2,-0.7,0.2,0.9,1.8)))


compare(m12h2, m12h2.multi)
d.pred <- list(
    action = c(0, 1, 0, 1),
    intention = c(0, 0, 1, 1),
    contact = c(1,0,1,0),
    id_ = rep(2,4))

post <- extract.samples(m12h2.multi)
# simulate intercepts for new individuals: one column per person in the data
n_person <- length(unique(d$id_))
a_person_sims <- rnorm(1000 * n_person, 0, post$sigma)
a_person_sims <- matrix(a_person_sims, 1000, n_person)

linkm12h2 <- link(m12h2.multi, n=1000, data=d.pred,
                  replace=list(a_person=a_person_sims))
d$story_id <- as.integer(d$story)

m12h2.story <- map2stan(
     alist(
         response ~ dordlogit(phi, cutpoints),
         phi <- a_person[id_] + a_story[story_id] + bA*action + bI*intention + bC*contact,
         c(bA,bI,bC) ~ dnorm(0,10),
         cutpoints ~ dnorm(0,10),
         a_person[id_] ~ dnorm(0, sigma_person),
         a_story[story_id] ~ dnorm(0, sigma_story),
         sigma_person ~ dcauchy(0, 1),
         sigma_story ~ dcauchy(0, 1)
), data=d, iter=4000, chains=4, cores=4, start=list(cutpoints=c(-1.9,-1.2,-0.7,0.2,0.9,1.8)))