Skip to content

Commit 7e1e867

Browse files
Eilis HannonEilis Hannon
authored andcommitted
draft of mixed effects model complete
1 parent 8d136d9 commit 7e1e867

File tree

1 file changed

+188
-16
lines changed

1 file changed

+188
-16
lines changed

inst/tutorials/Advanced Regression Analysis/Advanced Regression Analysis.Rmd

Lines changed: 188 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,9 @@ age<-floor(runif(nInd, 20, 60))
2222
intervention<-sample(c("Placebo", "Training"), nInd, replace = TRUE)
2323
yearsEd<-sample(c(12,14,17), nInd, replace = TRUE, prob = c(0.3,0.4, 0.3))
2424
smoke <- sample(c("Yes", "No"), nInd, replace = TRUE, prob = c(0.25,0.75))
25+
physicalWellbeing <- sample(c("High", "Low"), nInd, replace = TRUE, prob = c(0.85,0.15))
26+
mentalWellbeing <- sample(c("High", "Low"), nInd, replace = TRUE, prob = c(0.7,0.3))
27+
cogAbaseline <- cogAbaseline[(physicalWellbeing == "Low" | mentalWellbeing == "Low")]<- rpois(sum((physicalWellbeing == "Low" | mentalWellbeing == "Low")), 22)
2528
cogAbaseline <- cogAbaseline[smoke == "Yes"]<- rpois(sum(smoke == "Yes"), 23)
2629
2730
visitID<-as.factor(rep(indIDs, nVisits))
@@ -33,14 +36,21 @@ visitAge<-age[index]+visitNum
3336
visitIntervention<-as.factor(intervention[index])
3437
visitYearsEd <- yearsEd[index]
3538
visitSmoke <- as.factor(smoke[index])
36-
37-
cogA<- floor(cogAbaseline[index] + visitNum * (0.2 + 0.05 * as.numeric(visitIntervention)) + rnorm(length(visitNum), 0,2))
39+
visitPW <- as.factor(physicalWellbeing[index])
40+
randomIndex<-sample(1:length(index), nInd)
41+
visitPW[randomIndex]<-"Low"
42+
visitMW <-as.factor(mentalWellbeing[index])
43+
randomIndex<-sample(which(visitNum > 3), nInd*0.5)
44+
visitMW[randomIndex]<-"High"
45+
46+
cogA<- floor(cogAbaseline[index] + visitNum * (0.2 + 0.05 * as.numeric(visitIntervention) + 0.04 * as.numeric(visitMW)) + rnorm(length(visitNum), 0,2))
3847
3948
cogB<-cogBbaseline[index] + visitNum * (0.1 + 0.03 * as.numeric(visitSex) + 0.05 * (visitYearsEd-12)) + rnorm(length(visitNum), 0, 2)
4049
4150
cogC<-cogCbaseline[index] + visitNum * (0.01 + 0.003 * as.numeric(visitSex) + 0.001 * as.numeric(visitIntervention)) + rnorm(length(visitNum), 0, 5)
4251
43-
cogDat<-data.frame("ID" = visitID, "VisitNum" = visitNum, "Age" = visitAge, "Sex" = visitSex, "YearsEducation" = visitYearsEd, "Smoker" = visitSmoke, "Intervention" = visitIntervention, "CognitionA" = cogA, "CognitionB" = cogB, "CognitionC" = cogC)
52+
53+
cogDat<-data.frame("ID" = visitID, "VisitNum" = visitNum, "Age" = visitAge, "Sex" = visitSex, "YearsEducation" = visitYearsEd, "Smoker" = visitSmoke, "Intervention" = visitIntervention, "CognitionA" = cogA, "CognitionB" = cogB, "CognitionC" = cogC, "PhysicalWellbeing" = visitPW, "MentalWellbeing" = visitMW)
4454
4555
```
4656

@@ -579,7 +589,7 @@ This test returns a p-value > 0.05, indicating that the data are consistent with
579589
### Exercise 2
580590

581591

582-
*Let's see if the other cognitive tests also change consistently over time*
592+
*Let's try fitting some random slopes models.*
583593

584594
Write the R code required,to test using a mixed effects regression model, the following:
585595

@@ -689,24 +699,32 @@ Random slopes model have all the same assumptions as random intercepts model plu
689699

690700
If our results did suggest that the random slopes model had some value, we could repeat the diagnostic plots from before to check our model assumptions; this time thought we would need to add a fourth plot to check the residuals of the random slope term we estimate for each individual.
691701

692-
```{r}
702+
```{r, fig.height = 6}
693703
# a plot to check the constant standard deviation
694704
plot(fitted(model.rand.slope),resid(model.rand.slope,type="pearson"),col="blue", xlab = "fitted", ylab = "residuals")
695705
abline(h=0,lwd=2)
706+
```
696707

708+
```{r, fig.height = 6}
697709
# normality of the residuals
698710
qqnorm(resid(model.rand.slope))
699711
qqline(resid(model.rand.slope))
712+
```
700713

714+
```{r, fig.height = 6}
701715
# normality of the random intercept estimates
702716
qqnorm(ranef(model.rand.slope)$ID[,1])
703717
qqline(ranef(model.rand.slope)$ID[,1])
718+
```
704719

720+
```{r, fig.height = 6}
705721
# normality of the random slope estimates
706722
qqnorm(ranef(model.rand.slope)$ID[,2])
707723
qqline(ranef(model.rand.slope)$ID[,2])
708724
```
709725

726+
As with the random intercepts model these look pretty reasonable and no reason to believe the model is biased.
727+
710728
### Some notes on model formulation
711729

712730
Once we start incorporating random slopes the interpretation of some predictor variables can get quite complicated. Some things to consider when deciding what model to fit:
@@ -766,37 +784,191 @@ summary(model.sex)
766784

767785
We can see that the fixed effect we have estimated for sex is not significant.
768786

787+
788+
769789
### Logistic mixed effects regression models
770790

771791
If our outcome is a binary variable we alternatively need to fit a logistic regression model. For an explanation as to why, please see the **Introduction to Regression Analysis** tutorial. As logistic regression requires a generalized linear model framework, we need to use the function `glmer()` rather than `lmer()`.
772792

773-
Let's look to see if there is a difference in cognitive performance between individuals who currently smoke.
793+
Let's look to see if in general the participants mental well being improves as the study progresses. The variable that captures change over the course of the study is `VisitNum` so this is our predictor variable, we keep our random effect for `ID` and our outcome variable is the factor `MentalWellbeing`
774794

775795
```{r}
776-
model.smoke <- glmer(Smoker ~ CognitionA + (1 | VisitNum), data = cogDat, family="binomial")
777-
summary(model.smoke)
796+
model.log <- glmer(MentalWellbeing ~ VisitNum + (1 | ID), data = cogDat, family="binomial")
797+
summary(model.log)
798+
778799
```
779800

801+
Let's see if we need the random intercept which is essentially asking the question whether an individuals well being at one point in the study predicts their well being at another stage. It is important to do this for each model, because just because individual has an affect on one variable, doesn't automatically mean it affects all variables in a data set.
780802

781-
### Significance testing of fixed effects with anova
803+
As before we do this my comparing it to a standard regression model with just fixed effects and no random effects. As the standard model also needs to be a logistic regression model we use the `glm()` function to fit it. We then use an `anova()` to compare the models with and without the random effects.
782804

783-
As with linear regression we can use `anova()` to compare the joint effect of fixed effects. Note that the random effects must be identical and the fixed effects must be nested (i.e. one is a subset of the other). This can only be done if we used the maximum likelhood method (set by including the argument `REML = FALSE`), however if the model was intially fitted with `REML = TRUE`, R will first refit the model with `REML = FALSE` and then perform the anova. Here we will compare our random intercepts model with and without a fixed effect for sex
805+
```{r}
806+
null.log <- glm(MentalWellbeing ~ VisitNum, data = cogDat, family="binomial")
807+
anova(model.log, null.log)
808+
```
809+
810+
These results show that the inclusion of the random intercept does significantly improve the fit of the model as P < 0.05. Therefore we can conclude that individual's mental well being is correlated across the course of the study.
811+
812+
We interpret the fixed effects as we would for any other logistic regression model - they relate to the log odds ratio of the outcome per one unit increase in the predictor variable. As a one unit increase in the predictor variable equates to one extra visit, we can summarise from this model that each extra visit is associated with a log odd ratio of `r signif(summary(model.log)$coefficients["VisitNum", "Estimate"],2)`. We can convert this to an odds ratio by raising it to an exponential.
784813

785814
```{r}
786-
model.rand.int.null<-lmer(CognitionA ~ visitNum + (1 | ID), data = cogDat)
815+
exp(coef(summary(model.log))[,"Estimate"])
816+
```
817+
818+
So the odds of having low mental well being relative to high mental well being decreases by a factor of `r signif(exp(coef(summary(model.log))["VisitNum","Estimate"]),2)` for each extra visit. We can flip this round and say that each visit increases the odds of having high mental well being by a factor of `r signif(1/exp(coef(summary(model.log))["VisitNum","Estimate"]),2)`. Note that the individual level intercepts represent each individuals baseline odds ratio for their mental well being.
819+
820+
### Exercise 3
821+
822+
823+
*Let's practise fitting more complex mixed effects models*
824+
825+
Write the R code required to test using a mixed effects regression model the following. For eachmodel include a random intercept for individual.
826+
827+
1. Is cognitive performance measured by any of the three tests influenced by smoking or years of education?
828+
829+
830+
```{r exercise3a, exercise=TRUE}
831+
832+
833+
model.coga<-lmer(CognitionA ~ VisitNum + ... + (1|ID), data = cogDat)
834+
model.cogb<-
835+
model.cogc<-
836+
787837
788-
anova(model.rand.int, model.rand.int.null)
789838
790839
```
791840

792-
We can see the p-value is \> 0.05 then we would conclude that sex does not significantly improve the model inline with the t-test of the fixed effect coefficient.
841+
```{r exercise3a-solution}
793842
794-
For more information on reasons why a model might not converge we can look at the documentation for the lmer package.
843+
model.coga<-lmer(CognitionA ~ VisitNum + Smoker + YearsEducation + (1|ID), data = cogDat)
844+
summary(model.coga)
795845
796-
```{r, eval = FALSE}
797-
?convergence
846+
model.cogb<-lmer(CognitionB ~ VisitNum + Smoker + YearsEducation + (1|ID), data = cogDat)
847+
summary(model.cogb)
848+
849+
model.cogc<-lmer(CognitionC ~ VisitNum + Smoker + YearsEducation + (1|ID), data = cogDat)
850+
summary(model.cogc)
851+
852+
```
853+
854+
```{r quiz4, echo=FALSE}
855+
quiz(caption = "Questions on the exercise above",
856+
question("Smoking behaviour is significantly associated (P < 0.05) with which cognitive tests? Select all that apply",
857+
answer("Cognition A"),
858+
answer("Cognition B"),
859+
answer("Cognition C"),
860+
answer("None", correct = TRUE),
861+
allow_retry = TRUE),
862+
question("Years of education is significantly associated (P < 0.05) with which cognitive tests? Select all that apply",
863+
answer("Cognition A"),
864+
answer("Cognition B"),
865+
answer("Cognition C", correct = TRUE),
866+
answer("None"),
867+
allow_retry = TRUE),
868+
question("Considering the results for cognitive test C, what is the value of coefficient for years of education?",
869+
answer("15.5"),
870+
answer("0.032"),
871+
answer("0.089"),
872+
answer("0.31", correct = TRUE),
873+
allow_retry = TRUE),
874+
question("What is the correct interpretation of the value of coefficient for years of education?",
875+
answer("It is the mean cognitive score for those with 0 years of education."),
876+
answer("It is the mean cognitive score for those with at least 1 year of education."),
877+
answer("It is the mean change in cognitive score per year of education.", correct = TRUE),
878+
answer("It is the mean change in cognitive score per 12 years of education."),
879+
allow_retry = TRUE)
880+
)
881+
```
882+
883+
884+
885+
2. Does cognitive performance in any of the three tests influence the mental well being of the participants? Include co-variates for sex and years of education.
886+
887+
888+
```{r exercise3b, exercise=TRUE}
889+
890+
891+
model.mw.coga<-glmer(MentalWellebing ~ CognitionA + VisitNum + ... + (1|ID), data = cogDat)
892+
model.mw.cogb<-
893+
model.mw.cogc<-
894+
895+
896+
897+
```
898+
899+
```{r exercise3b-solution}
900+
901+
model.mw.coga<-glmer(MentalWellbeing ~ CognitionA + VisitNum + Sex + YearsEducation + (1|ID), data = cogDat, family = "binomial")
902+
summary(model.mw.coga)
903+
904+
model.mw.cogb<-glmer(MentalWellbeing ~ CognitionB + VisitNum + Sex + YearsEducation + (1|ID), data = cogDat, family = "binomial")
905+
summary(model.mw.cogb)
906+
907+
model.mw.cogc<-glmer(MentalWellbeing ~ CognitionC + VisitNum + Sex + YearsEducation + (1|ID), data = cogDat, family = "binomial")
908+
summary(model.mw.cogc)
909+
910+
## alternatively we could test simultaneously but fails to converge
911+
912+
model.mw.cogall<-glmer(MentalWellbeing ~ CognitionA + CognitionB + CognitionC + VisitNum + Sex + YearsEducation + (1|ID), data = cogDat, family = "binomial")
913+
summary(model.mw.cogall)
914+
915+
```
916+
917+
```{r quiz5, echo=FALSE}
918+
quiz(caption = "Questions on the exercise above",
919+
question("Which cognitive tests significantly influence mental well being? Select all that apply",
920+
answer("Cognition A", correct = TRUE),
921+
answer("Cognition B"),
922+
answer("Cognition C"),
923+
answer("None"),
924+
allow_retry = TRUE),
925+
question("Which cognitive tests are associated with increasing mental well being from low to high? You can ignore whether they are significant or not. Select all that apply.",
926+
answer("Cognition A"),
927+
answer("Cognition B"),
928+
answer("Cognition C"),
929+
answer("None", correct = TRUE),
930+
allow_retry = TRUE),
931+
question("What is the intepretation of the coefficient for each cognition test?",
932+
answer("It represents the change in cognitive score needed to go from low mental well being to high mental well being."),
933+
answer("It represents the change in cognitive score needed to go from high mental well being to low mental well being."),
934+
answer("It represents the log odds ratio of change in mental wellbeing from low to high per one point on the cognitive test.", correct = TRUE),
935+
answer("It represents the log odds ratio of change in mental wellbeing from high to low per one point on the cognitive test."),
936+
allow_retry = TRUE)
937+
)
798938
```
799939

940+
941+
3. Does physical well being improve over the course of the study?
942+
943+
944+
```{r exercise3c, exercise=TRUE}
945+
946+
947+
948+
949+
```
950+
951+
```{r exercise3c-solution}
952+
953+
model.pw<-glmer(PhysicalWellbeing ~ VisitNum +(1|ID), data = cogDat, family = "binomial")
954+
summary(model.pw)
955+
956+
```
957+
958+
```{r quiz6, echo=FALSE}
959+
quiz(caption = "Questions on the exercise above",
960+
question("Does physical well being improve over the course of the study?",
961+
answer("Yes but not significantly."),
962+
answer("Yes there is a significant increase in physical wellbeing."),
963+
answer("No but not significantly.", correct = TRUE),
964+
answer("No there is a significant decrease in physical wellbeing."),
965+
allow_retry = TRUE)
966+
)
967+
```
968+
969+
970+
971+
800972
## Regression models with interaction terms
801973

802974
We are going to look at how to code an interaction term in R by extending the multiple linear regression model we fitted in the previous workshop (Contact Day 3). If you recall, we fitted a model to see whether age and sex predict cognitive performance as measured by the general cognitive factor. Here we will add an interaction term between age and sex. To do this we need to add the interaction term to our formula.

0 commit comments

Comments
 (0)