New appendices on limitations of ANOVA/OLS thinking and Wilkinson-Rogers notation #84
Conversation
I like this very much. Indeed, very useful.
I think this is a good idea and a good summary - will be very useful. Fixed a couple of typos and slightly revised a sentence at the end. Thanks.
@palday I think it would be useful to add extensions from RegressionFormulae in a separate section.
I am considering whether it is best to try to explain why the methods or formulas from LMs or ANOVA that readers think should extend meaningfully to LMMs don't. We will probably end up in a game of whack-a-mole if we try to do this. Would it be better to explain our approach as based on one concept: comparing two nested models fit to the same data? Questions about the significance of coefficients or model terms can always be reduced to this, and we have a simple approach - fit both models and perform a likelihood ratio test.

They didn't learn this approach when they had an intro statistics course because it used to be very hard to fit models using hand calculations, so if computational shortcuts let you fit just the "alternative hypothesis" model, you could avoid fitting another, simpler model. This resulted in statistics being taught as a catechism of formulas and definitions without the underlying motivation being clear. But we don't need computational shortcuts when we have powerful computing hardware and software.

In linear models there is a difference between a likelihood ratio test and an F test, but that again is computational serendipity allowing you to finesse the need to estimate a scale parameter. This doesn't carry over to LMMs, but it is not a big deal because the challenges for LMMs are big data, not small data. If you only have 20 observations it may be important to distinguish between an F test and a LR test (or, for a single parameter, between a t test and a z test). But you shouldn't be fitting LMMs to 20 observations - the important game is at the 20 million observation level.

I am still formulating these thoughts, as you may be able to tell, so feel free to knock down my straw men.
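To make the "compare two nested models fit to the same data" idea concrete, here is a minimal sketch using MixedModels.jl and its bundled sleepstudy dataset (the particular formulas are just for illustration):

```julia
using MixedModels

sleepstudy = MixedModels.dataset(:sleepstudy)

# simpler model: no fixed effect for days
m0 = fit(MixedModel, @formula(reaction ~ 1 + (1 + days | subj)), sleepstudy)

# alternative model: adds the fixed effect of days
m1 = fit(MixedModel, @formula(reaction ~ 1 + days + (1 + days | subj)), sleepstudy)

# likelihood ratio test of the nested pair
MixedModels.likelihoodratiotest(m0, m1)
```

Both models are fit by maximum likelihood (the default in MixedModels.jl), so the likelihood ratio test is a valid comparison of the nested pair.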
@dmbates I think this is a good call-out -- my motivation for writing this appendix was exactly to avoid the whack-a-mole and instead point out how naive extensions fail. I can integrate your comment into the appendix when I get a few minutes. I think it's the other half of what I was trying to say -- computational shortcuts aren't as necessary today, and many of the various identities were emphasized exactly because they enable the computational shortcuts. The better framing is thinking explicitly about the model comparisons that were implicit in all those classical tests.

I was also thinking about how degrees of freedom are only important when you have small samples as I was riding my bike today. When William Gosset was developing the t-test, he corresponded with Karl Pearson, and the latter more or less said "well, yes, this works, but it only matters when the data are very small and you should avoid small data anyway". (I have a reference for that somewhere -- happy to resurrect it for this chapter!)
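A quick numeric illustration of that point (a sketch, assuming Distributions.jl is available): the two-sided 5% critical values of the t distribution differ noticeably from the normal at small degrees of freedom and are practically indistinguishable at large ones.

```julia
using Distributions

# two-sided 5% critical values at several degrees of freedom
for df in (5, 20, 200, 20_000)
    tcrit = quantile(TDist(df), 0.975)
    println("df = $df: t = $(round(tcrit, digits = 4))")
end

# reference value from the standard normal
println("z = $(round(quantile(Normal(), 0.975), digits = 4))")
```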
@dmbates Question: If I recall correctly, in a Linear Model (e.g., multiple regression) with correlated predictors the significance of coefficients should also be assessed by testing the change in deviance between nested models?

I also wonder whether (parts of) your comment would be placed better in the context of model selection. With many observations, the data usually support complex LMMs with many coefficients, both with respect to fixed effects and the size of the random-effect structure. I usually do not test each single coefficient with a LRT, but theoretically motivated sets of coefficients (which could be a single coefficient). I do this (1) during model selection in the random-effect structure and (2) in a top-down manner to determine the highest order of interactions between categorical/indicator and continuous covariates. As far as the latter is concerned, with this procedure I do not recall ending up with differences between "drop-one" LRTs and, say, the bootstrapped confidence intervals of the coefficients.

Of course, one also needs to decide what constitutes a significant LRT change (chi-square, AIC, BIC), which in my experience requires a distinction between whether you are operating in an exploratory or theory-testing context and how serious you are about p < .05 or p < .20 or ...

Finally, in the special case of factorial ANOVA designs with a suitable specification of contrasts, even allowing for some imbalance between design cells, I don't know an example where "significances" based on bootstrapped coefficient CIs and LRTs lead to different conclusions, granting some minute borderline differences at the alpha level. Do you know of an example? Actually, for the ANOVA case, I usually do not carry out a model selection in the fixed effects, but report them all.
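For comparing "drop-one" LRTs against bootstrapped coefficient CIs, a minimal sketch with the parametric bootstrap in MixedModels.jl, continuing the m1 model from the sketch above (the seed and the 1000 replicates are arbitrary choices):

```julia
using DataFrames, MixedModels, Random

# parametric bootstrap of the alternative model m1 from above
boot = parametricbootstrap(MersenneTwister(42), 1000, m1)

# shortest 95% coverage intervals for the parameters
DataFrame(shortestcovint(boot))
```

Checking whether zero falls inside a coefficient's interval can then be set alongside the corresponding drop-one LRT.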
This is an interesting paper, I think: Glover & Dixon (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11(5), 791–806. I subscribe to their conclusion (p. 806): "We have shown here how likelihood ratios are derived, how they may be computed and interpreted, and how they can be used to fulfill the most common purposes of reporting statistics in empirical psychology. The critical ingredient in this approach is the incorporation of theoretical and conceptual knowledge in the reporting of evidence. We emphasize that the appropriate analysis of data cannot be described in the abstract and that there is no mechanical or "cook-book" method for dealing with [...]"
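In that spirit, one could report the likelihood ratio itself as evidence rather than only a p-value; a sketch for the two nested models from above, using the standard AIC-based evidence ratio to penalize the extra parameter:

```julia
using MixedModels

# raw likelihood ratio of the two nested models m0 and m1 from above
λ = exp(loglikelihood(m1) - loglikelihood(m0))

# AIC-corrected likelihood ratio: relative likelihood exp(ΔAIC / 2),
# favoring m1 when it has the lower AIC
λ_adj = exp((aic(m0) - aic(m1)) / 2)
```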
I'm not happy with the "identities and concepts from classical stats" appendix yet, but I think it might still be useful in its current form.
I'm happy for any and all feedback, but would like to merge it sooner rather than later so that we can make it available to a broader audience. We can then iterate as much as need be. (One of the advantages of an online book. 😄 )