Structured covariance matrices for lme4
Mixed models, also known as mixed-effects models or hierarchical models, are a class of statistical models that incorporate both fixed and random effects. These models are widely used where data are correlated or structured, e.g. repeated measures, longitudinal, clustered or hierarchical data, which are common across numerous disciplines. With the exception of the recommended package nlme (which predates R itself), lme4 is the most widely used package for mixed models; it provides several capabilities (generalized models, crossed random effects) that are either impossible or difficult to achieve with nlme.
However, unlike nlme, lme4 does not allow for structured covariance matrices in random-effects terms (e.g. compound-symmetric or autoregressive); such matrices are useful both as a way to simplify overly complex models and as a way to incorporate known structure (e.g. temporal, spatial, or phylogenetic) when estimating random effects.
This project aims to add these capabilities to lme4.
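To make the structures named above concrete, here is a minimal base-R sketch that builds compound-symmetric and first-order autoregressive (AR(1)) covariance matrices. The function names `cs_cov` and `ar1_cov` and the parameter names `sigma2` and `rho` are hypothetical, chosen only for illustration; they are not part of lme4 or nlme.

```r
# Illustrative helpers only (hypothetical names, not lme4 or nlme API).

# Compound symmetry: one common variance, one common correlation.
cs_cov <- function(n, sigma2, rho) {
  # positive definiteness requires -1/(n - 1) < rho < 1
  stopifnot(sigma2 > 0, rho > -1 / (n - 1), rho < 1)
  sigma2 * ((1 - rho) * diag(n) + rho * matrix(1, n, n))
}

# AR(1): correlation decays geometrically with the lag |i - j|.
ar1_cov <- function(n, sigma2, rho) {
  stopifnot(sigma2 > 0, abs(rho) < 1)
  idx <- seq_len(n)
  sigma2 * rho^abs(outer(idx, idx, "-"))
}

n <- 4
cs  <- cs_cov(n, sigma2 = 2, rho = 0.5)
ar1 <- ar1_cov(n, sigma2 = 1, rho = 0.5)

# Number of free parameters: unstructured vs. either structured form
n_unstructured <- n * (n + 1) / 2  # grows quadratically with n
n_structured   <- 2                # sigma2 and rho, regardless of n
```

Both structures describe an n-by-n covariance matrix with two parameters, whereas an unstructured matrix needs n(n+1)/2; this is the sense in which structured matrices can simplify overly complex models.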
- As mentioned above, `nlme` provides access to some structured covariance matrices (and other packages, such as `ape`, extend these capabilities). Although the architectures of `nlme` and `lme4` are completely different, we can use the `nlme` implementations both as test/comparison cases and as inspiration (e.g., the value of a modular/extensible/object-oriented framework for structured covariance matrices).
- Such a project was attempted before (see this branch), but was abandoned 11 years ago for a variety of reasons. We can re-use some of this material, but will also take lessons from the reasons it failed (e.g., premature branching that failed to maintain back-compatibility with "base" `lme4`).
- The `glmmTMB` package supports a variety of structured covariance matrices (although not in a modular way). We will be able to re-use some of the formula-processing machinery (implemented in the `reformulas` package).
- The Bayesian/sampling-based `MCMCglmm` and (to some extent) `brms` packages also provide covariance structures. These will be useful mostly for inspiration, since the internal architectures are very different.
- The INLA package provides tools for constructing efficient, spatially structured precision matrices (the inverse of the covariance matrix); it would be nice to be able to take advantage of them, but this may not be feasible because the architecture of `lme4` is heavily based on Cholesky factors (a matrix square root of the covariance matrix) rather than precision matrices (or Cholesky factors of precision matrices).
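The distinction between Cholesky factors of a covariance matrix and precision matrices can be illustrated with a small base-R sketch. It uses an AR(1) correlation matrix as an assumed example; the variable names are illustrative, and nothing here reflects the internals of lme4 or INLA.

```r
# A small AR(1) covariance matrix (dense: every entry is non-zero).
n <- 5
rho <- 0.6
idx <- seq_len(n)
Sigma <- rho^abs(outer(idx, idx, "-"))

# Cholesky factor: chol() returns upper-triangular R with t(R) %*% R = Sigma,
# so L = t(R) is a lower-triangular "square root" with L %*% t(L) = Sigma.
R <- chol(Sigma)
L <- t(R)
chol_error <- max(abs(L %*% t(L) - Sigma))

# Precision matrix: the inverse of the covariance, here computed from R.
Q <- chol2inv(R)
inverse_error <- max(abs(Q %*% Sigma - diag(n)))

# For AR(1) the precision matrix is tridiagonal: entries more than one
# step off the diagonal vanish (up to rounding), although Sigma is dense.
far_off_diagonal <- abs(Q[abs(outer(idx, idx, "-")) > 1])
max_far_off_diagonal <- max(far_off_diagonal)
```

In this example the covariance matrix is dense while its inverse is sparse, which is one reason precision-based parameterizations can be efficient for temporally or spatially structured effects, and why they do not map directly onto a design built around Cholesky factors of the covariance matrix.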
We will start by reviewing the flexLambda design ...
- minimal goal: back-compatible modular structure, documentation and tests, implementation of compound symmetric and diagonal structures
- document and test (test-driven??)
- maintain strict back-compatibility/test passage
- compare against `nlme`, `glmmTMB` results
- EVALUATING MENTOR: Ben Bolker [email protected] is the maintainer of `lme4` and an active contributor to `glmmTMB`. He has previous experience with GSoC (`phylobase` (as mentor, 2008), `directlabels` (as co-/secondary mentor, 2021)).
- Emi Tanaka [email protected] is the author of the `edibble` and `nestr` CRAN packages and co-maintainer of the Mixed Models task view.
- Mikael Jagan [email protected] is an author of the recommended `Matrix` package and maintainer of several other CRAN packages, and is an active contributor of detailed bug reports and patches for base R.
Contributors, please complete at least one of the tests below before contacting the mentors above.
MENTORS: write several tests that potential contributors can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight about how the contributors write code to solve problems. You'll see that the harder the questions that you ask, the easier it will be for you to choose between the contributors that apply for your project! Please modify the suggestions below to make them specific for your project.
- Easy: something that any useR should be able to do, e.g. download some existing package listed in the Related Work, and run it on some example data.
  - Write a shell script to be run in the top-level directory of the lme4 sources (containing the file `DESCRIPTION`). The script should ensure that all dependencies of lme4 are installed in the user library, then build a source tarball, then perform a check on the source tarball. Run the script and save the installation and check output from the check directory. If the output suggests problems with your set-up, then fix the problems and try again.
- Medium: something a bit more complicated. You can encourage contributors to write a script or some functions that show their R coding abilities.
  - Use the modular framework described in `?lme4::modular` to write two R functions implementing (some kind of) structured covariance matrices, first by writing a wrapper for the objective function and then by hacking the `reTrms` list object.

    MJ: Is the task to implement two covariance matrix structures or to write two functions implementing one covariance matrix structure ... ?
- Hard: Can the contributor write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the contributor write in that other language?
  - Submit a pull request fixing one of the issues in the GitHub repository. An ideal patch will make minimal necessary changes to the R sources, improve the documentation, and add suitable regression tests.

    MJ: We should provide a list of suitable issues.
Contributors, please post a link to your test results here.
- EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.