Note on mixed model calculations

1 Introduction
This note reproduces the simulated two-factor anova model in Zhou et al. (2019) and fits it with the MixedModels.jl package. I would have included comparative results from the VarianceComponentModels.jl package but I was unable to install it, even on the long-term-support (lts) version of Julia. (Repairs may be as simple as merging the CompatHelper PR but I didn't check.)
The model incorporates two random-effects factors and their interaction. In the simulation the number of levels of the grouping factors for the random effects is set at 5 for a and b, so that the interaction term, ab, has 25 levels. The number of replicates at each level of the interaction is the parameter c.
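In generic two-factor anova notation (a sketch of the model as read from the design; the subscripts on the σ's here are only mnemonic and correspond to the \(\mathbf{\theta}\) parameterization introduced below) an observation can be written

\[
y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + \epsilon_{ijk},
\qquad i=1,\dots,5;\; j=1,\dots,5;\; k=1,\dots,c,
\]

with independent random effects \(a_i\sim\mathcal{N}(0,\sigma_a^2)\), \(b_j\sim\mathcal{N}(0,\sigma_b^2)\), \((ab)_{ij}\sim\mathcal{N}(0,\sigma_{ab}^2)\) and noise \(\epsilon_{ijk}\sim\mathcal{N}(0,\sigma_e^2)\).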
We note that only having five levels each of the grouping factors \(\mathbf{a}\) and \(\mathbf{b}\) may result in an unstable estimation situation, because it is difficult to estimate a variance from only five distinct pieces of information. In such cases, as we will see below, the estimates of variance components can converge to zero.
Convergence to zero for a variance component is not a problem when using MixedModels.jl because it was designed with this possibility in mind.
2 Setting up the simulation
We create a DataFrame corresponding to the experimental design with a placeholder response vector, then create a LinearMixedModel without fitting it, and finally use the simulate! function in the MixedModels package to simulate the response given values of the parameters.
Load the packages to be used:

```julia
using Chairmarks    # to benchmark function executions
using DataFrames
using MixedModels
using PooledArrays  # similar to factor in R
using Random        # random number generators
using RCall         # call R from Julia
```
Create a function to generate a DataFrame with a numeric response, y, and PooledArrays a, b, and ab.
"""
+ design(c; a=5, b=5)
+
+Return a DataFrame with columns y, a, b, and ab in a two-factor design with interaction
+"""
+function design(c::Integer; a::Integer=5, b::Integer=5)
+ signed, compress = true, true # used as named arguments to PooledArray()
+ av = repeat(string.('a':'z')[1:a], inner=b * c)
+ bv = repeat(string.('A':'Z')[1:b], inner=c, outer=b)
+ return DataFrame(
+ y = zeros(a * b * c),
+ a = PooledArray(av; signed, compress),
+ b = PooledArray(bv; signed, compress),
+ ab = PooledArray(av .* bv; signed, compress),
+ )
+endFor c = 5 the data frame is
```julia
d05 = design(5)
```

| Row | y | a | b | ab |
|---|---|---|---|---|
| | Float64 | String | String | String |
| 1 | 0.0 | a | A | aA |
| 2 | 0.0 | a | A | aA |
| 3 | 0.0 | a | A | aA |
| 4 | 0.0 | a | A | aA |
| 5 | 0.0 | a | A | aA |
| 6 | 0.0 | a | B | aB |
| 7 | 0.0 | a | B | aB |
| 8 | 0.0 | a | B | aB |
| 9 | 0.0 | a | B | aB |
| 10 | 0.0 | a | B | aB |
| 11 | 0.0 | a | C | aC |
| 12 | 0.0 | a | C | aC |
| 13 | 0.0 | a | C | aC |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 114 | 0.0 | e | C | eC |
| 115 | 0.0 | e | C | eC |
| 116 | 0.0 | e | D | eD |
| 117 | 0.0 | e | D | eD |
| 118 | 0.0 | e | D | eD |
| 119 | 0.0 | e | D | eD |
| 120 | 0.0 | e | D | eD |
| 121 | 0.0 | e | E | eE |
| 122 | 0.0 | e | E | eE |
| 123 | 0.0 | e | E | eE |
| 124 | 0.0 | e | E | eE |
| 125 | 0.0 | e | E | eE |
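As a quick check on the design function, a minimal sketch (the assertions simply restate the dimensions described above; they are not part of the original analysis):

```julia
# hypothetical sanity check: with c = 5 replicates the design should
# have a*b*c = 125 rows and a*b = 25 distinct levels of ab
@assert nrow(d05) == 5 * 5 * 5
@assert length(unique(d05.ab)) == 25
```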
3 Simulating a single instance
We define a formula, form, that provides for scalar fixed effects and scalar random effects for a, b, and ab. Because this binding is declared to be const, we wrap the declaration in a check so that it is evaluated at most once.
```julia
if !@isdefined(form)
    const form = @formula(y ~ 1 + (1 | a) + (1 | b) + (1 | ab))
end
m05 = LinearMixedModel(form, d05)
```

The simulate! function modifies this model by overwriting the placeholder response vector with the simulated response, in the model object only, not in the original data frame, d05. The parameters are set, as in Zhou et al. (2019), to \(\mathbf{\beta}=[1]\), \(\sigma=1\) (written \(\sigma_e\) in Zhou et al. (2019)) and values of \(\mathbf{\theta}=\left[\sigma_1/\sigma, \sigma_2/\sigma, \sigma_3/\sigma\right]\) corresponding to the (constant) ratios \(\sigma^2_i/\sigma^2_e,\ i=1,2,3\) in Zhou et al. (2019).
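That is, the components of \(\mathbf{\theta}\) are the square roots of the variance ratios used in Zhou et al. (2019):

\[
\theta_i = \frac{\sigma_i}{\sigma} = \sqrt{\frac{\sigma_i^2}{\sigma_e^2}},
\qquad i = 1, 2, 3 .
\]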
The first set of \(\mathbf{\theta}\) values is zeros, as in Zhou et al. (2019).
```julia
rng = Xoshiro(6345789)  # initialize a random number generator
print(fit!(simulate!(rng, m05; β=[1.0], σ=1.0, θ=zeros(3))))
```

```
Linear mixed model fit by maximum likelihood
 y ~ 1 + (1 | a) + (1 | b) + (1 | ab)
   logLik   -2 logLik     AIC       AICc        BIC
 -159.6381   319.2763   329.2763   329.7805   343.4178

Variance components:
            Column    Variance  Std.Dev.
ab       (Intercept)  0.0000000 0.0000000
a        (Intercept)  0.0000000 0.0000000
b        (Intercept)  0.0042240 0.0649925
Residual              0.7490554 0.8654799
 Number of obs: 125; levels of grouping factors: 25, 5, 5

  Fixed-effects parameters:
─────────────────────────────────────────────────
                Coef.  Std. Error      z  Pr(>|z|)
─────────────────────────────────────────────────
(Intercept)  1.02392    0.0826877  12.38    <1e-34
─────────────────────────────────────────────────
```
To be able to reproduce this fit we copy the simulated response vector, m05.y, into d05.y.

```julia
copyto!(d05.y, m05.y)
```

It is not surprising that estimates of two of the three components of \(\mathbf{\theta}\) are zero, as the response was simulated from \(\mathbf{\theta}=[0,0,0]\).
The optsum property of the fitted model, m05, contains information on the progress of the iterative optimization to obtain the MLEs.
```julia
m05.optsum
```

| | |
|---|---|
| **Initialization** | |
| Initial parameter vector | [1.0, 1.0, 1.0] |
| Initial objective value | 360.034021929135 |
| **Optimizer settings** | |
| Optimizer | LN_NEWUOA |
| Backend | nlopt |
| ftol_rel | 1.0e-12 |
| ftol_abs | 1.0e-8 |
| xtol_rel | 0.0 |
| xtol_abs | [1.0e-10, 1.0e-10, 1.0e-10] |
| initial_step | [1.0, 1.0, 1.0] |
| maxfeval | -1 |
| maxtime | -1.0 |
| xtol_zero_abs | 0.001 |
| ftol_zero_abs | 1.0e-5 |
| **Result** | |
| Function evaluations | 55 |
| Final parameter vector | [0.0, 0.0, 0.0751] |
| Final objective value | 319.2763 |
| Return code | FTOL_REACHED |
In this case it took 55 function evaluations to declare convergence, but that number will depend on the convergence criteria. We use rather stringent criteria. In particular ftol_rel, which appears to be the criterion used in Zhou et al. (2019), is by default set to \(10^{-12}\) in MixedModels.jl, as compared to \(10^{-8}\) in Zhou et al. (2019).
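To experiment with the looser criterion, the tolerance can be changed through the optsum property before fitting. A minimal sketch, on a fresh copy of the model so that m05 is left untouched:

```julia
# sketch: fit with the 1e-8 relative tolerance on the objective that
# appears to be used in Zhou et al. (2019)
m = LinearMixedModel(form, d05)
m.optsum.ftol_rel = 1.0e-8
fit!(m)
```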
The progress of the iterations is recorded in the fitlog table in the optsum property.
```julia
m05.optsum.fitlog
```

```
Table with 2 columns and 56 rows:
      θ                                 objective
    ┌─────────────────────────────────────────────
 1  │ [1.0, 1.0, 1.0]                   360.034
 2  │ [2.0, 1.0, 1.0]                   381.952
 3  │ [1.0, 2.0, 1.0]                   365.686
 4  │ [1.0, 1.0, 2.0]                   365.65
 5  │ [0.0, 1.0, 1.0]                   339.29
 6  │ [1.0, 0.0, 1.0]                   353.555
 7  │ [1.0, 1.0, 0.0]                   353.761
 8  │ [-0.921548, 0.722701, 0.728237]   353.677
 9  │ [-0.234102, 1.31609, 1.30868]     345.449
 10 │ [0.0453179, 1.10931, 0.959718]    339.89
 11 │ [0.0269061, 0.799954, 0.852495]   336.199
 12 │ [0.0193397, 0.476201, 0.471541]   328.142
 13 │ [0.0791365, -0.186749, -0.274731] 321.755
 14 │ [0.2224, -1.14528, -0.0283809]    333.818
 15 │ [0.0891207, -0.240223, -0.445285] 324.799
 16 │ [-0.145542, 0.25668, -0.220975]   322.579
 17 │ [0.192, -0.176673, -0.221955]     322.201
 ⋮  │ ⋮                                 ⋮
```
If we look at the last 15 values of the objective we can see that the optimizer would have been declared to have converged in 15 fewer function evaluations if ftol_rel had been set to \(10^{-8}\).
```julia
last(m05.optsum.fitlog.objective, 15)
```

```
15-element Vector{Float64}:
 319.27627702480993
 319.2762768841083
 319.27627666270564
 319.27627693140636
 319.27627701023084
 319.2762767867746
 319.27627663951205
 319.276276642132
 319.2762766425802
 319.27627664037254
 319.2762766418587
 319.27627664207375
 319.27627664178124
 319.2762766393934
 319.2762766393925
```
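This can also be checked programmatically. A sketch (the column access follows the fitlog usage above):

```julia
# sketch: successive relative changes in the objective; an optimizer
# using ftol_rel stops once these fall below the tolerance
obj = m05.optsum.fitlog.objective
relchange = abs.(diff(obj)) ./ abs.(obj[2:end])
findlast(>(1.0e-8), relchange)  # index of the last change exceeding 1e-8
```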
When comparing the number of iterations between algorithms, bear in mind that the optimizer used here reports the number of evaluations of the objective. Other algorithms may count iterations that involve more than one evaluation of the objective or may involve gradient evaluations.
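The count shown as "Function evaluations" in the optsum table is also available programmatically; a sketch, assuming the feval field of the optsum:

```julia
m05.optsum.feval  # number of objective evaluations in the fit
```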
4 Benchmarking
For this model, and for all the other models in the simulation, the blocked Cholesky factor, \(\mathbf{L}\), to be updated for each evaluation of the profiled log-likelihood has the structure
```julia
BlockDescription(m05)
```

| rows | ab | a | b | fixed |
|---|---|---|---|---|
| 25 | Diagonal | | | |
| 5 | Dense | Diagonal | | |
| 5 | Dense | Dense | Diag/Dense | |
| 2 | Dense | Dense | Dense | Dense |
That is, regardless of the value of c in the simulation, the updates for evaluating the profiled log-likelihood are for a blocked lower Cholesky factor of size \(37\times 37\) (the block sizes above: \(25+5+5+2=37\)), of which the upper left \(25\times 25\) block is diagonal and trivial to update.
As a sparse matrix this would have the form
```julia
sparseL(m05; full=true)
```

```
37×37 SparseArrays.SparseMatrixCSC{Float64, Int32} with 48 stored entries:
⎡⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⣑⣄⠀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠁⎦
```
but it is stored as a blocked triangular matrix. The convergence to \(\theta_1=0,\ \theta_2=0\) makes this matrix appear more sparse than it in fact is. Generally there would be non-zero values in the last 12 rows and first 30 columns.
The update operation is very fast

```julia
@b m05 objective(updateL!(_))
```

```
7.792 μs (27 allocs: 528 bytes)
```
as is the fitting procedure from start to finish

```julia
@b fit(MixedModel, $form, $d05)
```

```
674.708 μs (3540 allocs: 180.859 KiB)
```
and the m05 object is quite small.

```julia
Base.summarysize(m05)
```

```
32991
```
We now fit the same model to this response with the lme4 package in R.
First, transfer the data frame and the formula to R

```julia
@rput d05
@rput form
```

then fit the model
+R"m05 <- lme4::lmer(form, d05, REML=FALSE, control=lme4::lmerControl(calc.derivs=FALSE))"
+R"summary(m05)"RObject{VecSxp}
+Linear mixed model fit by maximum likelihood ['lmerMod']
+Formula: y ~ 1 + (1 | a) + (1 | b) + (1 | ab)
+ Data: d05
+Control: lme4::lmerControl(calc.derivs = FALSE)
+
+ AIC BIC logLik -2*log(L) df.resid
+ 329.3 343.4 -159.6 319.3 120
+
+Scaled residuals:
+ Min 1Q Median 3Q Max
+-2.9788 -0.6642 -0.0621 0.6158 2.3267
+
+Random effects:
+ Groups Name Variance Std.Dev.
+ ab (Intercept) 0.000e+00 0.000e+00
+ b (Intercept) 4.226e-03 6.501e-02
+ a (Intercept) 2.034e-10 1.426e-05
+ Residual 7.491e-01 8.655e-01
+Number of obs: 125, groups: ab, 25; b, 5; a, 5
+
+Fixed effects:
+ Estimate Std. Error t value
+(Intercept) 1.02392 0.08269 12.38
+optimizer (nloptwrap) convergence code: 0 (OK)
+boundary (singular) fit: see help('isSingular')
+
Notice that the convergence is to a slightly different parameter vector but a similar deviance. The main difference in the converged parameter estimates is in \(\sigma_2\), which is \(1.426\times10^{-5}\) here and zero in the MixedModels fit. But random effects with a standard deviation that small are negligible.
```julia
deviance(m05)
```

```
319.2762766393925
```
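For a direct comparison of the converged parameter vectors, lme4's getME extractor can pull θ out of the R fit. A sketch (note that, as the variance-component tables above show, the two packages may order the components differently):

```julia
# θ from the R fit (ordered ab, b, a in the summary above)
θ_R = rcopy(R"unname(lme4::getME(m05, 'theta'))")
# θ from the Julia fit (ordered ab, a, b)
θ_julia = m05.θ
```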
Fitting this model in R using lme4::lmer is much slower than using MixedModels.jl.
@b R"lme4::lmer(form, d05, REML=FALSE, lme4::lmerControl(calc.derivs=FALSE))"9.063 ms (51 allocs: 1.562 KiB)
4.1 Data simulated from non-zero \(\theta\) values
```julia
fit!(simulate!(rng, m05; β=[1.0], σ=1.0, θ=ones(3)))
```

| | Est. | SE | z | p | σ_ab | σ_a | σ_b |
|---|---|---|---|---|---|---|---|
| (Intercept) | -0.0111 | 1.0528 | -0.01 | 0.9916 | 0.8139 | 1.3606 | 1.8764 |
| Residual | 0.9733 | | | | | | |
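The final parameter vector in the optsum table below consists of these standard deviations divided by the residual standard deviation, e.g. \(0.8139/0.9733\approx 0.8362\). The estimates can also be read directly from the model; a sketch using the θ property:

```julia
# the θ property holds the estimated ratios σ_ab/σ, σ_a/σ, σ_b/σ,
# which were simulated as [1, 1, 1]
m05.θ
```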
```julia
m05.optsum
```

| | |
|---|---|
| **Initialization** | |
| Initial parameter vector | [1.0, 1.0, 1.0] |
| Initial objective value | 416.9623089585963 |
| **Optimizer settings** | |
| Optimizer | LN_NEWUOA |
| Backend | nlopt |
| ftol_rel | 1.0e-12 |
| ftol_abs | 1.0e-8 |
| xtol_rel | 0.0 |
| xtol_abs | [1.0e-10, 1.0e-10, 1.0e-10] |
| initial_step | [1.0, 1.0, 1.0] |
| maxfeval | -1 |
| maxtime | -1.0 |
| xtol_zero_abs | 0.001 |
| ftol_zero_abs | 1.0e-5 |
| **Result** | |
| Function evaluations | 50 |
| Final parameter vector | [0.8362, 1.3978, 1.9278] |
| Final objective value | 411.236 |
| Return code | FTOL_REACHED |
5 Larger data sets
Increasing c, the number of replicates at each level of ab, does not substantially increase the size of the model
```julia
m50 = fit!(
    simulate!(rng, LinearMixedModel(form, design(50)); β=ones(1), σ=1.0, θ=ones(3))
)
print(m50)
```

```
Linear mixed model fit by maximum likelihood
 y ~ 1 + (1 | a) + (1 | b) + (1 | ab)
    logLik   -2 logLik      AIC        AICc         BIC
 -1812.7803   3625.5606   3635.5606   3635.6088   3661.2151

Variance components:
            Column    Variance Std.Dev.
ab       (Intercept)  1.048022 1.023729
a        (Intercept)  0.017665 0.132909
b        (Intercept)  0.367207 0.605976
Residual              0.978542 0.989213
 Number of obs: 1250; levels of grouping factors: 25, 5, 5

  Fixed-effects parameters:
────────────────────────────────────────────────
               Coef.  Std. Error     z  Pr(>|z|)
────────────────────────────────────────────────
(Intercept)  2.21824    0.345945  6.41    <1e-09
────────────────────────────────────────────────
```

```julia
Base.summarysize(m50)
```

```
154427
```
because the sizes of \(\mathbf{L}\) and \(\mathbf{A}\) are the same as before
```julia
BlockDescription(m50)
```

| rows | ab | a | b | fixed |
|---|---|---|---|---|
| 25 | Diagonal | | | |
| 5 | Dense | Diagonal | | |
| 5 | Dense | Dense | Diag/Dense | |
| 2 | Dense | Dense | Dense | Dense |
and the elapsed time per evaluation of the objective is essentially the same as for the smaller model.
```julia
@b m50 objective(updateL!(_))
```

```
7.819 μs (27 allocs: 528 bytes)
```