diff --git a/note/.gitignore b/note/.gitignore new file mode 100644 index 0000000..8904018 --- /dev/null +++ b/note/.gitignore @@ -0,0 +1,3 @@ +/.quarto/ +**/*.quarto_ipynb +/.vscode/ \ No newline at end of file diff --git a/note/Note_on_mixed_model_calculations.html b/note/Note_on_mixed_model_calculations.html new file mode 100644 index 0000000..9dd2746 --- /dev/null +++ b/note/Note_on_mixed_model_calculations.html @@ -0,0 +1,3672 @@ + + + + + + + + + + + + +Note on mixed model calculations + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+

Note on mixed model calculations

+
+ + +
+
Authors
+
Affiliations
+ +
+

Douglas Bates

+
+ +
+

Phillip Alday

+
+ +
+ +
+ + +
+
Published
+
+

2025-12-05

+
+
+ + +
+ + + +
+ + +
+

1 Introduction

+

This note reproduces the simulated two-factor ANOVA model in Zhou et al. (2019) and fits it using the MixedModels.jl package. I would have included comparative results from the VarianceComponentModels.jl package, but I was unable to install it, even on the long-term-support (lts) version of Julia. (Repairs may be as simple as merging the CompatHelper PR, but I didn’t check.)

+

The model incorporates two random effects factors and their interaction. In the simulation, the number of levels of the grouping factors for the random effects is set at 5 for both a and b, so that the number of levels for the interaction term, ab, is 25. The number of replicates at each level of the interaction is the parameter c.

+
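To fix the dimensions involved, the counts implied by this design are simple products (a sketch using the default values from the simulation):

```julia
a, b, c = 5, 5, 5        # levels of a, levels of b, replicates per cell
nlevels_ab = a * b       # levels of the interaction term ab
n = a * b * c            # total number of observations
```

so with c = 5 the simulated data have 25 interaction levels and 125 rows.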

We note that having only five levels each for the grouping factors \(\mathbf{a}\) and \(\mathbf{b}\) may result in an unstable estimation situation, because it is difficult to estimate a variance from only five distinct pieces of information. In such cases, as we will see below, the estimates of variance components can converge to zero.

+

Convergence to zero for a variance component is not a problem when using MixedModels.jl because it was designed with this possibility in mind.

+
+
+

2 Setting up the simulation

+

We create a DataFrame corresponding to the experimental design with a placeholder response vector, then create a LinearMixedModel without fitting it, and finally use the simulate! function in the MixedModels package to simulate the response given values of the parameters.

+

Load the packages to be used

+
+
using Chairmarks          # to benchmark function executions
+using DataFrames
+using MixedModels
+using PooledArrays        # similar to factor in R
+using Random              # random number generators
+using RCall               # call R from Julia
+
+

Create a function to generate a DataFrame with a numeric response, y, and PooledArray columns a, b, and ab.

+
+
"""
+    design(c; a=5, b=5)
+
+Return a DataFrame with columns y, a, b, and ab in a two-factor design with interaction 
+"""
+function design(c::Integer; a::Integer=5, b::Integer=5)
+    signed, compress = true, true    # used as named arguments to PooledArray()
+    av = repeat(string.('a':'z')[1:a], inner=b * c)
+    bv = repeat(string.('A':'Z')[1:b], inner=c, outer=a)  # one cycle of the b levels for each level of a
+    return DataFrame(
+        y = zeros(a * b * c),
+        a = PooledArray(av; signed, compress),
+        b = PooledArray(bv; signed, compress),
+        ab = PooledArray(av .* bv; signed, compress),
+    )
+end
+
+

For c = 5 the data frame is

+
+
d05 = design(5)
+
+
125×4 DataFrame
100 rows omitted
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Rowyabab
Float64StringStringString
10.0aAaA
20.0aAaA
30.0aAaA
40.0aAaA
50.0aAaA
60.0aBaB
70.0aBaB
80.0aBaB
90.0aBaB
100.0aBaB
110.0aCaC
120.0aCaC
130.0aCaC
1140.0eCeC
1150.0eCeC
1160.0eDeD
1170.0eDeD
1180.0eDeD
1190.0eDeD
1200.0eDeD
1210.0eEeE
1220.0eEeE
1230.0eEeE
1240.0eEeE
1250.0eEeE
+
+
+
+
+
+

3 Simulating a single instance

+

We define a formula, form, that provides for scalar fixed effects and scalar random effects for a, b, and ab. Because this binding is declared to be const, we guard the declaration so that it occurs only once, even if the code is re-run.

+
+
if !@isdefined(form)
+    const form = @formula(y ~ 1 + (1 | a) + (1 | b) + (1 | ab))
+end
+m05 = LinearMixedModel(form, d05)
+
+

The simulate! function modifies this model by overwriting the placeholder response vector with the simulated response, in the model object only, not in the original data frame, d05. The parameters are set, as in Zhou et al. (2019), to \(\mathbf{\beta}=[1]\), \(\sigma=1\) (written \(\sigma_e\) in Zhou et al. (2019)) and values of \(\mathbf{\theta}=\left[\sigma_1/\sigma, \sigma_2/\sigma, \sigma_3/\sigma\right]\) corresponding to the (constant) ratios \(\sigma^2_i/\sigma^2_e, i=1,2,3\) in Zhou et al. (2019).

+

The first set of \(\mathbf{\theta}\) values is zeros, as in Zhou et al. (2019).

+
+
rng = Xoshiro(6345789)              # initialize a random number generator
+print(fit!(simulate!(rng, m05; β=[1.0], σ=1.0, θ=zeros(3))))
+
+
Linear mixed model fit by maximum likelihood
+ y ~ 1 + (1 | a) + (1 | b) + (1 | ab)
+   logLik   -2 logLik     AIC       AICc        BIC    
+  -159.6381   319.2763   329.2763   329.7805   343.4178
+
+Variance components:
+            Column    Variance  Std.Dev. 
+ab       (Intercept)  0.0000000 0.0000000
+a        (Intercept)  0.0000000 0.0000000
+b        (Intercept)  0.0042240 0.0649925
+Residual              0.7490554 0.8654799
+ Number of obs: 125; levels of grouping factors: 25, 5, 5
+
+  Fixed-effects parameters:
+─────────────────────────────────────────────────
+               Coef.  Std. Error      z  Pr(>|z|)
+─────────────────────────────────────────────────
+(Intercept)  1.02392   0.0826877  12.38    <1e-34
+─────────────────────────────────────────────────
+
+
+

To be able to reproduce this fit, we copy the simulated response vector, m05.y, into d05.y.

+
+
copyto!(d05.y, m05.y)
+
+

It is not surprising that estimates of two of the three components of \(\mathbf{\theta}\) are zero, as the response was simulated from \(\mathbf{\theta}=[0,0,0]\).

+

The optsum property of the fitted model, m05, contains information on the progress of the iterative optimization used to obtain the maximum likelihood estimates (MLEs).

+
+
m05.optsum
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Initialization
Initial parameter vector[1.0, 1.0, 1.0]
Initial objective value360.034021929135
Optimizer settings
OptimizerLN_NEWUOA
Backendnlopt
ftol_rel1.0e-12
ftol_abs1.0e-8
xtol_rel0.0
xtol_abs[1.0e-10, 1.0e-10, 1.0e-10]
initial_step[1.0, 1.0, 1.0]
maxfeval-1
maxtime-1.0
xtol_zero_abs0.001
ftol_zero_abs1.0e-5
Result
Function evaluations55
Final parameter vector[0.0, 0.0, 0.0751]
Final objective value319.2763
Return codeFTOL_REACHED
+
+
+

In this case it took 55 function evaluations to declare convergence, but that number will depend on the convergence criteria that are set. We use rather stringent criteria. In particular, ftol_rel, which appears to be the criterion used in Zhou et al. (2019), is set by default to \(10^{-12}\) in MixedModels.jl, as compared to \(10^{-8}\) in Zhou et al. (2019).

+

The progress of the iterations is recorded in the fitlog table in the optsum property.

+
+
m05.optsum.fitlog
+
+
Table with 2 columns and 56 rows:
+      θ                                  objective
+    ┌─────────────────────────────────────────────
+ 1  │ [1.0, 1.0, 1.0]                    360.034
+ 2  │ [2.0, 1.0, 1.0]                    381.952
+ 3  │ [1.0, 2.0, 1.0]                    365.686
+ 4  │ [1.0, 1.0, 2.0]                    365.65
+ 5  │ [0.0, 1.0, 1.0]                    339.29
+ 6  │ [1.0, 0.0, 1.0]                    353.555
+ 7  │ [1.0, 1.0, 0.0]                    353.761
+ 8  │ [-0.921548, 0.722701, 0.728237]    353.677
+ 9  │ [-0.234102, 1.31609, 1.30868]      345.449
+ 10 │ [0.0453179, 1.10931, 0.959718]     339.89
+ 11 │ [0.0269061, 0.799954, 0.852495]    336.199
+ 12 │ [0.0193397, 0.476201, 0.471541]    328.142
+ 13 │ [0.0791365, -0.186749, -0.274731]  321.755
+ 14 │ [0.2224, -1.14528, -0.0283809]     333.818
+ 15 │ [0.0891207, -0.240223, -0.445285]  324.799
+ 16 │ [-0.145542, 0.25668, -0.220975]    322.579
+ 17 │ [0.192, -0.176673, -0.221955]      322.201
+ ⋮  │                 ⋮                      ⋮
+
+
+

If we look at the last 15 values of the objective, we can see that the optimizer would have declared convergence 15 function evaluations sooner if ftol_rel were set to \(10^{-8}\).

+
+
last(m05.optsum.fitlog.objective, 15)
+
+
15-element Vector{Float64}:
+ 319.27627702480993
+ 319.2762768841083
+ 319.27627666270564
+ 319.27627693140636
+ 319.27627701023084
+ 319.2762767867746
+ 319.27627663951205
+ 319.276276642132
+ 319.2762766425802
+ 319.27627664037254
+ 319.2762766418587
+ 319.27627664207375
+ 319.27627664178124
+ 319.2762766393934
+ 319.2762766393925
+
+
+
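To make the role of ftol_rel concrete, a simplified relative-tolerance stopping rule can be applied to a recorded sequence of objective values. This is a sketch: first_converged is a hypothetical helper, and NLopt's actual ftol_rel test may differ in detail.

```julia
# Hypothetical helper: index of the first evaluation at which the
# relative change in successive objective values drops below ftol_rel.
function first_converged(objs::AbstractVector{<:Real}; ftol_rel=1e-8)
    for i in 2:length(objs)
        if abs(objs[i] - objs[i - 1]) <= ftol_rel * abs(objs[i - 1])
            return i
        end
    end
    return length(objs)   # never satisfied within the recorded values
end
```

Applied to the tail of the fitlog above, this rule with ftol_rel = 1e-8 would flag convergence near the start of the 15-value run of essentially constant objective values.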

When comparing the number of iterations between algorithms, bear in mind that the optimizer used here reports the number of evaluations of the objective. Other algorithms may count iterations that involve more than one evaluation of the objective or may involve gradient evaluations.

+
+
+

4 Benchmarking

+

For this model, and for all the other models in the simulation, the blocked Cholesky factor, \(\mathbf{L}\), to be updated for each evaluation of the profiled log-likelihood has the structure

+
+
BlockDescription(m05)
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
rowsababfixed
25Diagonal
5DenseDiagonal
5DenseDenseDiag/Dense
2DenseDenseDenseDense
+
+
+

That is, regardless of the value of c in the simulation, the updates for evaluating the profiled log-likelihood are for a blocked lower Cholesky factor of size \(37\times 37\), of which the upper-left \(25\times 25\) block is diagonal and trivial to update.

+

As a sparse matrix this would have the form

+
+
sparseL(m05; full=true)
+
+
37×37 SparseArrays.SparseMatrixCSC{Float64, Int32} with 48 stored entries:
+⎡⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
+⎢⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
+⎢⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
+⎢⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
+⎢⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
+⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⠀⠀⎥
+⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⠀⠀⎥
+⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⠀⠀⠀⎥
+⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⣑⣄⠀⎥
+⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠉⠉⠉⠁⎦
+
+
+

but it is stored as a blocked triangular matrix. The convergence to \(\theta_1=0, \theta_2=0\) makes this matrix appear sparser than it in fact is. In general there would be non-zero values in the last 12 rows and the first 30 columns.

+

The update operation is very fast

+
+
@b m05 objective(updateL!(_))
+
+
7.792 μs (27 allocs: 528 bytes)
+
+
+

as is the fitting procedure from start to finish

+
+
@b fit(MixedModel, $form, $d05)
+
+
674.708 μs (3540 allocs: 180.859 KiB)
+
+
+

and the m05 object is quite small

+
+
Base.summarysize(m05)
+
+
32991
+
+
+

We now fit the same model to this response with the lme4 package in R.

+

First, transfer the data frame and the formula to R

+
+
@rput d05
+@rput form
+
+

then fit the model

+
+
R"m05 <- lme4::lmer(form, d05, REML=FALSE, control=lme4::lmerControl(calc.derivs=FALSE))"
+R"summary(m05)"
+
+
RObject{VecSxp}
+Linear mixed model fit by maximum likelihood  ['lmerMod']
+Formula: y ~ 1 + (1 | a) + (1 | b) + (1 | ab)
+   Data: d05
+Control: lme4::lmerControl(calc.derivs = FALSE)
+
+      AIC       BIC    logLik -2*log(L)  df.resid 
+    329.3     343.4    -159.6     319.3       120 
+
+Scaled residuals: 
+    Min      1Q  Median      3Q     Max 
+-2.9788 -0.6642 -0.0621  0.6158  2.3267 
+
+Random effects:
+ Groups   Name        Variance  Std.Dev. 
+ ab       (Intercept) 0.000e+00 0.000e+00
+ b        (Intercept) 4.226e-03 6.501e-02
+ a        (Intercept) 2.034e-10 1.426e-05
+ Residual             7.491e-01 8.655e-01
+Number of obs: 125, groups:  ab, 25; b, 5; a, 5
+
+Fixed effects:
+            Estimate Std. Error t value
+(Intercept)  1.02392    0.08269   12.38
+optimizer (nloptwrap) convergence code: 0 (OK)
+boundary (singular) fit: see help('isSingular')
+
+
+
+

Notice that the convergence is to a slightly different parameter vector but a similar deviance. The main difference in the converged parameter estimates is in \(\sigma_2\), which is \(1.426\times10^{-5}\) here and zero in the MixedModels fit. But random effects with a standard deviation that small are negligible.

+
+
deviance(m05)
+
+
319.2762766393925
+
+
+

Fitting this model in R using lme4::lmer is much slower than using MixedModels.jl.

+
+
@b R"lme4::lmer(form, d05, REML=FALSE, lme4::lmerControl(calc.derivs=FALSE))"
+
+
9.063 ms (51 allocs: 1.562 KiB)
+
+
+
+

4.1 Data simulated from non-zero \(\theta\) values

+
+
fit!(simulate!(rng, m05; β=[1.], σ=1., θ=ones(3)))
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Est.SEzpσ_abσ_aσ_b
(Intercept)-0.01111.0528-0.010.99160.81391.36061.8764
Residual0.9733
+
+
+
+
m05.optsum
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Initialization
Initial parameter vector[1.0, 1.0, 1.0]
Initial objective value416.9623089585963
Optimizer settings
OptimizerLN_NEWUOA
Backendnlopt
ftol_rel1.0e-12
ftol_abs1.0e-8
xtol_rel0.0
xtol_abs[1.0e-10, 1.0e-10, 1.0e-10]
initial_step[1.0, 1.0, 1.0]
maxfeval-1
maxtime-1.0
xtol_zero_abs0.001
ftol_zero_abs1.0e-5
Result
Function evaluations50
Final parameter vector[0.8362, 1.3978, 1.9278]
Final objective value411.236
Return codeFTOL_REACHED
+
+
+
+
+
+

5 Larger data sets

+

Increasing c, the number of replicates at each level of ab, does not substantially increase the size of the model

+
+
m50 = fit!(
+    simulate!(rng, LinearMixedModel(form, design(50)); β=ones(1), σ=1.0, θ=ones(3))
+)
+print(m50)
+
+
Linear mixed model fit by maximum likelihood
+ y ~ 1 + (1 | a) + (1 | b) + (1 | ab)
+   logLik   -2 logLik     AIC       AICc        BIC    
+ -1812.7803  3625.5606  3635.5606  3635.6088  3661.2151
+
+Variance components:
+            Column   Variance Std.Dev. 
+ab       (Intercept)  1.048022 1.023729
+a        (Intercept)  0.017665 0.132909
+b        (Intercept)  0.367207 0.605976
+Residual              0.978542 0.989213
+ Number of obs: 1250; levels of grouping factors: 25, 5, 5
+
+  Fixed-effects parameters:
+────────────────────────────────────────────────
+               Coef.  Std. Error     z  Pr(>|z|)
+────────────────────────────────────────────────
+(Intercept)  2.21824    0.345945  6.41    <1e-09
+────────────────────────────────────────────────
+
+
+
+
Base.summarysize(m50)
+
+
154427
+
+
+

because the sizes of L and A are the same as before

+
+
BlockDescription(m50)
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
rowsababfixed
25Diagonal
5DenseDiagonal
5DenseDenseDiag/Dense
2DenseDenseDenseDense
+
+
+

and the elapsed time per evaluation of the objective is essentially the same as for the smaller model.

+
+
@b m50 objective(updateL!(_))
+
+
7.819 μs (27 allocs: 528 bytes)
+
+
+
+

References

+
+
+Zhou, Hua, Liuyi Hu, Jin Zhou, and Kenneth Lange. 2019. “MM Algorithms for Variance Components Models.” Journal of Computational and Graphical Statistics 28 (2): 350–61. https://doi.org/10.1080/10618600.2018.1529601. +
+
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/note/Note_on_mixed_model_calculations.qmd b/note/Note_on_mixed_model_calculations.qmd new file mode 100644 index 0000000..c81f9c5 --- /dev/null +++ b/note/Note_on_mixed_model_calculations.qmd @@ -0,0 +1,278 @@ +--- +title: "Note on mixed model calculations" +author: + - name: Douglas Bates + email: dmbates@gmail.com + orcid: 0000-0001-8316-9503 + affiliation: + - name: University of Wisconsin - Madison + city: Madison + state: WI + url: https://www.wisc.edu + department: Statistics + - name: Phillip Alday + email: me@phillipalday.com + orcid: 0000-0002-9984-5745 + affiliation: + - name: Beacon Biosignals + url: https://beacon.bio +date: last-modified +date-format: iso +toc: true +bibliography: bibliography.bib +number-sections: true +engine: julia +julia: + exeflags: + - -tauto + - --project=@. +format: + html: + toc: true + toc-location: right + embed-resources: true +--- + +## Introduction {#sec-intro} + +This note is to reproduce the simulated two-factor anova model in @Zhou03042019 and to fit it using the [MixedModels.jl](https://github.com/JuliaStats/MixedModels.jl) package. +I would have included comparative results from the [VarianceComponentModels.jl](https://github.com/OpenMendel/VarianceComponentModels.jl) package but I was unable to install it, even on the long-term-support (lts) version of Julia. +(Repairs may be as simple as merging the [CompatHelper PR](https://github.com/OpenMendel/VarianceComponentModels.jl/pull/22) but I didn't check.) + +The model incorporates two random effects factors and their interaction. +In the simulation the number of levels of the grouping factors for the random effects is set at 5 for `a` and `b` so that the number of levels for the interaction term, `ab`, is 25. +The number of replicates at each level of the interaction is the parameter `c`. 
+ +We note that only having five levels each of the grouping factors $\mathbf{a}$ and $\mathbf{b}$ may result in an unstable estimation situation, because it is difficult to estimate a variance from only five distinct pieces of information. +In such cases, as we will see below, the estimates of variance components can converge to zero. + +Convergence to zero for a variance component is not a problem when using [MixedModels.jl](https://github.com/JuliaStats/MixedModels.jl) because it was designed with this possibility in mind. + +## Setting up the simulation {#sec-setup} + +We create a `DataFrame` corresponding to the experimental design with a placeholder response vector then create a `LinearMixedModel` without fitting it then use the `simulate!` function in the `MixedModels` package to simulate the response given values of the parameter. + +Load the packages to be used + +```{julia} +#| label: loadpackages +using Chairmarks # to benchmark function executions +using DataFrames +using MixedModels +using PooledArrays # similar to factor in R +using Random # random number generators +using RCall # call R from Julia +``` + +Create a function to generate a `DataFrame` with a numeric response, `y`, and `PooledArrays`, `a`, `b`, and `ab`. 
+ +```{julia} +#| label: definedesign +#| output: false +""" + design(c; a=5, b=5) + +Return a DataFrame with columns y, a, b, and ab in a two-factor design with interaction +""" +function design(c::Integer; a::Integer=5, b::Integer=5) + signed, compress = true, true # used as named arguments to PooledArray() + av = repeat(string.('a':'z')[1:a], inner=b * c) + bv = repeat(string.('A':'Z')[1:b], inner=c, outer=b) + return DataFrame( + y = zeros(a * b * c), + a = PooledArray(av; signed, compress), + b = PooledArray(bv; signed, compress), + ab = PooledArray(av .* bv; signed, compress), + ) +end +``` + +For `c = 5` the data frame is + +```{julia} +#| label: d05 +d05 = design(5) +``` + +## Simulating a single instance {#sec-simulating} + +We define a formula, `form`, that provides for scalar fixed-effects and scalar random effects for `a`, `b`, and `ab`. +Because this binding is declared to be `const` we avoid declaring it more than once. + +```{julia} +#| output: false +#| warn: false +if !@isdefined(form) + const form = @formula(y ~ 1 + (1 | a) + (1 | b) + (1 | ab)) +end +m05 = LinearMixedModel(form, d05) +``` + +The `simulate!` function modifies this model by overwriting the place-holder response vector with the simulated response, in the model object only, not in the original data frame, `d05`. +The parameters are set, as in @Zhou03042019, to $\mathbf{\beta}=[1]$, $\sigma=1$ (written $\sigma_e$ in @Zhou03042019) and +values of $\mathbf{\theta}=\left[\sigma_1/\sigma, \sigma_2/\sigma, \sigma_3/\sigma\right]$ corresponding to the (constant)ratios $\sigma^2_i/\sigma^2_e, i=1,2,3$ in @Zhou03042019. + +The first set of $\mathbf{\theta}$ values is zeros, as in @Zhou03042019. + +```{julia} +rng = Xoshiro(6345789) # initialize a random number generator +print(fit!(simulate!(rng, m05; β=[1.0], σ=1.0, θ=zeros(3)))) +``` + +To be able to reproduce this fit we copy the simulated response vector, `m05.y`, into `d05.y`. 
+ +```{julia} +#| label: copym05ytod05y +#| output: false +copyto!(d05.y, m05.y) +``` + +It is not surprising that estimates of two of the three components of $\mathbf{\theta}$ are zero, as the response was simulated from $\mathbf{\theta}=[0,0,0]$. + +The `optsum` property of the fitted model, `m05`, contains information on the progress of the iterative optimization to obtain the mle's. + +```{julia} +#| label: showoptsum +m05.optsum +``` + +In this case it took 55 function evaluations to declare convergence but that number will depend on the convergence criteria set. +We use rather stringent criteria. +In particular, `ftol_rel`, which appears to be the criterion used in @Zhou03042019, is, by default, set to $10^{-12}$ in `MixedModels.jl`, as compared to $10^{-8}$ in @Zhou03042019. + +The progress of the iterations is recorded in the `fitlog` table in the `optsum` property. + +```{julia} +#| label: fitlog +m05.optsum.fitlog +``` + +If we look at the last 15 values of the objective we can see that the optimizer would have been declared to have converged in 15 fewer function evaluations if `ftol_rel` was set to $10^{-8}$. + +```{julia} +#| label: lastobj +last(m05.optsum.fitlog.objective, 15) +``` + +When comparing the number of iterations between algorithms, bear in mind that the optimizer used here reports the number of evaluations of the objective. +Other algorithms may count iterations that involve more than one evaluation of the objective or may involve gradient evaluations. 
+ +## Benchmarking {#sec-benchmark} + +For this model, and for all the other models in the simulation, the blocked Cholesky factor, $\mathbf{L}$ to be updated for each evaluation of the profiled log-likelihood has the structure + +```{julia} +#| label: Blockdesc +BlockDescription(m05) +``` + +That is, regardless of the value of `c` in the simulation, the updates for evaluating the profiled log-likelihood are for a blocked lower Cholesky factor of size $37\times 37$ of which the upper left $25\times 25$ block is diagonal and trivial to update. + +As a sparse matrix this would have the form + +```{julia} +sparseL(m05; full=true) +``` + +but it is stored as a blocked triangular matrix. +The convergence to $\theta_1=0, \theta_2=0$ makes this matrix appear to be more sparse than it, in fact, is. +Generally there would be non-zero values in the last 12 rows and first 30 columns. + +The update operation is very fast + +```{julia} +#| label: bnchmrkupdate +@b m05 objective(updateL!(_)) +``` + +as is the fitting procedure from start to finish + +```{julia} +#| label: bnchmrkfit +@b fit(MixedModel, $form, $d05) +``` + +and the `m05` object is quite small + +```{julia} +#| label: modelsize +Base.summarysize(m05) +``` + +We now fit the same model to this response with the `lme4` package in [R](https://www.r-project.org). + +First, transfer the data frame and the formula to R + +```{julia} +#| output: false +@rput d05 +@rput form +``` + +then fit the model + +```{julia} +#| label: Rfit +#| warning: false +R"m05 <- lme4::lmer(form, d05, REML=FALSE, control=lme4::lmerControl(calc.derivs=FALSE))" +R"summary(m05)" +``` + +Notice that the convergence is to a slightly different parameter vector but a similar deviance. +The main difference in the converged parameter estimates is in $\sigma_2$, which is $1.426\times10^{-5}$ here and zero in the MixedModels fit. +But random effects with a standard deviation that small are negligible. 
+ +```{julia} +#| label: Rfitdeviance +deviance(m05) +``` + +Fitting this model in `R` using `lme4::lmer` is much slower than using `MixedModels.jl`. + +```{julia} +#| warning: false +@b R"lme4::lmer(form, d05, REML=FALSE, lme4::lmerControl(calc.derivs=FALSE))" +``` + +### Data simulated from non-zero $\theta$ values + +```{julia} +fit!(simulate!(rng, m05; β=[1.], σ=1., θ=ones(3))) +``` + +```{julia} +m05.optsum +``` + +## Larger data sets {#sec-larger} + +Increasing `c`, the number of replicates at each level of `ab` does not substantially increase the size of the model + +```{julia} +#| label: m50 +m50 = fit!( + simulate!(rng, LinearMixedModel(form, design(50)); β=ones(1), σ=1.0, θ=ones(3)) +) +print(m50) +``` + +```{julia} +Base.summarysize(m50) +``` + +because the size of `L` and `A` are the same as before + +```{julia} +BlockDescription(m50) +``` + +and the elapsed time per evaluation of the objective is essentially the same as for the smaller model. + +```{julia} +@b m50 objective(updateL!(_)) +``` + +### References {.unnumbered} + +::: {#refs} +::: diff --git a/note/_quarto.yml b/note/_quarto.yml new file mode 100644 index 0000000..475a409 --- /dev/null +++ b/note/_quarto.yml @@ -0,0 +1,3 @@ +project: + title: "Note_on_mixed_model_calculations" + diff --git a/note/bibliography.bib b/note/bibliography.bib new file mode 100644 index 0000000..85609ff --- /dev/null +++ b/note/bibliography.bib @@ -0,0 +1,16 @@ +@article{Zhou03042019, + author = {Hua Zhou and Liuyi Hu and Jin Zhou and Kenneth Lange}, + title = {MM Algorithms for Variance Components Models}, + journal = {Journal of Computational and Graphical Statistics}, + volume = {28}, + number = {2}, + pages = {350--361}, + year = {2019}, + publisher = {ASA Website}, + doi = {10.1080/10618600.2018.1529601}, + note ={PMID: 31592195}, + URL = {https://doi.org/10.1080/10618600.2018.1529601}, + eprint = {https://doi.org/10.1080/10618600.2018.1529601} +} + +