JuliaMixedModels · dmbates · Aug 1, 2023 · Dec 27, 2023
diff --git a/Project.toml b/Project.toml
@@ -6,6 +6,7 @@ version = "0.1.0"
 [deps]
 AlgebraOfGraphics = "cbdf2221-f076-402e-a563-3d30da359d67"
 Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
+BSplineKit = "093aae92-e908-43d7-9660-e50ee39d5a0a"
 BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
 CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
@@ -44,7 +45,7 @@ ZipFile = "a5390f91-8eb1-5f08-bee0-b1d1ffed6cea"
 AlgebraOfGraphics = "0.6"
 Arrow = "2"
 CSV = "0.10"
-CairoMakie = "0.8,0.9,0.10"
+CairoMakie = "0.8,0.9,0.10,0.11"
 CategoricalArrays = "0.10"
 Chain = "0.5"
 DataAPI = "1.9"
@@ -56,8 +57,8 @@ Effects = "0.1,1"
 FreqTables = "0.4"
 GLM = "1"
 MixedModels = "4,5"
-MixedModelsMakie = "0.3.13"
-NLopt = "0.6"
+MixedModelsMakie = "0.3"
+NLopt = "0.6,1"
 ProgressMeter = "1.7"
 RCall = "0.13"
 Scratch = "1"

diff --git a/_quarto.yml b/_quarto.yml
@@ -19,6 +19,7 @@ book:
   appendices:
     - datatables.qmd
     - linalg.qmd
+    - profiling.qmd
     - aGHQ.qmd
 
 execute-dir: project

diff --git a/aGHQ.qmd b/aGHQ.qmd
@@ -137,7 +137,6 @@ using LinearAlgebra
 using MixedModels
 using MixedModelsMakie
 using NLopt
-using PooledArrays
 using ProgressMeter
 using StatsAPI
 

diff --git a/intro.qmd b/intro.qmd
@@ -102,26 +102,26 @@ To access these data within `Julia` we must first attach this package, and other
 ```{julia}
 #| code-fold: show
 #| output: false
+import ProgressMeter    # progress of optimizer iterations
+ProgressMeter.ijulia_behavior(:clear) # suppress progress output
+
 using AlgebraOfGraphics # high-level graphics
 using CairoMakie        # graphics back-end
-using DataFrameMacros   # elegant DataFrame manipulation
 using DataFrames        # DataFrame implementation
 using Markdown          # utilities for generating markdown text
 using MixedModels       # fit and examine mixed-effects models
 using MixedModelsMakie  # graphics for mixed-effects models
-using Printf            # formatted printing
-using ProgressMeter     # progress of optimizer iterations
 using Random            # random number generation
-using StatsBase         # basic statistical summaries
+using Statistics        # basic statistical summaries
 
 using EmbraceUncertainty: dataset # `dataset` means this one
+using AlgebraOfGraphics: density  # `density` means this one
 
 CairoMakie.activate!(; type="svg")    # Scalable Vector Graphics
-ProgressMeter.ijulia_behavior(:clear) # suppress progress output
 ```
 
 A package must be attached before any of the data sets or functions in the package can be used.
-If entering this line results in an error report stating that there is no package by one of these names then you must first install the package(s).
+If entering this block of code results in an error report stating that there is no package by one of these names then you must first install the package(s).
 If you are using Julia version 1.8.0 or later the error report will include instructions on how to do this.
 
 In what follows, we will assume that these packages have been installed and attached to the session before any of the code shown has been run.
@@ -196,7 +196,7 @@ last(dyestuff, 7)
 or we could tabulate the data using `groupby` and `combine` from [DataFrames](https://github.com/JuliaData/DataFrames.jl).
 
 ```{julia}
-combine(groupby(dyestuff, :batch), :yield => mean, nrow => :n)
+combine(groupby(dyestuff, :batch), :yield => mean, nrow)
 ```
 
 Although this table does show us an important property of the data, namely that there are exactly 5 observations on each batch --- a property that we will describe by saying that the data are *balanced* with respect to `batch` --- we usually learn much more about the structure of such data from plots like @fig-dyestuffdata
@@ -206,14 +206,21 @@ Although this table does show us an important property of the data, namely that
 #| label: fig-dyestuffdata
 #| code-fold: true
 let
-  sumry = sort!(combine(groupby(dyestuff, :batch), :yield => mean => :yield), :yield)
+  sumry = sort!(
+    combine(groupby(dyestuff, :batch), :yield => mean => :yield),
+    :yield,
+  )
   mp = mapping(
     :yield => "Yield of dyestuff [g]",
-    :batch => sorter(sumry.batch) => "Batch of intermediate product",
+    :batch =>
+      sorter(sumry.batch) => "Batch of intermediate product",
   )
   draw(
-    (data(dyestuff) * mp * visual(Scatter; marker='○', markersize=12)) +
-    (data(sumry) * mp * visual(Lines));
+    (
+      data(dyestuff) *
+      mp *
+      visual(Scatter; marker='○', markersize=12)
+    ) + (data(sumry) * mp * visual(Lines));
     figure=(; resolution=(800, 400)),
   )
 end
@@ -272,14 +279,20 @@ A data plot (@fig-dyestuff2data)
 #| label: fig-dyestuff2data
 #| code-fold: true
 let
-  sumry = sort!(combine(groupby(dyestuff2, :batch), :yield => mean => :yield), :yield)
+  sumry = sort!(
+    combine(groupby(dyestuff2, :batch), :yield => mean => :yield),
+    :yield,
+  )
   mp = mapping(
     :yield => "Simulated yield",
     :batch => sorter(sumry.batch) => "Batch",
   )
   draw(
-    (data(dyestuff2) * mp * visual(Scatter; marker='○', markersize=12)) +
-    (data(sumry) * mp * visual(Lines));
+    (
+      data(dyestuff2) *
+      mp *
+      visual(Scatter; marker='○', markersize=12)
+    ) + (data(sumry) * mp * visual(Lines));
     figure=(; resolution=(800, 400)),
   )
 end
@@ -304,9 +317,8 @@ We will explain the structure of the formula after we have considered an example
 We fit a model to the data allowing for an overall level of the `yield` and for an additive random effect for each level of `batch`.
 
 ```{julia}
-dsm01 = let
-  form = @formula(yield ~ 1 + (1 | batch))
-  fit(MixedModel, form, dyestuff)
+dsm01 = let f = @formula(yield ~ 1 + (1 | batch))
+  fit(MixedModel, f, dyestuff)
 end
 ```
 
@@ -379,9 +391,8 @@ An alternative measure of precision of the estimate - a coverage interval based
 Fitting a similar model to the `dyestuff2` data produces an estimate $\widehat{\sigma}_1=0$.
 
 ```{julia}
-dsm02 = let
-  form = @formula(yield ~ 1 + (1 | batch))
-  fit(MixedModel, form, dyestuff2)
+dsm02 = let f = @formula(yield ~ 1 + (1 | batch))
+  fit(MixedModel, f, dyestuff2)
 end
 println(dsm02)
 ```
@@ -559,32 +570,14 @@ For methods that involve simulation, it is best to initialize a random number ge
 Beginning with Julia v1.7.0 the default random number generator is the `Xoshiro` generator, which we initialize to an arbitrary, but reproducible, value.
 
 The object returned by a call to `parametricbootstrap` has a somewhat complex internal structure to allow for the ability to simulate from complex models while still maintaining a comparatively small storage footprint.
-To examine the distribution of the parameter estimates we extract a table of all the estimated parameters and convert it to a `DataFrame`.
+To examine the distribution of the parameter estimates we extract a `Table` of all the estimated parameters.
 
 ```{julia}
 #| warning: false
-const hide_progress = true # hide the progress bar when sampling
+const progress = false # hide the progress bar when sampling
 Random.seed!(4321234)      # random number generator
-dsm01samp = parametricbootstrap(10_000, dsm01; hide_progress)
-dsm01pars = DataFrame(dsm01samp.allpars)
-first(dsm01pars, 7)
-```
-
-:::{.callout-note}
-
-### Switch to Table(dsm01samp.tbl) formulation
-
-:::
-
-```{julia}
-last(dsm01pars, 7)
-```
-
-Plots of the bootstrap estimates for individual parameters are obtained by extracting subsets of the rows of this dataframe using `subset` methods from the `DataFrames` package or the `@subset` macro from the `DataFramesMacros` package.
-For example,
-
-```{julia}
-βdf = @subset(dsm01pars, :type == "β")
+dsm01samp = parametricbootstrap(10_000, dsm01; progress)
+dsm01stbl = dsm01samp.tbl
 ```
 
 We begin by examining density plots constructed using the [AlgebraOfGraphics](https://github.com/JuliaGraphics/AlgebraOfGraphics.jl) package.
@@ -596,32 +589,24 @@ You can check the details by clicking on the "Code" button in the HTML version o
 #| fig-cap: Kernel density plot of bootstrap fixed-effects parameter estimates from dsm01
 #| label: fig-dsm01_bs_beta_density
 draw(
-  data(@subset(dsm01pars, :type == "β")) *
-  mapping(
-    :value => "Bootstrap samples of β";
-    color=(:names => "Names"),
-  ) *
-  AlgebraOfGraphics.density();
-  figure=(; resolution=(800, 450)),
+  data(dsm01stbl) *
+  mapping(:β1 => "Bootstrap samples of β") *
+  density();
+  figure=(; resolution=(800, 450))
 )
 ```
 
-:::{.callout-note collapse="true"}
-
-### Use :compact=true when interpolating numeric values
-
-Find a way to use `:compact=true` when interpolating numeric values into the text
-:::
-
 ```{julia}
 #| echo: false
-Markdown.parse(
-  """
-  The distribution of the estimates of `β₁` is more-or-less a Gaussian (or "normal") shape, with a mean value of $(@sprintf("%f", mean(βdf.value)))  which is close to the estimated `β₁` of $(@sprintf("%f", only(dsm01.β))).
+let context = (:compact => true)
+  Markdown.parse(
+    """
+    The distribution of the estimates of `β₁` is more-or-less a Gaussian (or "normal") shape, with a mean value of $(repr(mean(dsm01stbl.β1); context)), which is close to the estimated `β₁` of $(repr(only(dsm01.β); context)).
 
-  Similarly the standard deviation of the simulated β values, $(std(βdf.value)) is close to the standard error of the parameter, $(only(stderror(dsm01))).
-""",
-)
+    Similarly the standard deviation of the simulated β values, $(repr(std(dsm01stbl.β1); context)), is close to the standard error of the parameter, $(repr(only(stderror(dsm01)); context)).
+  """,
+  )
+end
 ```
 
 In other words, the estimator of the fixed-effects parameter in this case behaves as we would expect.
@@ -634,12 +619,12 @@ The situation is different for the estimates of the standard deviation parameter
 #| fig-cap: Kernel density plot of bootstrap variance-component parameter estimates from model dsm01
 #| label: fig-dsm01_bs_sigma_density
 draw(
-  data(@subset(dsm01pars, :type == "σ")) *
+  data(dsm01stbl) *
   mapping(
-    :value => "Bootstrap samples of σ";
-    color=(:group => "Group"),
+    [:σ, :σ1] .=> "Bootstrap samples of standard deviations",
   ) *
-  AlgebraOfGraphics.density();
+  mapping(color=dims(1) => renamer(["σ", "σ₁"])) *
+  density();
   figure=(; resolution=(800, 450)),
 )
 ```
@@ -648,11 +633,13 @@ The estimator for the residual standard deviation, $\sigma$, is approximately no
 
 ```{julia}
 #| echo: false
-Markdown.parse(
-  """
-There is one peak around the "true" value for the simulation, $(only(dsm01.σs.batch)), and another peak at zero.
-""",
-)
+let context = :compact => true
+  Markdown.parse(
+    """
+    There is one peak around the "true" value for the simulation, $(repr(only(dsm01.σs.batch); context)), and another peak at zero.
+    """,
+  )
+end
 ```
 
 The apparent distribution of the estimates of $\sigma_1$ in @fig-dsm01_bs_sigma_density is being distorted by the method of approximating the density.
@@ -671,11 +658,11 @@ Use a lower alpha in the colors for multiple histograms so the bars behind anoth
 #| fig-cap: Histogram of bootstrap variance-components as standard deviations from model dsm01
 #| label: fig-dsm01_bs_sigma_hist
 draw(
-  data(@subset(dsm01pars, :type == "σ")) *
+  data(dsm01stbl) *
   mapping(
-    :value => "Bootstrap parameter estimates of σ";
-    color=(:group => "Group"),
+    [:σ, :σ1] .=> "Bootstrap samples of standard deviations",
   ) *
+  mapping(color=dims(1) => renamer(["σ", "σ₁"])) *
   AlgebraOfGraphics.histogram(; bins=80);
   figure=(; resolution=(800, 450)),
 )
@@ -698,19 +685,47 @@ In many cases standard errors are quoted for estimates of the variance component
 #| code-fold: true
 #| fig-cap: Histogram of bootstrap variance-components from model dsm01
 #| label: fig-dsm01_bs_sigmasqr_hist
-draw(
-  data(@transform(@subset(dsm01pars, :type == "σ"), abs2(:value))) *
-  mapping(
-    :value_abs2 => "Bootstrap sample of estimates of σ²",
-    color=:group,
-  ) *
-  AlgebraOfGraphics.histogram(; bins=200);
-  figure=(; resolution=(800, 450)),
-)
+let σs = ["σ", "σ1"]
+  draw(
+    data(dsm01stbl) *
+    mapping(
+      σs .=> abs2 .=> "Bootstrap samples of variance components",
+      color=dims(1) => renamer(σs),
+    ) *
+    AlgebraOfGraphics.histogram(; bins=200);
+    figure=(; resolution=(800, 450)),
+  )
+end
 ```
 
 The approach of creating coverage intervals from a bootstrap sample of parameter estimates, described in the next section, does accomodate non-normal distributions.
 
+## Quantile-quantile plots and profile-based confidence intervals
+
+Because the distribution of the bootstrap replications of $\sigma_1$ is a hybrid distribution --- that is, a continuous distribution of positive values with a non-zero probability of exactly zero --- neither a kernel density plot (@fig-dsm01_bs_sigma_density) nor a histogram (@fig-dsm01_bs_sigma_hist) is an ideal depiction of it.
+
+A *quantile-quantile* or [Q-Q plot](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot) provides an alternative representation of the distribution in which the "pulse" at zero can be easily discerned.
+If we choose to plot the quantiles of the standard normal (or Gaussian) distribution versus the sample quantiles, as in @fig-dsm01_bs_beta_qq, we have a visual assessment of the extent to which the distribution deviates from a normal distribution.
+Such a plot is also called a [normal probability plot](https://en.wikipedia.org/wiki/Normal_probability_plot).
+
+```{julia}
+#| code-fold: true
+#| fig-cap: Normal probability plot of bootstrap fixed-effects coefficients from model dsm01
+#| label: fig-dsm01_bs_beta_qq
+let f = Figure(; size=(1000,400)),
+  pars = [:β1 => "β", :σ => "σ", :σ1 => "σ₁"],
+  ppts = inv(512):inv(256):1.0,
+  z = quantile(Normal(), ppts)
+  map(enumerate(pars)) do (j, p)
+    ax = Axis(f[1, j]; xlabel=last(p), aspect=1)
+    lines!(quantile(getproperty(dsm01stbl, first(p)), ppts), z)
+  end
+  f
+end
+```
+
+
+
 ### Confidence intervals on the parameters {#sec-confints}
 
 A bootstrap sample of parameter estimates also provides us with a method of creating bootstrap coverage intervals, which can be regarded as confidence intervals on the parameters.
@@ -723,7 +738,7 @@ For example, from the sorted parameter values we could return the interval from
 The `shortestcovint` method returns the shortest of all these potential intervals which will correspond to the interval with the highest empirical density.
 
 ```{julia}
-DataFrame(shortestcovint(dsm01samp))
+Table(shortestcovint(dsm01samp))
 ```
 
 For the `(Intercept)` fixed-effects parameter the interval is similar to an "asymptotic" or Wald interval constructed as the estimate plus or minus two standard errors.
@@ -750,11 +765,13 @@ DataFrame(dsm01trace.optsum.fitlog)
 
 ```{julia}
 #| echo: false
-Markdown.parse(
-  """
-Here the algorithm converges after 18 function evaluations to a profiled deviance of $(dsm01trace.objective) at θ = $(only(dsm01trace.theta)).
-""",
-)
+let context = :compact => true
+  Markdown.parse(
+    """
+    Here the algorithm converges after 18 function evaluations to a profiled deviance of $(repr(dsm01trace.objective; context)) at θ = $(repr(only(dsm01trace.theta); context)).
+    """,
+  )
+end
 ```
 
 ## Assessing the random effects {#sec-assessRE}