Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ version = "0.1.0"
[deps]
AlgebraOfGraphics = "cbdf2221-f076-402e-a563-3d30da359d67"
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
BSplineKit = "093aae92-e908-43d7-9660-e50ee39d5a0a"
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
Expand Down Expand Up @@ -44,7 +45,7 @@ ZipFile = "a5390f91-8eb1-5f08-bee0-b1d1ffed6cea"
AlgebraOfGraphics = "0.6"
Arrow = "2"
CSV = "0.10"
CairoMakie = "0.8,0.9,0.10"
CairoMakie = "0.8,0.9,0.10,0.11"
CategoricalArrays = "0.10"
Chain = "0.5"
DataAPI = "1.9"
Expand All @@ -56,8 +57,8 @@ Effects = "0.1,1"
FreqTables = "0.4"
GLM = "1"
MixedModels = "4,5"
MixedModelsMakie = "0.3.13"
NLopt = "0.6"
MixedModelsMakie = "0.3"
NLopt = "0.6,1"
ProgressMeter = "1.7"
RCall = "0.13"
Scratch = "1"
Expand Down
1 change: 1 addition & 0 deletions _quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ book:
appendices:
- datatables.qmd
- linalg.qmd
- profiling.qmd
- aGHQ.qmd

execute-dir: project
Expand Down
1 change: 0 additions & 1 deletion aGHQ.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,6 @@ using LinearAlgebra
using MixedModels
using MixedModelsMakie
using NLopt
using PooledArrays
using ProgressMeter
using StatsAPI

Expand Down
195 changes: 106 additions & 89 deletions intro.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -102,26 +102,26 @@ To access these data within `Julia` we must first attach this package, and other
```{julia}
#| code-fold: show
#| output: false
import ProgressMeter # progress of optimizer iterations
ProgressMeter.ijulia_behavior(:clear) # suppress progress output

using AlgebraOfGraphics # high-level graphics
using CairoMakie # graphics back-end
using DataFrameMacros # elegant DataFrame manipulation
using DataFrames # DataFrame implementation
using Markdown # utilities for generating markdown text
using MixedModels # fit and examine mixed-effects models
using MixedModelsMakie # graphics for mixed-effects models
using Printf # formatted printing
using ProgressMeter # progress of optimizer iterations
using Random # random number generation
using StatsBase # basic statistical summaries
using Statistics # basic statistical summaries

using EmbraceUncertainty: dataset # `dataset` means this one
using AlgebraOfGraphics: density # `density` means this one

CairoMakie.activate!(; type="svg") # Scalable Vector Graphics
ProgressMeter.ijulia_behavior(:clear) # suppress progress output
```

A package must be attached before any of the data sets or functions in the package can be used.
If entering this line results in an error report stating that there is no package by one of these names then you must first install the package(s).
If entering this block of code results in an error report stating that there is no package by one of these names then you must first install the package(s).
If you are using Julia version 1.8.0 or later the error report will include instructions on how to do this.

In what follows, we will assume that these packages have been installed and attached to the session before any of the code shown has been run.
Expand Down Expand Up @@ -196,7 +196,7 @@ last(dyestuff, 7)
or we could tabulate the data using `groupby` and `combine` from [DataFrames](https://github.com/JuliaData/DataFrames.jl).

```{julia}
combine(groupby(dyestuff, :batch), :yield => mean, nrow => :n)
combine(groupby(dyestuff, :batch), :yield => mean, nrow)
```

Although this table does show us an important property of the data, namely that there are exactly 5 observations on each batch --- a property that we will describe by saying that the data are *balanced* with respect to `batch` --- we usually learn much more about the structure of such data from plots like @fig-dyestuffdata
Expand All @@ -206,14 +206,21 @@ Although this table does show us an important property of the data, namely that
#| label: fig-dyestuffdata
#| code-fold: true
let
sumry = sort!(combine(groupby(dyestuff, :batch), :yield => mean => :yield), :yield)
sumry = sort!(
combine(groupby(dyestuff, :batch), :yield => mean => :yield),
:yield,
)
mp = mapping(
:yield => "Yield of dyestuff [g]",
:batch => sorter(sumry.batch) => "Batch of intermediate product",
:batch =>
sorter(sumry.batch) => "Batch of intermediate product",
)
draw(
(data(dyestuff) * mp * visual(Scatter; marker='○', markersize=12)) +
(data(sumry) * mp * visual(Lines));
(
data(dyestuff) *
mp *
visual(Scatter; marker='○', markersize=12)
) + (data(sumry) * mp * visual(Lines));
figure=(; resolution=(800, 400)),
)
end
Expand Down Expand Up @@ -272,14 +279,20 @@ A data plot (@fig-dyestuff2data)
#| label: fig-dyestuff2data
#| code-fold: true
let
sumry = sort!(combine(groupby(dyestuff2, :batch), :yield => mean => :yield), :yield)
sumry = sort!(
combine(groupby(dyestuff2, :batch), :yield => mean => :yield),
:yield,
)
mp = mapping(
:yield => "Simulated yield",
:batch => sorter(sumry.batch) => "Batch",
)
draw(
(data(dyestuff2) * mp * visual(Scatter; marker='○', markersize=12)) +
(data(sumry) * mp * visual(Lines));
(
data(dyestuff2) *
mp *
visual(Scatter; marker='○', markersize=12)
) + (data(sumry) * mp * visual(Lines));
figure=(; resolution=(800, 400)),
)
end
Expand All @@ -304,9 +317,8 @@ We will explain the structure of the formula after we have considered an example
We fit a model to the data allowing for an overall level of the `yield` and for an additive random effect for each level of `batch`.

```{julia}
dsm01 = let
form = @formula(yield ~ 1 + (1 | batch))
fit(MixedModel, form, dyestuff)
dsm01 = let f = @formula(yield ~ 1 + (1 | batch))
fit(MixedModel, f, dyestuff)
end
```

Expand Down Expand Up @@ -379,9 +391,8 @@ An alternative measure of precision of the estimate - a coverage interval based
Fitting a similar model to the `dyestuff2` data produces an estimate $\widehat{\sigma}_1=0$.

```{julia}
dsm02 = let
form = @formula(yield ~ 1 + (1 | batch))
fit(MixedModel, form, dyestuff2)
dsm02 = let f = @formula(yield ~ 1 + (1 | batch))
fit(MixedModel, f, dyestuff2)
end
println(dsm02)
```
Expand Down Expand Up @@ -559,32 +570,14 @@ For methods that involve simulation, it is best to initialize a random number ge
Beginning with Julia v1.7.0 the default random number generator is the `Xoshiro` generator, which we initialize to an arbitrary, but reproducible, value.

The object returned by a call to `parametricbootstrap` has a somewhat complex internal structure to allow for the ability to simulate from complex models while still maintaining a comparatively small storage footprint.
To examine the distribution of the parameter estimates we extract a table of all the estimated parameters and convert it to a `DataFrame`.
To examine the distribution of the parameter estimates we extract a `Table` of all the estimated parameters.

```{julia}
#| warning: false
const hide_progress = true # hide the progress bar when sampling
const progress = false # hide the progress bar when sampling
Random.seed!(4321234) # random number generator
dsm01samp = parametricbootstrap(10_000, dsm01; hide_progress)
dsm01pars = DataFrame(dsm01samp.allpars)
first(dsm01pars, 7)
```

:::{.callout-note}

### Switch to Table(dsm01samp.tbl) formulation

:::

```{julia}
last(dsm01pars, 7)
```

Plots of the bootstrap estimates for individual parameters are obtained by extracting subsets of the rows of this dataframe using `subset` methods from the `DataFrames` package or the `@subset` macro from the `DataFramesMacros` package.
For example,

```{julia}
βdf = @subset(dsm01pars, :type == "β")
dsm01samp = parametricbootstrap(10_000, dsm01; progress)
dsm01stbl = dsm01samp.tbl
```

We begin by examining density plots constructed using the [AlgebraOfGraphics](https://github.com/JuliaGraphics/AlgebraOfGraphics.jl) package.
Expand All @@ -596,32 +589,24 @@ You can check the details by clicking on the "Code" button in the HTML version o
#| fig-cap: Kernel density plot of bootstrap fixed-effects parameter estimates from dsm01
#| label: fig-dsm01_bs_beta_density
draw(
data(@subset(dsm01pars, :type == "β")) *
mapping(
:value => "Bootstrap samples of β";
color=(:names => "Names"),
) *
AlgebraOfGraphics.density();
figure=(; resolution=(800, 450)),
data(dsm01stbl) *
mapping(:β1 => "Bootstrap samples of β") *
density();
figure=(; resolution=(800, 450))
)
```

:::{.callout-note collapse="true"}

### Use :compact=true when interpolating numeric values

Find a way to use `:compact=true` when interpolating numeric values into the text
:::

```{julia}
#| echo: false
Markdown.parse(
"""
The distribution of the estimates of `β₁` is more-or-less a Gaussian (or "normal") shape, with a mean value of $(@sprintf("%f", mean(βdf.value))) which is close to the estimated `β₁` of $(@sprintf("%f", only(dsm01.β))).
let context = (:compact => true)
Markdown.parse(
"""
The distribution of the estimates of `β₁` is more-or-less a Gaussian (or "normal") shape, with a mean value of $(repr(mean(dsm01stbl.β1); context)), which is close to the estimated `β₁` of $(repr(only(dsm01.β); context)).

Similarly the standard deviation of the simulated β values, $(std(βdf.value)) is close to the standard error of the parameter, $(only(stderror(dsm01))).
""",
)
Similarly the standard deviation of the simulated β values, $(repr(std(dsm01stbl.β1); context)), is close to the standard error of the parameter, $(repr(only(stderror(dsm01)); context)).
""",
)
end
```

In other words, the estimator of the fixed-effects parameter in this case behaves as we would expect.
Expand All @@ -634,12 +619,12 @@ The situation is different for the estimates of the standard deviation parameter
#| fig-cap: Kernel density plot of bootstrap variance-component parameter estimates from model dsm01
#| label: fig-dsm01_bs_sigma_density
draw(
data(@subset(dsm01pars, :type == "σ")) *
data(dsm01stbl) *
mapping(
:value => "Bootstrap samples of σ";
color=(:group => "Group"),
[:σ, :σ1] .=> "Bootstrap samples of standard deviations",
) *
AlgebraOfGraphics.density();
mapping(color=dims(1) => renamer(["σ", "σ₁"])) *
density();
figure=(; resolution=(800, 450)),
)
```
Expand All @@ -648,11 +633,13 @@ The estimator for the residual standard deviation, $\sigma$, is approximately no

```{julia}
#| echo: false
Markdown.parse(
"""
There is one peak around the "true" value for the simulation, $(only(dsm01.σs.batch)), and another peak at zero.
""",
)
let context = :compact => true
Markdown.parse(
"""
There is one peak around the "true" value for the simulation, $(repr(only(dsm01.σs.batch); context)), and another peak at zero.
""",
)
end
```

The apparent distribution of the estimates of $\sigma_1$ in @fig-dsm01_bs_sigma_density is being distorted by the method of approximating the density.
Expand All @@ -671,11 +658,11 @@ Use a lower alpha in the colors for multiple histograms so the bars behind anoth
#| fig-cap: Histogram of bootstrap variance-components as standard deviations from model dsm01
#| label: fig-dsm01_bs_sigma_hist
draw(
data(@subset(dsm01pars, :type == "σ")) *
data(dsm01stbl) *
mapping(
:value => "Bootstrap parameter estimates of σ";
color=(:group => "Group"),
[:σ, :σ1] .=> "Bootstrap samples of standard deviations",
) *
mapping(color=dims(1) => renamer(["σ", "σ₁"])) *
AlgebraOfGraphics.histogram(; bins=80);
figure=(; resolution=(800, 450)),
)
Expand All @@ -698,19 +685,47 @@ In many cases standard errors are quoted for estimates of the variance component
#| code-fold: true
#| fig-cap: Histogram of bootstrap variance-components from model dsm01
#| label: fig-dsm01_bs_sigmasqr_hist
draw(
data(@transform(@subset(dsm01pars, :type == "σ"), abs2(:value))) *
mapping(
:value_abs2 => "Bootstrap sample of estimates of σ²",
color=:group,
) *
AlgebraOfGraphics.histogram(; bins=200);
figure=(; resolution=(800, 450)),
)
let σs = ["σ", "σ1"]
draw(
data(dsm01stbl) *
mapping(
σs .=> abs2 .=> "Bootstrap samples of variance components",
color=dims(1) => renamer(σs),
) *
AlgebraOfGraphics.histogram(; bins=200);
figure=(; resolution=(800, 450)),
)
end
```

The approach of creating coverage intervals from a bootstrap sample of parameter estimates, described in the next section, does accomodate non-normal distributions.

## Quantile-quantile plots and profile-based confidence intervals

Because the distribution of the bootstrap replications of $\sigma_1$ is a hybrid distribution --- that is, a continuous distribution of positive values with a non-zero probability of exactly zero --- neither a kernel density plot (@fig-dsm01_bs_sigma_density) nor a histogram (@fig-dsm01_bs_sigma_hist) is an ideal depiction of it.

A *quantile-quantile* or [Q-Q plot](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot) provides an alternative representation of the distribution in which the "pulse" at zero can be easily discerned.
If we choose to plot the quantiles of the standard normal (or Gaussian) distribution versus the sample quantiles, as in @fig-dsm01_bs_beta_qq, we have a visual assessment of the extent to which the distribution deviates from a normal distribution.
Such a plot is also called a [normal probability plot](https://en.wikipedia.org/wiki/Normal_probability_plot).

```{julia}
#| code-fold: true
#| fig-cap: Normal probability plot of bootstrap fixed-effects coefficients from model dsm01
#| label: fig-dsm01_bs_beta_qq
let f = Figure(; size=(1000,400)),
pars = [:β1 => "β", :σ => "σ", :σ1 => "σ₁"],
ppts = inv(512):inv(256):1.0,
z = quantile(Normal(), ppts)
map(enumerate(pars)) do (j, p)
ax = Axis(f[1, j]; xlabel=last(p), aspect=1)
lines!(quantile(getproperty(dsm01stbl, first(p)), ppts), z)
end
f
end
```



### Confidence intervals on the parameters {#sec-confints}

A bootstrap sample of parameter estimates also provides us with a method of creating bootstrap coverage intervals, which can be regarded as confidence intervals on the parameters.
Expand All @@ -723,7 +738,7 @@ For example, from the sorted parameter values we could return the interval from
The `shortestcovint` method returns the shortest of all these potential intervals which will correspond to the interval with the highest empirical density.

```{julia}
DataFrame(shortestcovint(dsm01samp))
Table(shortestcovint(dsm01samp))
```

For the `(Intercept)` fixed-effects parameter the interval is similar to an "asymptotic" or Wald interval constructed as the estimate plus or minus two standard errors.
Expand All @@ -750,11 +765,13 @@ DataFrame(dsm01trace.optsum.fitlog)

```{julia}
#| echo: false
Markdown.parse(
"""
Here the algorithm converges after 18 function evaluations to a profiled deviance of $(dsm01trace.objective) at θ = $(only(dsm01trace.theta)).
""",
)
let context = :compact => true
Markdown.parse(
"""
Here the algorithm converges after 18 function evaluations to a profiled deviance of $(repr(dsm01trace.objective; context)) at θ = $(repr(only(dsm01trace.theta); context)).
""",
)
end
```

## Assessing the random effects {#sec-assessRE}
Expand Down
Loading