@@ -48,77 +48,76 @@ All commands should be run from the directory `stan-dev/example-models/knitr/car
4848
4949## About conditional autoregressive models
5050
51- CAR models are used for areal data consisting of a single aggregated measure
52- per areal unit, which may be a binary, count, or continuous value.
51+ CAR and ICAR models are used when areal data consists
52+ of a single aggregated measure per areal unit,
53+ either a binary, count, or continuous value.
5354Areal units are volumes, more precisely,
5455areal units partition a multi-dimensional volume D into a finite number
5556of sub-volumes with well-defined boundaries.
5657Areal data differs from point data,
5758which consists of measurements from a known set of geo-spatial points.
5859For point data, the relationship between points is a
59- continuous, real-valued distance measure.
60- For areal units, the relationship between areal units is characterized in
61- terms of adjacency.
62-
63- Given a set of observations taken at $n$ different areal units of a region,
60+ continuous, real-valued distance measure which can be calculated
61+ automatically for any two points on the map,
62+ allowing for the addition of new points to the map.
63+ Given a set of areal units, there is no automatic procedure
64+ for adding a new areal unit, thus models for areal data are
65+ non-generative with respect to the areal regions.
66+
67+ For a set of $N$ areal units, the relationship between areal units
68+ is described by an $N \times N$ adjacency matrix,
69+ which is usually written $A$ for adjacency, or $W$ for weights.
70+ For the binary _ neighbor_ relationship,
71+ written $i \sim j$ where $i \neq j$, the entries in the adjacency matrix
72+ are $1$ if regions $n_i$ and $n_j$ are neighbors and is otherwise $0$.
73+ For CAR models, the neighbor relationship is symmetric but not reflexive;
74+ if $i \sim j$ then $j \sim i$, but a region is not its own neighbor.
75+
76+ ### Conditional Autoregressive (CAR) Models
77+
78+ Given a set of observations taken at $N$ different areal units of a region,
6479spatial interactions between a pair of units $n_i$ and $n_j$ can be modelled conditionally
6580as a spatial random variable $\mathbf{\phi}$, which is an $n$-length vector
6681$\mathbf{\phi} = ({\phi}_ 1, \ldots, {\phi}_ n)^T$.
6782
68- For CAR models, spatial relationship between the $n$ areal units
69- are represented as an adjacency matrix $W$ with dimensions $n \times n$
70- where entries $w_ {i,j}$ and $w_ {j,i}$ are positive when regions $i$ and $j$ are neighbors
71- and zero otherwise.
72- The _ neighbor_ relationship $i \sim j$ is defined in terms of this matrix:
73- the neighbors of region $i$ are those regions who have non-zero entries in row or column $i$.
74- This encoding defines a lattice structure over the $n$ areal units.
75-
76-
77- ### Conditional Auto-Regressive (CAR) Models
78-
79- Besag (1974) motivates CAR models for spatial processes using
80- results from the physics of lattice systems of particles and
81- the Hammersley-Clifford theorem which provides an equivalence between
82- a local specification of the conditional distribution of each particle
83- given its neighboring particles and the global specification
84- of the joint distribution of all particles.
85- This specification of the joint distribution via the local specification
86- of the conditional distributions of the individual variables
87- is a Markov random field (MRF) specification.
88- The conditional distribution for each ${\phi}_ i$ is specified in terms of a mean
89- and precision parameter $\tau$ as:
90-
91- $$ p \left( { \phi }_i \, \vert\, {\phi}_j \, j \neq i, {{\tau}_i}^{-1} \right)
92- = \mathit{N} \left( \alpha \sum_{i \sim j} {w}_{i,j} {\phi}_j,\tau_i^{-1} \right), i,j = 1, \ldots, n $$
93-
94- The parameter $\alpha$ controls the strength of the spatial association,
95- where $\alpha = 0$ corresponds to spatial independence.
96-
97- The corresponding joint distribution can be uniquely determined from
98- the set of full conditional distributions by
99- introducing a fixed point from the support of $p$
100- and then using Brook’s Lemma to factor the set of conditional distributions
101- into a joint distribution which is determined up to a proportionality constant
102- (see Banerjee, Carlin, and Gelfand, 2004, sec. 3.2) as a Gaussian with mean and precision parameters:
103-
104- $$ \mathbf{\phi} \sim \mathit{N} \left(\mathbf{0}, \left[D_{\tau}(I - \alpha B)\right]^{-1} \right) $$
83+ In the full conditional distribution, each ${\phi}_ i$ is conditional
84+ on the sum of the weighted values of its neighbors ($w_ {ij}\, \phi_j$)
85+ and has unknown variance
86+ $$ \phi_i \mid \phi_j, j \neq i, \sim \mathrm{N} \left( \sum_{j = 1}^n w_{ij} \phi_j, {\sigma}^2 \right). $$
87+ \
88+ Specification of the global, or joint distribution via the local specification
89+ of the conditional distributions of the individual random variables
90+ defines a Gaussian Markov random field (GMRF).
91+ Besag (1974) proved that the
92+ corresponding joint specification of $\phi$ is a multivariate normal random variable
93+ centered at $0$.
94+ The variance of $\phi$ is specified as a precision matrix $Q$ which
95+ is simply the inverse of the covariance matrix $\Sigma$, i.e. $\Sigma = Q^{-1}$
96+ so that
97+ $$ \phi \sim \mathrm{N}(0, Q^{-1}). $$
98+
99+ In order for the standard multivariate normal random variable $\phi$ to have a proper joint probability density,
100+ the precision matrix $Q$ must be symmetric and positive definite.
101+ This is accomplished by constructing the precision matrix $Q$ from the adjacency matrix $W$:
102+
103+ $$ Q = [D_{\tau}(I - \alpha B)] $$
105104
106105where
107106
108107- $W$ is the $n \times n$ adjacency matrix where entries $\{ i,i\} $ are zero and the off-diagonal elements
109108are $1$ if regions $i$ and $j$ are neighbors and $0$ otherwise.
110- - $D$ is the $n \times n$ diagonal where entries $\{ i,i\} $ are the number of neighbors of region $i$
111- and the off-diagnoal entries are $0$.
112- - $D_ {\tau} = \tau D$
113- - $\alpha$ is between 0 and 1
114- - $B$ is the scaled adjacency matrix $D^{-1}W$
115- - $I $ is an $n \times n$ identity matrix
116-
117- Since $D _ {\tau} = \tau D$ and $B = D^{-1}W$, $ [ D _ {\tau}(I - \alpha B) ] ^{-1}$
118- rewrites to $ [ {\tau}(D - \alpha W) ] ^{-1}$.
119- In the case where $\alpha < 0$, the precision matrix is positive definite,
120- thus the joint distribution is proper.
121- However evaluation of the joint distribution requires computing the determinant of this matrix,
109+ - $D$ is the $n \times n$ diagonal matrix where entries $\{ i,i\} $ are the number of neighbors of region $i$
110+ and the off-diagonal entries are $0$.
111+ - $D_ {\tau} = \tau\, D$.
112+ - $\alpha$ a parameter which controls the amount of spatial correlation, where
113+ where $\alpha = 0$ implies spatial independence and where $\alpha = 1$ implies complete spatial correlation.
114+ - $B $ is the scaled adjacency matrix $D^{-1}W$.
115+ - $I$ is an $n \times n$ identity matrix.
116+
117+ When $\alpha$ is in the interval (0,1), the precision matrix $Q$ is positive definite,
118+ thus the joint distribution $\phi$ is proper.
119+
120+ Evaluation of $\phi$ requires computing the determinant of the precision matrix $Q$ ,
122121which is computationally expensive.
123122See the Stan case study
124123[ Exact sparse CAR models in Stan] ( http://mc-stan.org/documentation/case-studies/mbjoseph-CARStan.html ) ,
@@ -127,11 +126,24 @@ for discussion of how to speed up computation.
127126### Intrinsic Conditional Auto-Regressive (ICAR) models
128127
129128An Intrinsic Conditional Auto-Regressive (ICAR) model is a CAR model where $\alpha = 1$,
130- so that the joint distribution simplifies to,
129+ that is, it assumes complete spatial correlation between regions.
130+ (Spoiler alert: this assumption is problematic, resulting in the
131+ the BYM model and successors).
132+ The joint distribution of the ICAR model is derived from the joint distribution
133+ for the CAR model as follows:
134+
135+ - since $D_ {\tau} = \tau D$ and $B = D^{-1}W$, the expression $[ D_ {\tau}(I - \alpha B)] $
136+ simplifies to $[ {\tau}(D - \alpha W)] $.
137+ - since $\alpha = 1$, it is omitted.
138+
139+ The resulting matrix $[ \tau \, (D - W)] $ is singular, thus the ICAR variate $\phi$
140+ is an improper prior distribution, with joint distribution:
131141
132142$$ \phi \sim N(0, [\tau \, (D - W)]^{-1}). $$
133143
134- resulting in a singular precision matrix and an improper prior distribution.
144+ While this ICAR model is non-generating in that it cannot be used as a model for the data,
145+ it can be used as a prior as part of a hierarchical model, which is the role it plays in
146+ the BYM model.
135147
136148The corresponding conditional distribution specification is:
137149
@@ -152,16 +164,10 @@ where $NC$ is the number of components in the graph over all areal subregions de
152164$NC == 1$ when the areal graph is fully connected, i.e., every subregion can be reached from every other subregion
153165via a sequence of neighbors.
154166
155- The above conditions for the ICAR model produce an improper distribution
156- because setting $\alpha = 1$ creates a singular matrix $(D - W)$, see Besag and Kooperberg 1995.
157- Furthermore, the joint distribution is non-identifiable;
167+ From the pairwise difference formulation, we see that the joint distribution is non-identifiable;
158168adding any constant to all of the elements of $\phi$ leaves the joint distribution unchanged.
159169Adding the constraint $\sum_ {i} {\phi}_ i = 0$ resolves this problem.
160170
161- While this ICAR model is non-generating in that it cannot be used as a model for the data,
162- it can be used as a prior as part of a hierarchical model, which is the role it plays in
163- the BYM model.
164-
165171### Derivation of the _ Pairwise Difference_ Formula
166172
167173The jump from the joint distribution to the pairwise difference requires
0 commit comments