explain CAR, ICAR more clearly

mitzimorris · mitzimorris · commit b42db75384b9 · 2021-01-16T15:57:31.000-05:00
diff --git a/knitr/car-iar-poisson/icar_stan.html b/knitr/car-iar-poisson/icar_stan.html
@@ -369,7 +369,7 @@ <h4 class="author">Mitzi Morris</h4>
 
 <div id="TOC">
 <ul>
-<li><a href="#about-conditional-autoregressive-models">About conditional autoregressive models</a></li>
+<li><a href="#auto-regressive-models-for-areal-data">Auto-Regressive models for Areal Data</a></li>
 <li><a href="#adding-an-icar-component-to-a-stan-model">Adding an ICAR component to a Stan model</a></li>
 <li><a href="#example-disease-mapping-using-the-besag-york-mollie-model">Example: disease mapping using the Besag York Mollié model</a></li>
 <li><a href="#bym2-improving-the-parameterization-of-the-besag-york-and-mollie-model">BYM2: improving the parameterization of the Besag, York, and Mollié model</a></li>
@@ -399,49 +399,54 @@ <h4>Update, August 2019</h4>
 <p>When areal data has a spatial structure such that observations from neighboring regions exhibit higher correlation than distant regions, this correlation can be accounted for using the class of spatial models called “CAR” models (Conditional Auto-Regressive) introduced by Besag (Besag 1974). Intrinsic Conditional Auto-Regressive (ICAR) models are a subclass of CAR models. The Besag York Mollié (BYM) model is a lognormal Poisson model which includes both an ICAR component for spatial smoothing and an ordinary random-effects component for non-spatial heterogeneity. This case study covers how to efficiently code these models in Stan.</p>
 <p>All models and data files are available in the Stan example-models GitHub repo for Stan case studies: <a href="https://github.com/stan-dev/example-models/tree/master/knitr/car-iar-poisson">car-iar-poisson</a>. All commands should be run from the directory <code>stan-dev/example-models/knitr/car-iar-poisson</code>.</p>
 </div>
-<div id="about-conditional-autoregressive-models" class="section level2">
-<h2>About conditional autoregressive models</h2>
-<p>CAR models are used for areal data consisting of a single aggregated measure per areal unit, which may be a binary, count, or continuous value. Areal units are volumes, more precisely, areal units partition a multi-dimensional volume D into a finite number of sub-volumes with well-defined boundaries. Areal data differs from point data, which consists of measurements from a known set of geo-spatial points. For point data, the relationship between points is a continuous, real-valued distance measure. For areal units, the relationship between areal units is characterized in terms of adjacency.</p>
-<p>Given a set of observations taken at <span class="math inline">\(n\)</span> different areal units of a region, spatial interactions between a pair of units <span class="math inline">\(n_i\)</span> and <span class="math inline">\(n_j\)</span> can be modelled conditionally as a spatial random variable <span class="math inline">\(\mathbf{\phi}\)</span>, which is an <span class="math inline">\(n\)</span>-length vector <span class="math inline">\(\mathbf{\phi} = ({\phi}_1, \ldots, {\phi}_n)^T\)</span>.</p>
-<p>For CAR models, spatial relationship between the <span class="math inline">\(n\)</span> areal units are represented as an adjacency matrix <span class="math inline">\(W\)</span> with dimensions <span class="math inline">\(n \times n\)</span> where entries <span class="math inline">\(w_{i,j}\)</span> and <span class="math inline">\(w_{j,i}\)</span> are positive when regions <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span> are neighbors and zero otherwise. The <em>neighbor</em> relationship <span class="math inline">\(i \sim j\)</span> is defined in terms of this matrix: the neighbors of region <span class="math inline">\(i\)</span> are those regions who have non-zero entries in row or column <span class="math inline">\(i\)</span>. This encoding defines a lattice structure over the <span class="math inline">\(n\)</span> areal units.</p>
-<div id="conditional-auto-regressive-car-models" class="section level3">
-<h3>Conditional Auto-Regressive (CAR) Models</h3>
-<p>Besag (1974) motivates CAR models for spatial processes using results from the physics of lattice systems of particles and the Hammersley-Clifford theorem which provides an equivalence between a local specification of the conditional distribution of each particle given its neighboring particles and the global specification of the joint distribution of all particles. This specification of the joint distribution via the local specification of the conditional distributions of the individual variables is a Markov random field (MRF) specification. The conditional distribution for each <span class="math inline">\({\phi}_i\)</span> is specified in terms of a mean and precision parameter <span class="math inline">\(\tau\)</span> as:</p>
-<p><span class="math display">\[ p \left( { \phi }_i \, \vert\, {\phi}_j \, j \neq i, {{\tau}_i}^{-1} \right)
-= \mathit{N} \left( \alpha \sum_{i \sim j} {w}_{i,j} {\phi}_j,\tau_i^{-1} \right), i,j = 1, \ldots, n \]</span></p>
-<p>The parameter <span class="math inline">\(\alpha\)</span> controls the strength of the spatial association, where <span class="math inline">\(\alpha = 0\)</span> corresponds to spatial independence.</p>
-<p>The corresponding joint distribution can be uniquely determined from the set of full conditional distributions by introducing a fixed point from the support of <span class="math inline">\(p\)</span> and then using Brook’s Lemma to factor the set of conditional distributions into a joint distribution which is determined up to a proportionality constant (see Banerjee, Carlin, and Gelfand, 2004, sec. 3.2):</p>
-<p><span class="math display">\[ \mathbf{\phi} \sim \mathit{N} \left(\mathbf{0}, \left[D_{\tau}(I - \alpha B)\right]^{-1} \right) \]</span></p>
+
+<div id="auto-regressive-models-for-areal-data" class="section level2">
+<h2>Auto-Regressive models for Areal Data</h2>
+<p>CAR and ICAR models are used when areal data consists of a single aggregated measure per areal unit, either a binary, count, or continuous value. Areal units are volumes, more precisely, areal units partition a multi-dimensional volume D into a finite number of sub-volumes with well-defined boundaries. Areal data differs from point data, which consists of measurements from a known set of geo-spatial points. For point data, the relationship between points is a continuous, real-valued distance measure which can be calculated automatically for any two points on the map, allowing for the addition of new points to the map. Given a set of areal units, there is no automatic procedure for adding a new areal unit, thus models for areal data are non-generative with respect to the areal regions.</p>
+<p>For a set of <span class="math inline">\(N\)</span> areal units, the relationship between areal units is described by an <span class="math inline">\(N \times N\)</span> adjacency matrix, which is usually written <span class="math inline">\(A\)</span> for adjacency, or <span class="math inline">\(W\)</span> for weights. For the binary <em>neighbor</em> relationship, written <span class="math inline">\(i \sim j\)</span> where <span class="math inline">\(i \neq j\)</span>, the entries in the adjacency matrix are <span class="math inline">\(1\)</span> if regions <span class="math inline">\(n_i\)</span> and <span class="math inline">\(n_j\)</span> are neighbors and is otherwise <span class="math inline">\(0\)</span>. For CAR models, the neighbor relationship is symmetric but not reflexive; if <span class="math inline">\(i \sim j\)</span> then <span class="math inline">\(j \sim i\)</span>, but a region is not its own neighbor.</p>
+<div id="conditional-autoregressive-car-models" class="section level3">
+<h3>Conditional Autoregressive (CAR) Models</h3>
+<p>Given a set of observations taken at <span class="math inline">\(N\)</span> different areal units of a region, spatial interactions between a pair of units <span class="math inline">\(n_i\)</span> and <span class="math inline">\(n_j\)</span> can be modelled conditionally as a spatial random variable <span class="math inline">\(\mathbf{\phi}\)</span>, which is an <span class="math inline">\(n\)</span>-length vector <span class="math inline">\(\mathbf{\phi} = ({\phi}_1, \ldots, {\phi}_n)^T\)</span>.</p>
+<p>In the full conditional distribution, each <span class="math inline">\({\phi}_i\)</span> is conditional on the sum of the weighted values of its neighbors (<span class="math inline">\(w_{ij}\,\phi_j\)</span>) and has unknown variance <span class="math display">\[\phi_i \mid \phi_j, j \neq i, \sim \mathrm{N} \left( \sum_{j = 1}^n w_{ij} \phi_j, {\sigma}^2 \right).\]</span><br />
+Specification of the global, or joint distribution via the local specification of the conditional distributions of the individual random variables defines a Gaussian Markov random field (GMRF). Besag (1974) proved that the corresponding joint specification of <span class="math inline">\(\phi\)</span> is a multivariate normal random variable centered at <span class="math inline">\(0\)</span>. The variance of <span class="math inline">\(\phi\)</span> is specified as a precision matrix <span class="math inline">\(Q\)</span> which is simply the inverse of the covariance matrix <span class="math inline">\(\Sigma\)</span>, i.e. <span class="math inline">\(\Sigma = Q^{-1}\)</span> so that <span class="math display">\[\phi \sim \mathrm{N}(0, Q^{-1}).\]</span></p>
+<p>In order for the standard multivariate normal random variable <span class="math inline">\(\phi\)</span> to have a proper joint probability density, the precision matrix <span class="math inline">\(Q\)</span> must be symmetric and positive definite. This is accomplished by constructing the precision matrix <span class="math inline">\(Q\)</span> from the adjacency matrix <span class="math inline">\(W\)</span>:</p>
+<p><span class="math display">\[ Q = [D_{\tau}(I - \alpha B)] \]</span></p>
 <p>where</p>
 <ul>
-<li><span class="math inline">\(\alpha\)</span> is between 0 and 1</li>
-<li><span class="math inline">\(B\)</span> is the <span class="math inline">\(n \times n\)</span> matrix weights matrix <span class="math inline">\(W\)</span> where entries <span class="math inline">\(\{i,i\}\)</span> are zero and the off-diagonal elements describe the spatial proximity of regions <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span></li>
-<li><span class="math inline">\(I\)</span> is an <span class="math inline">\(n \times n\)</span> identity matrix</li>
-<li><span class="math inline">\(D_{\tau} = \tau D\)</span> where <span class="math inline">\(D\)</span> is an <span class="math inline">\(n \times n\)</span> diagonal matrix</li>
+<li><span class="math inline">\(W\)</span> is the <span class="math inline">\(n \times n\)</span> adjacency matrix where entries <span class="math inline">\(\{i,i\}\)</span> are zero and the off-diagonal elements are <span class="math inline">\(1\)</span> if regions <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span> are neighbors and <span class="math inline">\(0\)</span> otherwise.</li>
+<li><span class="math inline">\(D\)</span> is the <span class="math inline">\(n \times n\)</span> diagonal matrix where entries <span class="math inline">\(\{i,i\}\)</span> are the number of neighbors of region <span class="math inline">\(i\)</span> and the off-diagnoal entries are <span class="math inline">\(0\)</span>.</li>
+<li><span class="math inline">\(D_{\tau} = \tau\, D\)</span>.</li>
+<li><span class="math inline">\(\alpha\)</span> a parameter which controls the amount of spatial correlation, where where <span class="math inline">\(\alpha = 0\)</span> implies spatial independence and where <span class="math inline">\(\alpha = 1\)</span> implies complete spatial correlaton.</li>
+<li><span class="math inline">\(B\)</span> is the scaled adjacency matrix <span class="math inline">\(D^{-1}W\)</span>.</li>
+<li><span class="math inline">\(I\)</span> is an <span class="math inline">\(n \times n\)</span> identity matrix.</li>
 </ul>
-<p>The construction of the spatial proximity matrix <span class="math inline">\(B\)</span> determines the class of CAR model structure.</p>
-<p>In the case where <span class="math inline">\(B\)</span> is a positive definite matrix, then the CAR model structure is a fully generative model. However evaluation of the joint distribution requires computing the covariance matrix described by <span class="math inline">\([D_{\tau}(I - \alpha B)]^{-1}\)</span>, which is computationally expensive. See the Stan case study <a href="http://mc-stan.org/documentation/case-studies/mbjoseph-CARStan.html">Exact sparse CAR models in Stan</a>, for further discussion of CAR models.</p>
+<p>When <span class="math inline">\(\alpha\)</span> is in the interval (0,1), the precision matrix <span class="math inline">\(Q\)</span> is positive definite, thus the joint distribution <span class="math inline">\(\phi\)</span> is proper.</p>
+<p>Evaluation of <span class="math inline">\(\phi\)</span> requires computing the determinant of the precision matrix <span class="math inline">\(Q\)</span>, which is computationally expensive. See the Stan case study <a href="http://mc-stan.org/documentation/case-studies/mbjoseph-CARStan.html">Exact sparse CAR models in Stan</a>, for discussion of how to speed up computation.</p>
 </div>
 <div id="intrinsic-conditional-auto-regressive-icar-models" class="section level3">
 <h3>Intrinsic Conditional Auto-Regressive (ICAR) models</h3>
-<p>An Intrinsic Conditional Auto-Regressive (ICAR) model is a CAR model where:</p>
+<p>An Intrinsic Conditional Auto-Regressive (ICAR) model is a CAR model where <span class="math inline">\(\alpha = 1\)</span>, that is, it assumes complete spatial correlation between regions. (Spoiler alert: this assumption is problematic, resulting in the the BYM model and successors). The joint distribution of the ICAR model is derived from the joint distribution for the CAR model as follows:</p>
 <ul>
-<li><span class="math inline">\(\alpha = 1\)</span></li>
-<li><span class="math inline">\(D\)</span> is an <span class="math inline">\(n \times n\)</span> diagonal matrix where <span class="math inline">\(d_{i,i}\)</span> = the number of neighbors for region <span class="math inline">\(n_i\)</span></li>
-<li><span class="math inline">\(B\)</span> is the scaled weights matrix <span class="math inline">\(W / D\)</span>, where <span class="math inline">\(W\)</span> is uses a binary encoding such that <span class="math inline">\(w_{i,i} = 0, w_{i,j} = 1\)</span> if <span class="math inline">\(i\)</span> is a neighbor of <span class="math inline">\(j\)</span>, and <span class="math inline">\(w_{i,j}=0\)</span> otherwise</li>
+<li>since <span class="math inline">\(D_{\tau} = \tau D\)</span> and <span class="math inline">\(B = D^{-1}W\)</span>, the expression <span class="math inline">\([D_{\tau}(I - \alpha B)]\)</span> simplifies to <span class="math inline">\([{\tau}(D - \alpha W)]\)</span>.</li>
+<li>since <span class="math inline">\(\alpha = 1\)</span>, it is omitted.</li>
 </ul>
+<p>The resulting matrix <span class="math inline">\([\tau \, (D - W)]\)</span> is singular, thus the ICAR variate <span class="math inline">\(\phi\)</span> is an improper prior distribution, with joint distribution:</p>
+<p><span class="math display">\[\phi \sim N(0, [\tau \, (D - W)]^{-1}).\]</span></p>
+<p>While this ICAR model is non-generating in that it cannot be used as a model for the data, it can be used as a prior as part of a hierarchical model, which is the role it plays in the BYM model.</p>
 <p>The corresponding conditional distribution specification is:</p>
 <p><span class="math display">\[ p \left( { \phi }_i \, \vert\, {\phi}_j \, j \neq i, {{\tau}_i}^{-1} \right)
 = \mathit{N} \left( \frac{\sum_{i \sim j} {\phi}_{i}}{d_{i,i}}, \frac{1}{d_{i,i} {\tau}_i} \right)\]</span></p>
 <p>where <span class="math inline">\(d_{i,i}\)</span> is the number of neighbors for region <span class="math inline">\(n_i\)</span>. The individual spatial random variable <span class="math inline">\({\phi}_i\)</span> for region <span class="math inline">\(n_i\)</span> which has a set of neighbors <span class="math inline">\(j \neq i\)</span> whose cardinality is <span class="math inline">\(d_{i,i}\)</span>, is normally distributed with a mean equal to the average of its neighbors. Its variance decreases as the number of neighbors increases.</p>
-<p>The joint distribution simplifies to:</p>
-<p><span class="math display">\[\phi \sim N(0, [\tau \, (D - W)]^{-1}).\]</span></p>
-<p>which rewrites to the <em>pairwise difference</em> formulation:</p>
+<p>The joint distribution, above, rewrites to the <em>pairwise difference</em> formulation:</p>
 <p><span class="math display">\[ p(\phi | \tau) \propto {\tau}^{\frac{n - NC}{2}} \exp \left\{ {- \frac{\tau}{2}} \sum_{i \sim j}{({\phi}_i - {\phi}_j)}^2 \right\} \]</span></p>
 <p>where <span class="math inline">\(NC\)</span> is the number of components in the graph over all areal subregions defined by the spatial proximity matrix; <span class="math inline">\(NC == 1\)</span> when the areal graph is fully connected, i.e., every subregion can be reached from every other subregion via a sequence of neighbors.</p>
-<p>The above conditions for the ICAR model produce an improper distribution because setting <span class="math inline">\(\alpha = 1\)</span> creates a singular matrix <span class="math inline">\((D - W)\)</span>, see Besag and Kooperberg 1995. Furthermore, the joint distribution is non-identifiable; adding any constant to all of the elements of <span class="math inline">\(\phi\)</span> leaves the joint distribution unchanged. Adding the constraint <span class="math inline">\(\sum_{i} {\phi}_i = 0\)</span> resolves this problem.</p>
-<p>While this ICAR model is non-generating in that it cannot be used as a model for the data, it can be used as a prior as part of a hierarchical model, which is the role it plays in the BYM model.</p>
+<p>From the pairwise difference formulation, we see that the joint distribution is non-identifiable; adding any constant to all of the elements of <span class="math inline">\(\phi\)</span> leaves the joint distribution unchanged. Adding the constraint <span class="math inline">\(\sum_{i} {\phi}_i = 0\)</span> resolves this problem.</p>
 </div>
+
+
+
+
+
 <div id="derivation-of-the-pairwise-difference-formula" class="section level3">
 <h3>Derivation of the <em>Pairwise Difference</em> Formula</h3>
 <p>The jump from the joint distribution to the pairwise difference requires a little reasoning about the matrix <span class="math inline">\(D - W\)</span> and a lot of algebra, which we present here. As stated above, the notation <span class="math inline">\(i \sim j\)</span> indicates that <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span> are neighbors.</p>