In the psychometrics literature, the data is often derived from a strategically constructed survey aimed at a particular target phenomenon: some intuited, but not yet measured, concept that arguably plays a role in human action, motivation, or sentiment. The relative "fuzziness" of the subject matter in psychometrics has had a catalyzing effect on the methodological rigour sought in the science. Survey designs are agonized over for correct tone and rhythm of sentence structure. Measurement scales are double-checked for reliability and correctness. The literature is consulted and questions are refined. Analysis steps are justified and tested under a wealth of modelling routines.
Model architectures are defined and refined to better express the hypothesized structures in the data-generating process. We will see how such due diligence leads to powerful and expressive models that grant us tractability on thorny questions of human affect. We draw on Roy Levy and Robert J. Mislevy's _Bayesian Psychometric Modeling_.
ax.set_title("Sample Covariances between indicator Metrics");
```
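For reference, the kind of covariance summary plotted above can be computed directly from the indicator columns. A minimal sketch using simulated stand-in data (in practice the notebook's actual `df` of survey responses would be used):

```python
import numpy as np
import pandas as pd

# Stand-in for the survey data: 283 respondents, 6 indicator metrics
rng = np.random.default_rng(42)
df = pd.DataFrame(
    rng.normal(size=(283, 6)),
    columns=[f"item_{i}" for i in range(1, 7)],
)

# Traditional SEM software typically optimises fit to these summaries...
sample_cov = df.cov()
sample_corr = df.corr()

# ...whereas the Bayesian approach conditions on the raw rows of `df` directly.
print(sample_cov.shape)  # (6, 6)
```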
This lens on the sample covariance matrix is common because traditional SEM models are often fit to the data by optimising fit to the covariance matrix, and model assessment routines often gauge the model's ability to recover the sample covariance relations. The Bayesian approach to estimating these models takes a slightly different tack, focusing on the observed data rather than derived summary statistics. Next we'll plot a pairplot to visualise the nature of the correlations
The goal is to articulate the relationship between the different factors in terms of the covariances between these latent terms and estimate the relationships each latent factor has with the manifest indicator variables. At a high level, we're saying the joint distribution of the observed data can be represented through conditionalisation in the following schema
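In generic CFA notation (a sketch, not necessarily the exact symbols used in the code below), this conditionalisation schema can be written as

$$
p(\mathbf{x}_i) = \int p(\mathbf{x}_i \mid \boldsymbol{\xi}_i)\, p(\boldsymbol{\xi}_i)\, d\boldsymbol{\xi}_i, \qquad
\mathbf{x}_i \mid \boldsymbol{\xi}_i \sim N(\boldsymbol{\tau} + \boldsymbol{\Lambda}\boldsymbol{\xi}_i,\ \boldsymbol{\Psi}), \qquad
\boldsymbol{\xi}_i \sim N(\mathbf{0},\ \boldsymbol{\Phi})
$$

where $\boldsymbol{\Lambda}$ holds the factor loadings and $\boldsymbol{\Phi}$ the covariance among the latent factors.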
This is the Bayesian approach to the estimation of CFA and SEM models. We're seeking a conditionalisation structure that can retrodict the observed data based on latent constructs and hypothetical relationships among the constructs and the observed data points. We will show how to build these structures into our model below
```{code-cell} ipython3
# Set up coordinates for appropriate indexing of latent factors
pm.model_to_graphviz(model)
```
Here the model structure and dependency graph become a little clearer. Our likelihood term models an outcome matrix of 283x6 observations, i.e. the survey responses for 6 questions. These survey responses are modelled as draws from a multivariate normal `Ksi` with a prior correlation structure between the latent constructs. We then specify how each of the outcome measures is a function of one of the latent factors, weighted by the appropriate factor loading `lambda`.
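The generative story can be sketched numerically. This is a simulation sketch assuming two latent constructs with three indicators each and illustrative loading values; the notebook's actual priors and dimensions may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 283

# Correlated latent constructs: one row of Ksi per survey respondent
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])  # prior correlation structure (illustrative)
Ksi = rng.multivariate_normal(np.zeros(2), Phi, size=n_obs)

# Each indicator loads on exactly one construct via its lambda term
Lambda = np.array([[0.9, 0.8, 0.7, 0.0, 0.0, 0.0],   # construct 1 -> items 1-3
                   [0.0, 0.0, 0.0, 0.8, 0.9, 0.6]])  # construct 2 -> items 4-6

# Implied survey responses: weighted factor scores plus measurement noise
mu = Ksi @ Lambda
y = mu + rng.normal(scale=0.3, size=(n_obs, 6))
print(y.shape)  # (283, 6)
```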
+++
### Measurement Model Structure
We can now see how the covariance structure among the latent constructs is an integral piece of the overarching model design, which is fed forward into our pseudo-regression components and weighted with the respective lambda terms.
One thing to highlight in particular about the Bayesian manner of fitting CFA and SEM models is that we now have access to the posterior distribution of the latent quantities. These samples can offer insight into particular individuals in our survey that is harder to glean from the multivariate presentation of the manifest variables.
```{code-cell} ipython3
:tags: [hide-input]
fig, axs = plt.subplots(1, 2, figsize=(20, 9))
axs = axs.flatten()
ax1 = axs[0]
Again our model samples well, but the parameter estimates suggest that there is some inconsistency in the scale onto which we're trying to force both sets of metrics.
This hints at a variety of measurement model misspecification and should force us to reconsider the model specification.
## Full Measurement Model
With this in mind we'll now specify a full measurement model that maps each of our thematically similar indicator metrics to an individual latent construct. This mandates the postulation of 5 distinct constructs, where we admit only three metrics loading on each construct. Which metric loads on which latent construct is determined, in our case, by the construct each measure is intended to capture.
```{code-cell} ipython3
drivers = [
We can also pull out the more typical patterns of model evaluation by assessing the fit between the posterior predictive covariances and the sample covariances. This is a sanity check to assess local model fit statistics. The below code iterates over draws from the posterior predictive distribution and calculates the covariance or correlation matrix on each draw; we then calculate the residuals between the model-implied covariances and the sample covariances on each draw, and average across the draws.
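The core of that residual computation can be sketched with NumPy, here using simulated stand-ins for the posterior predictive draws and the sample covariance (in the notebook these would come from the fitted `idata` and the observed data):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_obs, n_items = 100, 283, 6

# Stand-in for stacked posterior predictive draws of the outcome matrix
ppc_draws = rng.normal(size=(n_draws, n_obs, n_items))
# Stand-in for the sample covariance of the observed survey responses
sample_cov = np.cov(rng.normal(size=(n_obs, n_items)), rowvar=False)

# Covariance matrix computed on each posterior predictive draw
draw_covs = np.array([np.cov(d, rowvar=False) for d in ppc_draws])

# Residual per draw, then averaged across the draws
residuals = draw_covs - sample_cov
mean_residual = residuals.mean(axis=0)
print(mean_residual.shape)  # (6, 6)
```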
ax.set_title("Residuals between Model Implied and Sample Covariances", fontsize=25);
```
But we can also do more contemporary Bayesian posterior predictive checks as we pull out the posterior predictive distribution for each of the observed metrics.
We're not just interested in recovering the observed data patterns; we also want a way of pulling out the inferences relating to the latent constructs. For instance, we can pull out the factor loadings and calculate measures of variance accounted for by each of the indicator variables in this factor system, and for the factors themselves.
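With standardized loadings, the share of an indicator's variance accounted for by its factor (its communality) is $\lambda^2 / (\lambda^2 + \psi)$. A sketch with illustrative posterior-mean values (not the notebook's actual estimates):

```python
import numpy as np

# Hypothetical posterior-mean loadings and residual (error) variances
# for the three indicators of a single factor
lam = np.array([0.9, 0.8, 0.6])
psi = np.array([0.2, 0.3, 0.5])

# Communality: share of each indicator's variance explained by its factor
communality = lam**2 / (lam**2 + psi)

# Variance in the indicator block attributable to the factor as a whole
ave = (lam**2).sum() / ((lam**2).sum() + psi.sum())
print(communality.round(2), round(ave, 2))
```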
The goal of this kind of view isn't necessarily to find useful features, as in the machine learning context, but to help understand the nature of the variation in our system. We can also pull out covariances and correlations among the latent factors.
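Converting posterior draws of the factor covariance matrix into correlations is a per-draw normalisation. A sketch on simulated draws (in the notebook the draws would come from the trace):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for posterior draws of a 3x3 latent covariance matrix
base = rng.normal(size=(500, 3, 3))
cov_draws = base @ base.transpose(0, 2, 1) + 3 * np.eye(3)

# Normalise each draw to a correlation matrix
sd = np.sqrt(np.diagonal(cov_draws, axis1=1, axis2=2))
corr_draws = cov_draws / (sd[:, :, None] * sd[:, None, :])

# Posterior mean correlation among the latent factors
mean_corr = corr_draws.mean(axis=0)
print(mean_corr.shape)  # (3, 3)
```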
This highlights the strong relationships between the life-satisfaction construct `LS`, parental support `SUP_P`, and social self-efficacy `SE_SOCIAL`. We can observe these patterns in the draws of our latent constructs too.
ax.set_title("Individual Parental Support Metric \n On Latent Factor SUP_P")
ax1.set_title("Individual Social Self Efficacy \n On Latent Factor SE_SOCIAL")
ax2.set_title("Individual Life Satisfaction Metric \n On Latent Factor LS")
plt.show();
```
## Bayesian Structural Equation Models
We've now seen how measurement models help us understand the relationships between disparate indicator variables in a kind of crude way. We have postulated a system of latent factors and derived the correlations between these factors to help us understand the strength of relationships between the broader constructs of interest. But this is a special case of a structural equation model. In the SEM tradition we're interested in figuring out aspects of the structural relations between variables; that means we want to posit dependence and independence relationships to interrogate our beliefs about how influence flows through the system. For our data set we can postulate the following chain of dependencies
This model introduces specific claims of dependence, and the question then becomes how to model these patterns. In the next section we'll build on the structures of the basic measurement model to articulate this chain of dependence as functional equations of the "root" constructs. This allows us to evaluate the same questions of model adequacy as before, but additionally we can now phrase questions about direct and indirect relationships between the latent constructs. In particular, since our focus is on what drives life-satisfaction, we can ask about the mediated effects of parental support.
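Once the structural regressions are estimated, a mediated (indirect) effect is just the product of the relevant path coefficients, computed draw by draw so it inherits a full posterior. A sketch on simulated draws, where the path names are hypothetical stand-ins for coefficients extracted from the trace:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-ins for posterior draws of two path coefficients:
# SUP_P -> SE_SOCIAL and SE_SOCIAL -> LS (values are illustrative)
beta_sup_to_se = rng.normal(0.4, 0.05, size=4000)
beta_se_to_ls = rng.normal(0.6, 0.05, size=4000)

# Indirect effect of parental support on life satisfaction, per draw
indirect = beta_sup_to_se * beta_se_to_ls

# Posterior summary of the mediated effect
lower, upper = np.percentile(indirect, [3, 97])
print(indirect.mean(), lower, upper)
```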
### Model Complexity and Bayesian Sensitivity Analysis
These models are complicated: we're adding a bunch of new parameters and structure to the model, and each of the parameters is equipped with a prior that shapes the implications of the model specification. This is a hugely expressive framework in which we can encode a great variety of dependencies and correlations. With this freedom to structure our inferential model, we need to be careful to assess the robustness of our inferences. As such, we will here perform a quick sensitivity analysis to show how the central implications of this model vary under different prior settings.
The main structural feature to observe is that we've now added a number of regressions to our model, such that some of the constructs we took as given in the measurement model are now derived as a linear combination of others. Because we removed the correlation effect between `SE_SOCIAL` and `SE_ACAD`, we re-introduce the possibility of their correlation by adding correlated error terms to their regression equations.
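Schematically, the structural part now looks like the following (a sketch: the $\beta$ coefficients and the elided predictors are placeholders, with the exact regressors given in the model code; the jointly normal errors restore the `SE_SOCIAL`/`SE_ACAD` association):

$$
\begin{aligned}
\text{SE\_ACAD}_i &= \beta_{1}\,\text{SUP\_P}_i + \cdots + \varepsilon_{1i} \\
\text{SE\_SOCIAL}_i &= \beta_{2}\,\text{SUP\_P}_i + \cdots + \varepsilon_{2i} \\
(\varepsilon_{1i}, \varepsilon_{2i})^{\top} &\sim N\left(\mathbf{0},\ \boldsymbol{\Sigma}_{\varepsilon}\right)
\end{aligned}
$$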
```{code-cell} ipython3
pm.model_to_graphviz(model_sem0)
```
Next we'll see how the parameter estimates change across our prior specifications for the model
```{code-cell} ipython3
fig, ax = plt.subplots(figsize=(15, 15))
az.plot_forest(
[idata_sem0, idata_sem1, idata_sem2],
model_names=["SEM0", "SEM1", "SEM2"],
);
```
### Model Evaluation Checks
A quick evaluation of model performance suggests that we do somewhat less well in recovering the sample covariance structures than we did with the simpler measurement model.