examples/case_studies/bayesian_sem_workflow.ipynb (+43 −4: 43 additions, 4 deletions)
@@ -27,6 +27,10 @@
 "A secondary motivation is to put SEM modelling with PyMC on firmer ground by detailing different sampling strategies for these complex models; we will cover both conditional and marginal formulations of a SEM model, allowing for the addition of mean-structures and hierarchical effects. These additional components highlight the expressive capacity of this modelling paradigm. \n",
 "\n",
 "### The Bayesian Workflow\n",
+"Recall the stages of the Bayesian workflow.\n",
+"\n",
+":::{admonition} The Bayesian Workflow Stages\n",
+":class: tip\n",
 "\n",
 "- **Conceptual model building**: Translate domain knowledge into statistical assumptions\n",
 "- **Prior predictive simulation**: Check if priors generate reasonable data\n",
@@ -36,6 +40,9 @@
 "- **Model comparison**: Compare alternative models systematically\n",
 "- **Model expansion or simplification**: Iterate based on findings\n",
 "- **Decision analysis**: Use the model for predictions or decisions\n",
+":::\n",
+"\n",
+"The structural equation modelling workflow is similar. \n",
 "\n",
 "### The SEM Workflow\n",
 "- __Confirm the Factor Structure__ (CFA):\n",
@@ -397,6 +404,14 @@
 ")"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "b2a81043",
+"metadata": {},
+"source": [
+"Conveniently, the process of the Bayesian workflow itself involves the constructive thought strategies. At each juncture in model development we must ask ourselves: do I believe this? What assumptions have I made? Is there any visual evidence that my model is well specified? What can I do to improve the model specification? So we might hope that the end result of the Bayesian workflow is a general sense of satisfaction with a job well done!"
+]
+},
 {
 "cell_type": "markdown",
 "id": "3690f464",
@@ -471,7 +486,13 @@
 "id": "78194165",
 "metadata": {},
 "source": [
-"## Confirmatory Factor Analysis\n"
+"## Setting up Utility Functions\n",
+"\n",
+"For this exercise we will lean on a range of utility functions to build and compare the expansionary sequence. These functions include repeated steps that will be required for any SEM model. \n",
+"\n",
+"The most important cases are functions like `make_lambda`, which samples the factor loadings and fixes the scale of the indicators that contribute to each latent factor. Similarly, we have `make_B`, which samples the path coefficients between the latent constructs while arranging them in a matrix that can be passed through matrix-multiplication routines. Additionally, we have a `make_Psi` function, which samples particular covariance parameters used to encode aspects of the variance in our system not captured by the covariances among the latent factors. These three helper functions determine the structure of the SEM model, and variants of each can be used to construct any SEM structure.\n",
+"\n",
+"We also save some plotting functions which will be used to compare models. "
 ]
 },
 {
@@ -647,7 +668,13 @@
 "id": "2daa985f",
 "metadata": {},
 "source": [
-"## CFA v1"
+"## Confirming Factor Structure\n",
+"\n",
+"First we'll highlight the broad structure of a confirmatory factor model and the types of relations the model encodes. The red dotted arrows here denote covariance relationships among the latent factors. The black arrows denote the effect of the latent constructs on the observable indicator metrics. We've highlighted with red [1] that the first \"factor loading\" is always fixed to (a) define the scale of the factor and (b) allow identification of the other factor loadings within that factor. \n",
+"\n",
+"\n",
+"\n",
+"In the model below we sample draws from the latent factors `eta` and relate them to the observables by the matrix computation `pt.dot(eta, Lambda.T)`. This computation results in a \"pseudo-observation\" matrix which we then feed through our likelihood to calibrate the latent structures against the observed data. This is the general pattern we'll see in all models below. "
 ]
 },
 {
@@ -970,6 +997,14 @@
 "pm.model_to_graphviz(cfa_model_v1)"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "8ff5106f",
+"metadata": {},
+"source": [
+"The model diagram should emphasise how the sampling of the latent structure is fed forward into the ultimate likelihood term. Note here how our likelihood is specified as independent Normals. This is a substantive assumption which is later revised. In a full SEM specification we will change the likelihood to use a multivariate normal distribution with specific covariance structures. "
+]
+},
 {
 "cell_type": "code",
 "execution_count": 7,
@@ -1508,7 +1543,9 @@
 "id": "9f5a5f36",
 "metadata": {},
 "source": [
-"### Model Diagnostics and Assessment"
+"### Model Diagnostics and Assessment\n",
+"\n",
+"For each latent variable (satisfaction, well being, constructive, dysfunctional), we will plot a forest/ridge plot of the posterior distributions of their factor scores `eta`. Each panel will have a vertical reference line at 0 (since latent scores are typically centred/scaled). These panels visualize the distribution of estimated latent scores across individuals, separated by latent factor. Then we will summarize posterior estimates of model parameters (factor loadings, regression coefficients, variances, etc.), providing a quick check against identification constraints (like fixed loadings) and effect directions. Finally we will plot the upper triangle of the residual correlation matrix with a blue–white–red colormap (−1 to +1). This visualizes residual correlations among observed indicators after the SEM structure is accounted for, helping detect model misfit or unexplained associations."
examples/case_studies/bayesian_sem_workflow.myst.md (+32 −3: 32 additions, 3 deletions)
@@ -26,6 +26,10 @@ This case study builds on themes of {ref}`contemporary Bayesian workflow <bay
 A secondary motivation is to put SEM modelling with PyMC on firmer ground by detailing different sampling strategies for these complex models; we will cover both conditional and marginal formulations of a SEM model, allowing for the addition of mean-structures and hierarchical effects. These additional components highlight the expressive capacity of this modelling paradigm.
 
 ### The Bayesian Workflow
+Recall the stages of the Bayesian workflow.
+
+:::{admonition} The Bayesian Workflow Stages
+:class: tip
 
 - **Conceptual model building**: Translate domain knowledge into statistical assumptions
 - **Prior predictive simulation**: Check if priors generate reasonable data
@@ -35,6 +39,9 @@ A secondary motivation is to put SEM modelling with PyMC on firmer ground by det
 - **Model comparison**: Compare alternative models systematically
 - **Model expansion or simplification**: Iterate based on findings
 - **Decision analysis**: Use the model for predictions or decisions
+:::
+
+The structural equation modelling workflow is similar.
+Conveniently, the process of the Bayesian workflow itself involves the constructive thought strategies. At each juncture in model development we must ask ourselves: do I believe this? What assumptions have I made? Is there any visual evidence that my model is well specified? What can I do to improve the model specification? So we might hope that the end result of the Bayesian workflow is a general sense of satisfaction with a job well done!
+
++++
 
 ## Mathematical Interlude
 
 In the general set up of a Structural Equation Model we have observed variables $y \in R^{p}$, here ($p = 12$), and latent factors $\eta \in R^{m}$. The SEM consists of two parts: the measurement model and the structural regressions. The measurement model is the factor structure we seek to _confirm_. In this kind of factor analysis we posit a factor structure of how each factor determines the observed metrics.
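In the usual notation, the two parts can be sketched as follows (a standard LISREL-style presentation, consistent with the description above; the intercept vector $\nu$ only enters once mean-structures are added):

$$
y = \nu + \Lambda \eta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \Theta)
$$

$$
\eta = B\eta + \zeta, \qquad \zeta \sim \mathcal{N}(0, \Psi)
$$

Here $\Lambda$ is the $p \times m$ loading matrix (built by `make_lambda`), $B$ holds the structural path coefficients (from `make_B`), and in the standard notation $\Psi$ is the covariance of the structural disturbances $\zeta$, corresponding to the extra covariance terms sampled by `make_Psi`.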
@@ -309,7 +320,13 @@ We'll introduce each of these components as additional steps as we layer over t
 
 +++
 
-## Confirmatory Factor Analysis
+## Setting up Utility Functions
+
+For this exercise we will lean on a range of utility functions to build and compare the expansionary sequence. These functions include repeated steps that will be required for any SEM model.
+
+The most important cases are functions like `make_lambda`, which samples the factor loadings and fixes the scale of the indicators that contribute to each latent factor. Similarly, we have `make_B`, which samples the path coefficients between the latent constructs while arranging them in a matrix that can be passed through matrix-multiplication routines. Additionally, we have a `make_Psi` function, which samples particular covariance parameters used to encode aspects of the variance in our system not captured by the covariances among the latent factors. These three helper functions determine the structure of the SEM model, and variants of each can be used to construct any SEM structure.
+
+We also save some plotting functions which will be used to compare models.
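As a rough illustration of the pattern these helpers follow, here is a plain-NumPy sketch. The indicator assignment, shapes, and function signatures are invented for illustration; the notebook's versions sample the free entries as PyMC random variables rather than taking fixed values:

```python
import numpy as np

# Hypothetical assignment of 6 indicators to 2 latent factors.
factor_indicators = {"satisfaction": [0, 1, 2], "well_being": [3, 4, 5]}
p, m = 6, 2  # number of observed indicators, number of latent factors

def make_lambda(free_loadings):
    """Arrange loadings into a (p x m) matrix, fixing the first loading
    of each factor to 1 to set that factor's scale and identify the rest."""
    Lambda = np.zeros((p, m))
    for j, idx in enumerate(factor_indicators.values()):
        Lambda[idx[0], j] = 1.0                # fixed loading
        Lambda[idx[1:], j] = free_loadings[j]  # free loadings
    return Lambda

def make_B(paths, coefs):
    """Arrange path coefficients between latent constructs into an (m x m)
    matrix usable in matrix-multiplication routines."""
    B = np.zeros((m, m))
    for (src, dst), coef in zip(paths, coefs):
        B[dst, src] = coef
    return B

Lambda = make_lambda([[0.8, 1.1], [0.9, 0.7]])
B = make_B([(0, 1)], [0.5])  # hypothetical path: satisfaction -> well_being
```

The fixed `1.0` entries are what give each latent factor its scale; everything else in these matrices is free to be sampled.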
+First we'll highlight the broad structure of a confirmatory factor model and the types of relations the model encodes. The red dotted arrows here denote covariance relationships among the latent factors. The black arrows denote the effect of the latent constructs on the observable indicator metrics. We've highlighted with red [1] that the first "factor loading" is always fixed to (a) define the scale of the factor and (b) allow identification of the other factor loadings within that factor.
+
+
+
+In the model below we sample draws from the latent factors `eta` and relate them to the observables by the matrix computation `pt.dot(eta, Lambda.T)`. This computation results in a "pseudo-observation" matrix which we then feed through our likelihood to calibrate the latent structures against the observed data. This is the general pattern we'll see in all models below.
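The shapes involved can be checked with a small NumPy analogue (the sizes here are made up; `pt.dot` in the notebook corresponds to the `@` product below):

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, p = 200, 4, 12  # respondents, latent factors, observed indicators

eta = rng.normal(size=(n, m))     # stand-in for draws of latent factor scores
Lambda = rng.normal(size=(p, m))  # stand-in for the sampled loading matrix

# The "pseudo-observation" matrix: each row is the model-implied indicator
# profile for one respondent, which the likelihood compares to the data.
mu = eta @ Lambda.T
print(mu.shape)  # → (200, 12)
```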
 ```{code-cell} ipython3
 with pm.Model(coords=coords) as cfa_model_v1:
@@ -497,6 +520,8 @@ with pm.Model(coords=coords) as cfa_model_v1:
 pm.model_to_graphviz(cfa_model_v1)
 ```
 
+The model diagram should emphasise how the sampling of the latent structure is fed forward into the ultimate likelihood term. Note here how our likelihood is specified as independent Normals. This is a substantive assumption which is later revised. In a full SEM specification we will change the likelihood to use a multivariate normal distribution with specific covariance structures.
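To see why this revision matters, here is a small NumPy check (the function names are invented for illustration): with a purely diagonal covariance, the multivariate normal log-density reduces exactly to a sum of independent Normal log-densities, so moving to a full covariance matrix is precisely what buys the extra structure:

```python
import numpy as np

def mvn_logpdf(x, mu, cov):
    """Log-density of a multivariate normal, via slogdet/solve."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    k = len(mu)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def indep_normal_logpdf(x, mu, sigma):
    """Sum of independent Normal log-densities."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

rng = np.random.default_rng(0)
x, mu = rng.normal(size=3), np.zeros(3)
sigma = np.array([1.0, 2.0, 0.5])

# With a diagonal covariance the two formulations coincide; the full SEM
# likelihood gains its flexibility from off-diagonal entries in `cov`.
assert np.isclose(mvn_logpdf(x, mu, np.diag(sigma**2)), indep_normal_logpdf(x, mu, sigma))
```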
For each latent variable (satisfaction, well being, constructive, dysfunctional), we will plot a forest/ridge plot of the posterior distributions of their factor scores `eta`. Each panel will have a vertical reference line at 0 (since latent scores are typically centred/scaled). These panels visualize the distribution of estimated latent scores across individuals, separated by latent factor. Then we will summarize posterior estimates of model parameters (factor loadings, regression coefficients, variances, etc.), providing a quick check against identification constraints (like fixed loadings) and effect directions. Finally we will plot the upper triangle of the residual correlation matrix with a blue–white–red colormap (−1 to +1). This visualizes residual correlations among observed indicators after the SEM structure is accounted for, helping detect model misfit or unexplained associations.
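A minimal sketch of that last diagnostic, with simulated stand-ins for the observed and model-implied data (the variable names and the residual construction are illustrative; the notebook computes residuals from posterior quantities):

```python
import numpy as np

rng = np.random.default_rng(1)
observed = rng.normal(size=(200, 12))                       # observed indicators
implied = observed + rng.normal(scale=0.5, size=(200, 12))  # stand-in for model-implied values

residuals = observed - implied
corr = np.corrcoef(residuals, rowvar=False)  # (12 x 12) residual correlation matrix

# Keep only the strictly upper triangle; everything else is masked to NaN so
# that e.g. plt.imshow(masked, cmap="bwr", vmin=-1, vmax=1) shows each
# residual correlation exactly once.
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
masked = np.where(mask, corr, np.nan)
```

Large off-diagonal residual correlations in this plot flag pairs of indicators whose association the current factor structure fails to explain.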