Commit a9b5663

adding markdown write up and excalidraw images
Signed-off-by: Nathaniel <[email protected]>
1 parent 5c9bc78 commit a9b5663

File tree

4 files changed

+75
-7
lines changed

examples/case_studies/bayesian_sem_workflow.ipynb

Lines changed: 43 additions & 4 deletions
@@ -27,6 +27,10 @@
2727
"A secondary motivation is to put SEM modelling with PyMC on firmer ground by detailing different sampling strategies for these complex models; we will cover both conditional and marginal formulations of a SEM model, allowing for the addition of mean-structures and hierarchical effects. These additional components highlight the expressive capacity of this modelling paradigm. \n",
2828
"\n",
2929
"### The Bayesian Workflow\n",
30+
"Recall the stages of the Bayesian workflow.\n",
31+
"\n",
32+
":::{admonition} The Bayesian Workflow Stages\n",
33+
":class: tip\n",
3034
"\n",
3135
"- **Conceptual model building**: Translate domain knowledge into statistical assumptions\n",
3236
"- **Prior predictive simulation**: Check if priors generate reasonable data\n",
@@ -36,6 +40,9 @@
3640
"- **Model comparison**: Compare alternative models systematically\n",
3741
"- **Model expansion or simplification**: Iterate based on findings\n",
3842
"- **Decision analysis**: Use the model for predictions or decisions\n",
43+
":::\n",
44+
"\n",
45+
"The structural equation modelling workflow is similar.\n",
3946
"\n",
4047
"### The SEM Workflow\n",
4148
"- __Confirm the Factor Structure__ (CFA):\n",
@@ -397,6 +404,14 @@
397404
")"
398405
]
399406
},
407+
{
408+
"cell_type": "markdown",
409+
"id": "b2a81043",
410+
"metadata": {},
411+
"source": [
412+
"Conveniently, the Bayesian workflow itself draws on constructive thought strategies. At each juncture in model development we must ask ourselves: Do I believe this? What assumptions have I made? Is there any visual evidence that my model is well specified? What can I do to improve the model specification? So we might hope that the end result of the Bayesian workflow is a general sense of satisfaction with a job well done!"
413+
]
414+
},
400415
{
401416
"cell_type": "markdown",
402417
"id": "3690f464",
@@ -471,7 +486,13 @@
471486
"id": "78194165",
472487
"metadata": {},
473488
"source": [
474-
"## Confirmatory Factor Analysis\n"
489+
"## Setting up Utility Functions\n",
490+
"\n",
491+
"For this exercise we will lean on a range of utility functions to build and compare the expansionary sequence. These functions capture repeated steps that will be required for any SEM model. \n",
492+
"\n",
493+
"The most important are functions like `make_lambda`, which samples the loadings and fixes the scale for the covariates that contribute to each latent factor. Similarly, `make_B` samples the path coefficients between the latent constructs and arranges them in a matrix that can be passed through matrix multiplication routines. Additionally, `make_Psi` samples parameter values for particular covariances, which are used to encode aspects of the variance in our system not captured by the covariances among the latent factors. These three helper functions determine the structure of the SEM model, and variants of each can be used to construct any SEM structure.\n",
494+
"\n",
495+
"We also define some plotting functions which will be used to compare models. "
475496
]
476497
},
477498
{
@@ -647,7 +668,13 @@
647668
"id": "2daa985f",
648669
"metadata": {},
649670
"source": [
650-
"## CFA v1"
671+
"## Confirming Factor Structure\n",
672+
"\n",
673+
"First we'll highlight the broad structure of a confirmatory factor model and the types of relations the model encodes. The red dotted arrows here denote covariance relationships among the latent factors. The black arrows denote the effect of the latent constructs on the observable indicator metrics. We've highlighted with a red [1] that the first \"factor loading\" is always fixed in order to (a) define the scale of the factor and (b) allow identification of the other factor loadings within that factor. \n",
674+
"\n",
675+
"![](cfa_excalidraw.png)\n",
676+
"\n",
677+
"In the model below we sample draws from the latent factors `eta` and relate them to the observables by the matrix computation `pt.dot(eta, Lambda.T)`. This computation results in a \"pseudo-observation\" matrix which we then feed through our likelihood to calibrate the latent structures against the observed data. This is the general pattern we'll see in all models below. "
651678
]
652679
},
653680
{
@@ -970,6 +997,14 @@
970997
"pm.model_to_graphviz(cfa_model_v1)"
971998
]
972999
},
1000+
{
1001+
"cell_type": "markdown",
1002+
"id": "8ff5106f",
1003+
"metadata": {},
1004+
"source": [
1005+
"The model diagram should emphasise how the sampling of the latent structure is fed forward into the ultimate likelihood term. Note here how our likelihood term is specified as independent Normals. This is a substantive assumption which is later revised. In a full SEM specification we will change the likelihood to use a multivariate normal distribution with specific covariance structures. "
1006+
]
1007+
},
9731008
{
9741009
"cell_type": "code",
9751010
"execution_count": 7,
@@ -1508,7 +1543,9 @@
15081543
"id": "9f5a5f36",
15091544
"metadata": {},
15101545
"source": [
1511-
"### Model Diagnostics and Assessment"
1546+
"### Model Diagnostics and Assessment\n",
1547+
"\n",
1548+
"For each latent variable (satisfaction, well being, constructive, dysfunctional), we will plot a forest/ridge plot of the posterior distributions of their factor scores `eta` as drawn. Each panel will have a vertical reference line at 0 (since latent scores are typically centered/scaled). These panels visualize the distribution of estimated latent scores across individuals, separated by latent factor. Then we will summarize the posterior estimates of the model parameters (factor loadings, regression coefficients, variances, etc.), providing a quick check against identification constraints (like fixed loadings) and effect directions. Finally we will plot the upper triangle of the residual correlation matrix with a blue–white–red colormap (−1 to +1). This visualizes residual correlations among observed indicators after the SEM structure is accounted for, helping to detect model misfit or unexplained associations."
15121549
]
15131550
},
15141551
{
@@ -1678,7 +1715,9 @@
16781715
"id": "a5538546",
16791716
"metadata": {},
16801717
"source": [
1681-
"## SEM V1 Conditional Formulation"
1718+
"## Structuring the Latent Relations\n",
1719+
"\n",
1720+
"![](sem3_excalidraw.png)"
16821721
]
16831722
},
16841723
{

examples/case_studies/bayesian_sem_workflow.myst.md

Lines changed: 32 additions & 3 deletions
@@ -26,6 +26,10 @@ This is case study builds on themes of {ref}`contemporary Bayesian workflow <bay
2626
A secondary motivation is to put SEM modelling with PyMC on firmer ground by detailing different sampling strategies for these complex models; we will cover both conditional and marginal formulations of a SEM model, allowing for the addition of mean-structures and hierarchical effects. These additional components highlight the expressive capacity of this modelling paradigm.
2727

2828
### The Bayesian Workflow
29+
Recall the stages of the Bayesian workflow.
30+
31+
:::{admonition} The Bayesian Workflow Stages
32+
:class: tip
2933

3034
- **Conceptual model building**: Translate domain knowledge into statistical assumptions
3135
- **Prior predictive simulation**: Check if priors generate reasonable data
@@ -35,6 +39,9 @@ A secondary motivation is to put SEM modelling with PyMC on firmer ground by det
3539
- **Model comparison**: Compare alternative models systematically
3640
- **Model expansion or simplification**: Iterate based on findings
3741
- **Decision analysis**: Use the model for predictions or decisions
42+
:::
43+
44+
The structural equation modelling workflow is similar.
3845

3946
### The SEM Workflow
4047
- __Confirm the Factor Structure__ (CFA):
@@ -244,6 +251,10 @@ sample_df.head().style.set_properties(
244251
)
245252
```
246253

254+
Conveniently, the Bayesian workflow itself draws on constructive thought strategies. At each juncture in model development we must ask ourselves: Do I believe this? What assumptions have I made? Is there any visual evidence that my model is well specified? What can I do to improve the model specification? So we might hope that the end result of the Bayesian workflow is a general sense of satisfaction with a job well done!
255+
256+
+++
257+
247258
## Mathematical Interlude
248259

249260
In the general set up of a Structural Equation Model we have observed variables $y \in R^{p}$, here $p=12$, and latent factors $\eta \in R^{m}$. The SEM consists of two parts: the measurement model and the structural regressions. The measurement model is the factor structure we seek to _confirm_. In this kind of factor analysis we posit a factor structure of how each factor determines the observed metrics.
@@ -309,7 +320,13 @@ We'll introduce each of these components are additional steps as we layer over t
309320

310321
+++
311322

312-
## Confirmatory Factor Analysis
323+
## Setting up Utility Functions
324+
325+
For this exercise we will lean on a range of utility functions to build and compare the expansionary sequence. These functions capture repeated steps that will be required for any SEM model.
326+
327+
The most important are functions like `make_lambda`, which samples the loadings and fixes the scale for the covariates that contribute to each latent factor. Similarly, `make_B` samples the path coefficients between the latent constructs and arranges them in a matrix that can be passed through matrix multiplication routines. Additionally, `make_Psi` samples parameter values for particular covariances, which are used to encode aspects of the variance in our system not captured by the covariances among the latent factors. These three helper functions determine the structure of the SEM model, and variants of each can be used to construct any SEM structure.
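To make the roles of the three structural matrices concrete, here is a minimal NumPy sketch. It is not the notebook's PyMC code: the function names `build_loadings`, `build_paths` and `build_residual_cov`, the orientation of `B`, and all numeric values are hypothetical and chosen only for illustration. Fixed numbers stand in for the values that the real helpers would sample.

```python
import numpy as np


def build_loadings(free_loadings, indicators_per_factor):
    """Place loadings in a (p x m) matrix; the first loading of each factor is fixed to 1."""
    p, m = sum(indicators_per_factor), len(indicators_per_factor)
    Lambda = np.zeros((p, m))
    row, k = 0, 0
    for j, n_ind in enumerate(indicators_per_factor):
        Lambda[row, j] = 1.0  # fixed loading sets the scale of factor j
        Lambda[row + 1 : row + n_ind, j] = free_loadings[k : k + n_ind - 1]
        row, k = row + n_ind, k + n_ind - 1
    return Lambda


def build_paths(m, paths):
    """Arrange path coefficients in an (m x m) matrix.

    Assumption for this sketch: B[i, j] holds the path from factor i to factor j.
    """
    B = np.zeros((m, m))
    for (source, target), coef in paths.items():
        B[source, target] = coef
    return B


def build_residual_cov(variances, covariances):
    """Symmetric (p x p) residual covariance with selected off-diagonal entries."""
    Psi = np.diag(np.asarray(variances, dtype=float))
    for (i, j), cov in covariances.items():
        Psi[i, j] = Psi[j, i] = cov
    return Psi


# Two factors with 3 and 2 indicators; the numbers are placeholders, not estimates.
Lambda = build_loadings(np.array([0.8, 1.2, 0.9]), indicators_per_factor=[3, 2])
B = build_paths(2, {(0, 1): 0.5})
Psi = build_residual_cov([1.0, 1.0, 1.0, 1.0, 1.0], {(0, 3): 0.3})
print(Lambda)
```

In a PyMC model the free entries would be random variables rather than constants, but the way they are arranged into matrices would follow the same idea.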
328+
329+
We also define some plotting functions which will be used to compare models.
313330

314331
```{code-cell} ipython3
315332
:tags: [hide-input]
@@ -465,7 +482,13 @@ def sample_model(model, sampler_kwargs):
465482
return idata
466483
```
467484

468-
## CFA v1
485+
## Confirming Factor Structure
486+
487+
First we'll highlight the broad structure of a confirmatory factor model and the types of relations the model encodes. The red dotted arrows here denote covariance relationships among the latent factors. The black arrows denote the effect of the latent constructs on the observable indicator metrics. We've highlighted with a red [1] that the first "factor loading" is always fixed in order to (a) define the scale of the factor and (b) allow identification of the other factor loadings within that factor.
488+
489+
![](cfa_excalidraw.png)
490+
491+
In the model below we sample draws from the latent factors `eta` and relate them to the observables by the matrix computation `pt.dot(eta, Lambda.T)`. This computation results in a "pseudo-observation" matrix which we then feed through our likelihood to calibrate the latent structures against the observed data. This is the general pattern we'll see in all models below.
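The shape logic of the `eta`-times-`Lambda.T` computation can be checked outside PyMC. The following is a small NumPy sketch under assumed dimensions (6 observations, 2 factors, 5 indicators); the loading values and the noise scale `sigma` are made up for illustration and do not come from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs, n_factors = 6, 2

# Hypothetical loading matrix: rows are indicators, columns are factors.
# The first loading of each factor is fixed to 1 to set the factor's scale.
Lambda = np.array(
    [
        [1.0, 0.0],
        [0.7, 0.0],
        [1.3, 0.0],
        [0.0, 1.0],
        [0.0, 0.9],
    ]
)

# One row of latent scores per observation.
eta = rng.normal(size=(n_obs, n_factors))

# "Pseudo-observations": the expected value of each indicator for each observation.
mu = eta @ Lambda.T  # shape (n_obs, n_indicators)

# Independent Normal noise around mu mimics the CFA likelihood.
sigma = 0.5
y_sim = mu + rng.normal(scale=sigma, size=mu.shape)

print(mu.shape, y_sim.shape)
```

Because the first loading of each factor is 1, the first indicator's pseudo-observation equals the factor score itself, which is what ties the latent scale to an observed metric.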
469492

470493
```{code-cell} ipython3
471494
with pm.Model(coords=coords) as cfa_model_v1:
@@ -497,6 +520,8 @@ with pm.Model(coords=coords) as cfa_model_v1:
497520
pm.model_to_graphviz(cfa_model_v1)
498521
```
499522

523+
The model diagram should emphasise how the sampling of the latent structure is fed forward into the ultimate likelihood term. Note here how our likelihood term is specified as independent Normals. This is a substantive assumption which is later revised. In a full SEM specification we will change the likelihood to use a multivariate normal distribution with specific covariance structures.
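One way to see why the independence assumption is substantive: independent Normals are exactly a multivariate normal with a diagonal covariance matrix, so any residual covariance between indicators is ruled out by construction. The sketch below verifies this numerically with hand-written log densities in NumPy; the data, means, and covariance values are arbitrary illustrations rather than quantities from the notebook.

```python
import numpy as np


def independent_normal_logp(y, mu, sigma):
    """Sum of univariate Normal log densities."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2)


def mvn_logp(y, mu, cov):
    """Multivariate Normal log density for a single observation vector."""
    p = y.shape[0]
    resid = y - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + quad)


y = np.array([0.3, -1.2, 0.8])
mu = np.array([0.0, -1.0, 1.0])
sigma = np.array([0.5, 1.0, 0.7])

logp_indep = independent_normal_logp(y, mu, sigma)
logp_diag = mvn_logp(y, mu, np.diag(sigma**2))

# Adding a residual covariance between the first two indicators changes the density.
cov_corr = np.diag(sigma**2)
cov_corr[0, 1] = cov_corr[1, 0] = 0.2
logp_corr = mvn_logp(y, mu, cov_corr)

print(logp_indep, logp_diag, logp_corr)
```

The first two values agree, while the third differs, which is the extra flexibility a multivariate normal likelihood with a structured covariance provides.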
524+
500525
```{code-cell} ipython3
501526
idata_cfa_model_v1 = sample_model(cfa_model_v1, sampler_kwargs=sampler_kwargs)
502527
```
@@ -511,6 +536,8 @@ idata_cfa_model_v1["posterior"]["Lambda"].sel(chain=0, draw=0)
511536

512537
### Model Diagnostics and Assessment
513538

539+
For each latent variable (satisfaction, well being, constructive, dysfunctional), we will plot a forest/ridge plot of the posterior distributions of their factor scores `eta` as drawn. Each panel will have a vertical reference line at 0 (since latent scores are typically centered/scaled). These panels visualize the distribution of estimated latent scores across individuals, separated by latent factor. Then we will summarize the posterior estimates of the model parameters (factor loadings, regression coefficients, variances, etc.), providing a quick check against identification constraints (like fixed loadings) and effect directions. Finally we will plot the upper triangle of the residual correlation matrix with a blue–white–red colormap (−1 to +1). This visualizes residual correlations among observed indicators after the SEM structure is accounted for, helping to detect model misfit or unexplained associations.
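As a rough illustration of the residual correlation check, the sketch below computes the difference between the observed correlation matrix and a model-implied one, then keeps only the upper triangle. This is an assumption about how such a diagnostic can be computed, not a copy of the notebook's plotting code; the helper name `residual_correlations`, the simulated data, and the implied covariance `Lambda @ Lambda.T + sigma**2 * I` (which assumes uncorrelated unit-variance factors) are all hypothetical.

```python
import numpy as np


def residual_correlations(y_obs, implied_cov):
    """Observed minus model-implied correlations, masked to the strict upper triangle."""
    obs_corr = np.corrcoef(y_obs, rowvar=False)
    sd = np.sqrt(np.diag(implied_cov))
    implied_corr = implied_cov / np.outer(sd, sd)
    resid = obs_corr - implied_corr
    upper = np.triu(np.ones_like(resid, dtype=bool), k=1)
    return np.where(upper, resid, np.nan)  # NaN cells are left blank by most heatmaps


rng = np.random.default_rng(1)

# Simulate indicators from a two-factor structure with made-up loadings.
Lambda = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 1.1]])
eta = rng.normal(size=(500, 2))
y_obs = eta @ Lambda.T + rng.normal(scale=0.5, size=(500, 4))

# Covariance implied by the same structure (uncorrelated, unit-variance factors).
implied_cov = Lambda @ Lambda.T + 0.25 * np.eye(4)

resid_upper = residual_correlations(y_obs, implied_cov)
print(np.round(resid_upper, 2))
```

When the implied structure matches the data-generating process, as it does here, the residuals sit close to zero; large entries in a fitted model would point to associations the structure fails to capture. The resulting matrix could be passed to a heatmap with a diverging colormap bounded at −1 and +1.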
540+
514541
```{code-cell} ipython3
515542
:tags: [hide-input]
516543
@@ -620,7 +647,9 @@ plot_model_highlights(idata_cfa_model_v1, "CFA", parameters)
620647
plot_diagnostics(idata_cfa_model_v1, parameters);
621648
```
622649

623-
## SEM V1 Conditional Formulation
650+
## Structuring the Latent Relations
651+
652+
![](sem3_excalidraw.png)
624653

625654
```{code-cell} ipython3
626655
with pm.Model(coords=coords) as sem_model_v1: