Bayesian SEM

Setup

Initial install:

$ conda env create -f environment.yml
$ source activate bayes-sem

Deactivate Environment:

$ conda deactivate

Remove:

$ conda remove -n bayes-sem --all

Convert a.ipynb to a.py:

$ jupyter nbconvert --to script a.ipynb

Note: this repo requires pystan version 2.19.*
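
If the conda environment does not already provide a compatible pystan, one way to pin it is via pip; this is a sketch and the exact constraint below is an assumption, so check environment.yml first:

$ pip install "pystan>=2.19,<3"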

How to use:

Stan models

The Stan model files are located in src/codebase/stan_code/cont for the continuous models and src/codebase/stan_code/disc for the discrete models, respectively.

Continuous Data

Simulation Continuous Data

  • To run a simulation of a 2-factor model with 6 manifest variables, run run_sim_cont.py (see the example invocation after this list). The command is invoked as follows:

    run_sim_cont.py <num_warmup> <num_samples> <data_scenario> <model_code>
    

    with optional flags -th (--task_handle), -pm (--print_model), -num_chains (--num_chains), -seed (--random_seed), -nd (--nsim_data) and -xdir (--existing_directory). The results are saved in src/log/<date-time>_<task_handle>.

    To choose which model to run, use the <model_code> option as follows: 0: benchmark saturated model; 1 (CFA) / 4 (EFA): exact zeros, no u's; 2 (CFA) / 5 (EFA): full factor model.

    If an existing directory is given (via -xdir), the script loads the previously compiled Stan model from that directory and runs it with the new number of iterations.

    The results are processed using compute_ppp_cont.py

  • To run the same 2-factor, 6-manifest-variable simulation with k-fold splits, run run_sim_cont.py as above and add the cross-validation flag -cv cv; the remaining options are the same as above.

The results are processed using compute_cv_cont.py

  • For versions of the scripts that run the experiment multiple times with different seeds, look for the _bstr suffix.
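
As an illustration, a PPP run and a cross-validation run of the continuous simulation might look like the following; the iteration counts, data scenario, seed and task handles are placeholder values, not defaults taken from the script:

$ python run_sim_cont.py 1000 1000 1 1 -th sim_cont -seed 0
$ python run_sim_cont.py 1000 1000 1 1 -th sim_cont_cv -cv cv

Each run writes its output to src/log/<date-time>_<task_handle>, which is then processed with compute_ppp_cont.py or compute_cv_cont.py respectively.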

Real World Application

We run the factor model on the Big 5 personality data. The data is examined in 2.0 Muthen Data.ipynb.

  • To run the Big 5 experiment, run run_big5.py (see the example invocation after this list).

    run_big5.py <num_warmup> <num_samples> <model_code>
    

    with optional flags -th (--task_handle), -pm (--print_model), --num_chains, --gender and -xdir (--existing_directory). The results are saved in src/log/<date-time>_<task_handle>.

The results are processed using compute_ppp_cont.py

  • To run the cross-validation version of this experiment with k-fold splits, run run_big5.py as above and add the -cv cv flag; the remaining options are the same as above.

The results are processed using compute_cv_cont.py
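
For illustration, PPP and cross-validation runs of the Big 5 experiment might look like the commands below; the iteration counts, model code and task handles are arbitrary placeholders:

$ python run_big5.py 1000 1000 2 -th big5
$ python run_big5.py 1000 1000 2 -th big5_cv -cv cv

Outputs land in src/log/<date-time>_<task_handle> and are processed with compute_ppp_cont.py or compute_cv_cont.py as above.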

Discrete Data

Simulation Binary Data

  • Run the simulation experiment with run_sim_bin.py (see the example invocation after this list). To choose which model to run, use the <model_code> option as follows: 1: Model 1 (2 factor); 2: Model 2 (2 factor); 3: Model 1.5 (2 factor); 4: 1-factor model; 5: EFA, no u's; 6: EFA with u's. Results are saved in log/<time_task_handle> directories. By default the script runs the 'PPP' experiment; pass -cv cv to run the cross-validation experiment.

  • Compute PPP values with src/compute_ppp_bin.py and cross validation values with src/compute_cv_bin.py

  • Posterior Sample charts in 3.1-Binary-Simulation-Plotting-CFA.ipynb and 3.2-Binary-Simulation-Plotting-EFA.ipynb
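
The positional arguments of run_sim_bin.py are not listed above; assuming it follows the same <num_warmup> <num_samples> <model_code> pattern as the other run scripts (an assumption, so verify with the script's --help output), an illustrative session is:

$ python run_sim_bin.py --help
$ python run_sim_bin.py 1000 1000 1 -th bin_sim
$ python run_sim_bin.py 1000 1000 1 -th bin_sim_cv -cv cv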

Real World binary data

FND Data

We used our methodology on the real dataset FND (Fagerstrom Nicotine Dependence). There are 6 questions (J=6) that are binary (either originally binary or dichotomized). We used two different factor structures: 1 factor (K=1) loading on all items, and 2 factors (K=2) where the first factor loads on questions 1, 2, 3 and the second factor loads on questions 4, 5, 6.

The model structure, the dataset, and related factor-analysis literature are described in "A confirmatory factor analysis of the Fagerstrom Test for Nicotine Dependence" by Chris G. Richardson and Pamela A. Ratner.

  • Raw data is saved in dat/real_data/FND.csv and the data descriptions are in dat/real_data/FTND Description.xlsx. Data is processed with src/codebase/data_FND.py. There is also a notebook presenting some descriptive statistics of the dataset: src/10. Real Data Setup - FND.ipynb

  • Run the FND data experiment with run_FND_bin.py (see the example invocation after this list). To choose which model to run, use the <model_code> option as follows: 1: CFA (2 factor, no u's); 2: CFA 2 (2 factor); 3: Model 1.5 (2 factor); 4: EFA model, no u's; 5: EFA model. If running an EFA model (model 4 or 5), you can choose the number of factors k with the flag -nfac k. Results are saved in log/<time_task_handle> directories. By default the script runs the 'PPP' experiment; pass -cv cv to run the cross-validation experiment.

  • Compute PPP values with src/compute_ppp_bin.py and cross validation values with src/compute_cv_bin.py

  • Posterior Sample charts in 4.1-FDN-Plotting-CFA.ipynb and 4.2-FDN-Plotting-EFA.ipynb
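
Again assuming the same <num_warmup> <num_samples> <model_code> argument pattern as the other run scripts (an assumption, so verify with --help), illustrative CFA and EFA runs could look like:

$ python run_FND_bin.py --help
$ python run_FND_bin.py 1000 1000 2 -th fnd_cfa
$ python run_FND_bin.py 1000 1000 5 -th fnd_efa -nfac 2

Here -nfac 2 requests a 2-factor EFA; the iteration counts and task handles are placeholders.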

On Fabian

We installed pystan version 2.19, so the same scripts should run correctly as on a local machine.

  1. Remove apps/anaconda3/5.0.0

  2. Activate apps/anaconda3/4.4.0

  3. Activate env pystan-dev

    module del apps/anaconda3/5.0.0
    module add apps/anaconda3/4.4.0
    source activate pystan-dev
    

On local system

To save storage space, you can remove old pickled files with:

find . -type f -name '*.p' -delete
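
To preview which pickled files would be removed before actually deleting them, run the same pattern without the -delete action:

find . -type f -name '*.p'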

Repeated Simulations

We ran repeated simulation experiments to study the properties of our model selection method: 100 experiments, each generating a new dataset, run with the script run_sim_cont_multiple_runs.py. Log-score results are collected with compute_multiple_logscores.py, and results are presented in the notebook Check multiple runs.ipynb.
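
The arguments of these scripts are not documented here; assuming run_sim_cont_multiple_runs.py accepts the same positional arguments as run_sim_cont.py (an assumption, so check its --help output), a hypothetical session is:

$ python run_sim_cont_multiple_runs.py 1000 1000 1 1 -th repeated_sims
$ python compute_multiple_logscores.py --help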
