Bayesian SEM

Setup

Initial install:

$ conda env create -f environment.yml
$ source activate bayes-sem

Deactivate Environment:

$ conda deactivate

Remove:

$ conda remove -n bayes-sem --all

Convert a.ipynb to a.py:

$ jupyter nbconvert --to script a.ipynb

Note: this repo requires pystan version 2.19.*
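
If the conda environment does not already provide a compatible pystan, one way to pin it is via pip; this is a sketch and the exact constraint below is an assumption, so check environment.yml first:

$ pip install "pystan>=2.19,<3"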

How to use:

Stan models

The Stan model files are located in src/codebase/stan_code/cont for the continuous models and src/codebase/stan_code/disc for the discrete models, respectively.

Continuous Data

Simulation Continuous Data

  • To run a simulation of a 2-factor model with 6 manifest variables, run run_sim_cont.py (see the example invocation after this list). The command is invoked as follows:

    run_sim_cont.py <num_warmup> <num_samples> <data_scenario> <model_code>
    

    with optional flags -th (--task_handle), -pm (--print_model), -num_chains (--num_chains), -seed (--random_seed), -nd (--nsim_data) and -xdir (--existing_directory). The results are saved in src/log/<date-time>_<task_handle>.

    To choose which model to run, use the <model_code> option as follows: 0: benchmark saturated model; 1 (CFA) / 4 (EFA): exact zeros, no u's; 2 (CFA) / 5 (EFA): full factor model.

    If an existing directory is given (via -xdir), the script loads the previously compiled Stan model from that directory and runs it with the new number of iterations.

    The results are processed using compute_ppp_cont.py

  • To run the same 2-factor, 6-manifest-variable simulation with k-fold splits, run run_sim_cont.py as above and add the cross-validation flag -cv cv; the remaining options are the same as above.

The results are processed using compute_cv_cont.py

  • For versions of the scripts that run the experiment multiple times with different seeds, look for the _bstr suffix.
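
As an illustration, a PPP run and a cross-validation run of the continuous simulation might look like the following; the iteration counts, data scenario, seed and task handles are placeholder values, not defaults taken from the script:

$ python run_sim_cont.py 1000 1000 1 1 -th sim_cont -seed 0
$ python run_sim_cont.py 1000 1000 1 1 -th sim_cont_cv -cv cv

Each run writes its output to src/log/<date-time>_<task_handle>, which is then processed with compute_ppp_cont.py or compute_cv_cont.py respectively.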

Real World Application

We run the factor model on the Big 5 personality data. The data is examined in 2.0 Muthen Data.ipynb.

  • To run the Big 5 experiment, run run_big5.py (see the example invocation after this list).

    run_big5.py <num_warmup> <num_samples> <model_code>
    

    with optional flags -th (--task_handle), -pm (--print_model), --num_chains, --gender and -xdir (--existing_directory). The results are saved in src/log/<date-time>_<task_handle>.

The results are processed using compute_ppp_cont.py

  • To run the cross-validation version of this experiment with k-fold splits, run run_big5.py as above and add the -cv cv flag; the remaining options are the same as above.

The results are processed using compute_cv_cont.py
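
For illustration, PPP and cross-validation runs of the Big 5 experiment might look like the commands below; the iteration counts, model code and task handles are arbitrary placeholders:

$ python run_big5.py 1000 1000 2 -th big5
$ python run_big5.py 1000 1000 2 -th big5_cv -cv cv

Outputs land in src/log/<date-time>_<task_handle> and are processed with compute_ppp_cont.py or compute_cv_cont.py as above.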

Discrete Data

Simulation Binary Data

  • Run the simulation experiment with run_sim_bin.py (see the example invocation after this list). To choose which model to run, use the <model_code> option as follows: 1: Model 1 (2 factor); 2: Model 2 (2 factor); 3: Model 1.5 (2 factor); 4: 1-factor model; 5: EFA, no u's; 6: EFA with u's. Results are saved in log/<time_task_handle> directories. By default the script runs the 'PPP' experiment; pass -cv cv to run the cross-validation experiment.

  • Compute PPP values with src/compute_ppp_bin.py and cross validation values with src/compute_cv_bin.py

  • Posterior Sample charts in 3.1-Binary-Simulation-Plotting-CFA.ipynb and 3.2-Binary-Simulation-Plotting-EFA.ipynb
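
The positional arguments of run_sim_bin.py are not listed above; assuming it follows the same <num_warmup> <num_samples> <model_code> pattern as the other run scripts (an assumption, so verify with the script's --help output), an illustrative session is:

$ python run_sim_bin.py --help
$ python run_sim_bin.py 1000 1000 1 -th bin_sim
$ python run_sim_bin.py 1000 1000 1 -th bin_sim_cv -cv cv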

Real World binary data

FND Data

We used our methodology on the real dataset FND (Fagerstrom Nicotine Dependence). There are 6 questions (J=6) that are binary (either originally binary or dichotomized). We used two different factor structures: 1 factor (K=1) loading on all items, and 2 factors (K=2) where the first factor loads on questions 1, 2, 3 and the second factor loads on questions 4, 5, 6.

The model structure, the dataset, and related factor-analysis literature are described in "A confirmatory factor analysis of the Fagerstrom Test for Nicotine Dependence" by Chris G. Richardson and Pamela A. Ratner.

  • Raw data is saved in dat/real_data/FND.csv and the data descriptions are in dat/real_data/FTND Description.xlsx. Data is processed with src/codebase/data_FND.py. There is also a notebook presenting some descriptive statistics of the dataset: src/10. Real Data Setup - FND.ipynb

  • Run the FND data experiment with run_FND_bin.py (see the example invocation after this list). To choose which model to run, use the <model_code> option as follows: 1: CFA (2 factor, no u's); 2: CFA 2 (2 factor); 3: Model 1.5 (2 factor); 4: EFA model, no u's; 5: EFA model. If running an EFA model (model 4 or 5), you can choose the number of factors k with the flag -nfac k. Results are saved in log/<time_task_handle> directories. By default the script runs the 'PPP' experiment; pass -cv cv to run the cross-validation experiment.

  • Compute PPP values with src/compute_ppp_bin.py and cross validation values with src/compute_cv_bin.py

  • Posterior Sample charts in 4.1-FDN-Plotting-CFA.ipynb and 4.2-FDN-Plotting-EFA.ipynb
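
Again assuming the same <num_warmup> <num_samples> <model_code> argument pattern as the other run scripts (an assumption, so verify with --help), illustrative CFA and EFA runs could look like:

$ python run_FND_bin.py --help
$ python run_FND_bin.py 1000 1000 2 -th fnd_cfa
$ python run_FND_bin.py 1000 1000 5 -th fnd_efa -nfac 2

Here -nfac 2 requests a 2-factor EFA; the iteration counts and task handles are placeholders.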

On Fabian

We installed pystan version 2.19, so the same scripts should run correctly as on a local machine.

  1. Remove apps/anaconda3/5.0.0

  2. Activate apps/anaconda3/4.4.0

  3. Activate env pystan-dev

    module del apps/anaconda3/5.0.0
    module add apps/anaconda3/4.4.0
    source activate pystan-dev
    

On local system

To save storage space, you can remove old pickled files with:

find . -type f -name '*.p' -delete
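
To preview which pickled files would be removed before actually deleting them, run the same pattern without the -delete action:

find . -type f -name '*.p'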

Repeated Simulations

We ran repeated simulation experiments to study the properties of our model selection method: 100 experiments, each generating a new dataset, run with the script run_sim_cont_multiple_runs.py. Log-score results are collected with compute_multiple_logscores.py, and results are presented in the notebook Check multiple runs.ipynb.
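
The arguments of these scripts are not documented here; assuming run_sim_cont_multiple_runs.py accepts the same positional arguments as run_sim_cont.py (an assumption, so check its --help output), a hypothetical session is:

$ python run_sim_cont_multiple_runs.py 1000 1000 1 1 -th repeated_sims
$ python compute_multiple_logscores.py --help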
