Initial install:
$ conda env create -f environment.yml
$ source activate bayes-sem
Deactivate Environment:
$ conda deactivate
Remove:
$ conda remove -n bayes-sem --all
Convert a.ipynb to a.py:
$ jupyter nbconvert --to script a.ipynb
Note: this repo requires PyStan version 2.19.*.
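A quick way to confirm the installed version in the active environment (a minimal sanity check; any 2.19.x release should do):
$ python -c "import pystan; print(pystan.__version__)"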
The Stan model files are located in `src/codebase/stan_code/cont` for the continuous models and in `src/codebase/stan_code/disc` for the discrete models.
- To run a simulation with a 2-factor, 6-manifest-variable model, run `run_sim_cont.py`. The command runs as `run_sim_cont.py <num_warmup> <num_samples> <data_scenario> <model_code>`, with optional flags `-th` (`--task_handle`), `-pm` (`--print_model`), `-num_chains` (`--num_chains`), `-seed` (`--random_seed`), `-nd` (`--nsim_data`) and `-xdir` (`--existing_directory`). The results are saved in `src/log/<date-time>_<task_handle>`. To choose which model to run, use the `<model_code>` option as follows: "0: benchmark saturated model, 1 CFA / 4 EFA: exact zeros no u's, 2 CFA / 5 EFA: full factor model". If an existing directory is given, the script looks for an existing compiled Stan model to load and runs it with the new number of iterations. The results are processed using `compute_ppp_cont.py`. (See the example invocations after this list.)
- To run the 2-factor, 6-manifest-variable simulation with k-fold splits, run the same command as above with the cv flag on, `run_sim_cont.py -cv cv`, plus similar options as above. The results are processed using `compute_cv_cont.py`.
- For versions of the script that run the experiment multiple times with different seeds, look for the `_bstr` suffix.
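As an illustration, a single simulation run and its cross-validation counterpart might look like the following; the iteration counts, data scenario, model code, seed and task handle are placeholder values, not settings prescribed by the repo:

$ python run_sim_cont.py 1000 1000 1 1 -th sim_test -num_chains 4 -seed 42
$ python run_sim_cont.py 1000 1000 1 1 -th sim_test_cv -cv cv

The first call runs the PPP experiment (processed with `compute_ppp_cont.py`); the second turns on the k-fold cross-validation variant (processed with `compute_cv_cont.py`).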
We run the factor model on the Big 5 personality data. The data is examined in `2.0 Muthen Data.ipynb`.
- To run the Big 5 experiment, run `run_big5.py`. The command runs as `run_big5.py <num_warmup> <num_samples> <model_code>`, with optional flags `-th` (`--task_handle`), `-pm` (`--print_model`), `--num_chains`, `--gender` and `-xdir` (`--existing_directory`). The results are saved in `src/log/<date-time>_<task_handle>` and are processed using `compute_ppp_cont.py`. (See the example invocation after this list.)
- To run the cross-validation version of this experiment with k-fold splits, run the same command as above with the cv flag on, `run_big5.py -cv cv`, plus similar options as above. The results are processed using `compute_cv_cont.py`.
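For example, a PPP run on the Big 5 data and its cross-validation counterpart could look like this; the iteration counts, model code and task handle are placeholders:

$ python run_big5.py 1000 1000 1 -th big5_test
$ python run_big5.py 1000 1000 1 -th big5_test_cv -cv cv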
- Run the binary simulation experiment with `run_sim_bin.py`. To choose which model to run, use the `<model_code>` option as follows: "1: Model 1 (2 factor), 2: Model 2 (2 factor), 3: Model 1.5 (2 factor), 4: 1 Factor Model, 5: EFA no u's, 6: EFA with u's". Results are saved in `log/<time_task_handle>` directories. By default it runs the 'PPP' experiment; pass `-cv cv` to run the cross-validation experiment. (See the example after this list.)
- Compute PPP values with `src/compute_ppp_bin.py` and cross-validation values with `src/compute_cv_bin.py`.
- Posterior sample charts are in `3.1-Binary-Simulation-Plotting-CFA.ipynb` and `3.2-Binary-Simulation-Plotting-EFA.ipynb`.
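A hedged example call is sketched below. It assumes `run_sim_bin.py` takes warmup and sampling iteration counts before `<model_code>`, as the continuous scripts do, and that it supports the same `-th` task-handle flag; check the script's argument parser if in doubt. The numeric values are placeholders:

$ python run_sim_bin.py 1000 1000 1 -th bin_sim_test
$ python run_sim_bin.py 1000 1000 1 -th bin_sim_test -cv cv

The first call runs the default 'PPP' experiment; the second runs the cross-validation experiment.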
We applied our methodology to the real dataset FND (Fagerstrom Nicotine Dependence). There are 6 questions (J=6) that are binary (either originally or dichotomized). We used two different factor structures: 1 factor (K=1) loading on all items, and 2 factors (K=2), where the first factor loads on questions 1, 2, 3 and the second factor loads on questions 4, 5, 6.
The model structure, the dataset, and related factor-analysis literature on this dataset are described in "A confirmatory factor analysis of the Fagerstrom Test for Nicotine Dependence" by Chris G. Richardson and Pamela A. Ratner.
- Raw data is saved in `dat/real_data/FND.csv` and the data descriptions are in `dat/real_data/FTND Description.xlsx`. Data is processed with `src/codebase/data_FND.py`. There is also a notebook presenting some descriptive statistics of the dataset: `src/10. Real Data Setup - FND.ipynb`.
- Run the FND data experiment with `run_FND_bin.py`. To choose which model to run, use the `<model_code>` option as follows: "1: CFA (2 factor, no u's), 2: CFA 2 (2 factor), 3: Model 1.5 (2 factor), 4: EFA Model no u's, 5: EFA Model". If running an EFA model (model 4 or 5), you can choose the number of factors k with the flag `-nfac k`. Results are saved in `log/<time_task_handle>` directories. By default it runs the 'PPP' experiment; pass `-cv cv` to run the cross-validation experiment. (See the example invocations after this list.)
- Compute PPP values with `src/compute_ppp_bin.py` and cross-validation values with `src/compute_cv_bin.py`.
- Posterior sample charts are in `4.1-FDN-Plotting-CFA.ipynb` and `4.2-FDN-Plotting-EFA.ipynb`.
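As an illustration, a CFA run and an EFA run on the FND data might look like the following. This assumes `run_FND_bin.py` takes warmup and sampling iteration counts before `<model_code>` and supports the same `-th` task-handle flag as the other run scripts; the numeric values are placeholders:

$ python run_FND_bin.py 1000 1000 1 -th fnd_cfa
$ python run_FND_bin.py 1000 1000 5 -nfac 2 -th fnd_efa

Model code 1 selects the 2-factor CFA without u's; model code 5 with `-nfac 2` selects a 2-factor EFA model, per the key above.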
On the cluster we installed PyStan version 2.19, which should run the same scripts correctly as the local setup.
- Remove `apps/anaconda3/5.0.0`
- Activate `apps/anaconda3/4.4.0`
- Activate env `pystan-dev`:

$ module del apps/anaconda3/5.0.0
$ module add apps/anaconda3/5.0.0
$ source activate pystan-dev
To save storage space you can remove old pickled files with:
$ find . -type f -name '*.p' -delete
We ran repeated simulation experiments to look at the properties of our model selection method. We ran 100 experiments, each time generating a new dataset, with the script `run_sim_cont_multiple_runs.py`, and collected log-score results with the script `compute_multiple_logscores.py`. The notebook `Check multiple runs.ipynb` presents the results.