Using clustering algorithms to explore patterns in Machiavellianism (MACH-IV) personality data.

jeffreyHoelzel/mach-iv-clustering

MACH-IV Clustering

Summary

This project applies four different clustering algorithms to the MACH-IV dataset (a measure of Machiavellianism personality traits) in order to explore latent grouping patterns in the data. The repository contains code for preprocessing, applying clustering (K‑Means clustering, Gaussian Mixture Model, Spectral Clustering, and Hierarchical Agglomerative Clustering), and visualising the results. The goal is to identify whether distinct clusters of respondents emerge on the MACH-IV scale and to characterise those clusters in terms of personality traits and other variables.
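To make the first of these methods concrete, here is a minimal from-scratch sketch of K-Means (Lloyd's algorithm) on toy Likert vectors. It is illustrative only and is not the code in kmeans/. The function names are hypothetical, and the sketch uses a deterministic farthest-first seeding instead of the random or k-means++ initialisation that library implementations typically use, so the toy run is reproducible.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points: List[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first_init(points, k)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: every respondent joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 4-item Likert responses (1 = strongly disagree, 5 = strongly agree).
responses = [
    [5, 5, 4, 5], [4, 5, 5, 4], [5, 4, 5, 5],  # mostly agree
    [1, 2, 1, 1], [2, 1, 1, 2], [1, 1, 2, 1],  # mostly disagree
]
labels, centroids = kmeans(responses, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On the real data, each respondent's 20 MACH-IV answers would form one vector.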

Architecture

├── cluster_analysis/          # high-level analysis code
├── data/                      # raw and processed datasets
├── docs/                      # all code for the website
├── kmeans/                    # code for k-means clustering
├── gmm/                       # code for GMM clustering
├── hierarchical/              # code for hierarchical clustering
├── spectral/                  # code for spectral clustering
├── requirements.txt           # Python dependencies
├── LICENSE                    # we use the MIT license
└── README.md                  # you are here!

Dataset

  • The primary dataset is a subset of the MACH-IV scale containing 20 Likert-style questions, where 5 = strongly agree and 1 = strongly disagree, obtained from the Open-Source Psychometrics Project.
  • The data directory includes the raw responses and various demographic or auxiliary variables (e.g., age, gender, location).
  • We specifically identified relationships between clustered Machiavellianism levels and the personality traits measured by the Ten Item Personality Inventory (TIPI).
  • We store the pre-processed data in a CSV file located at data/MACH_data/data.cleaned.csv.
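Because the clusters are characterised using TIPI traits, the following sketch scores the five TIPI dimensions with the standard key: each trait is the mean of two items on a 1 to 7 scale, one of which is reverse-keyed as 8 - x. It assumes the columns are named TIPI1 to TIPI10 and that out-of-range values mark missing answers. Both are assumptions about data.cleaned.csv, not something this README states.

```python
import csv
import io
from typing import Dict, Optional

# Standard TIPI scoring key: trait -> ((item number, reverse-keyed?), ...).
TIPI_KEY = {
    "extraversion": ((1, False), (6, True)),
    "agreeableness": ((2, True), (7, False)),
    "conscientiousness": ((3, False), (8, True)),
    "emotional_stability": ((4, True), (9, False)),
    "openness": ((5, False), (10, True)),
}


def score_tipi(row: Dict[str, str]) -> Optional[Dict[str, float]]:
    """Score one respondent; return None if any TIPI answer is outside 1-7."""
    answers = {}
    for i in range(1, 11):
        value = int(row[f"TIPI{i}"])  # hypothetical column names TIPI1..TIPI10
        if not 1 <= value <= 7:
            return None  # assumption: out-of-range codes mean a missing answer
        answers[i] = value
    scores = {}
    for trait, items in TIPI_KEY.items():
        # Reverse-keyed items are flipped on the 1-7 scale as 8 - x.
        vals = [8 - answers[i] if reverse else answers[i] for i, reverse in items]
        scores[trait] = sum(vals) / len(vals)
    return scores


sample = io.StringIO(
    "TIPI1,TIPI2,TIPI3,TIPI4,TIPI5,TIPI6,TIPI7,TIPI8,TIPI9,TIPI10\n"
    "7,1,6,2,5,1,7,2,6,3\n"
)
for row in csv.DictReader(sample):
    print(score_tipi(row))
# {'extraversion': 7.0, 'agreeableness': 7.0, 'conscientiousness': 6.0, 'emotional_stability': 6.0, 'openness': 5.0}
```

Against the real file, you would open data/MACH_data/data.cleaned.csv with newline="" and pass the handle to csv.DictReader, adding delimiter="\t" if the file turns out to be tab-separated.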

Getting Started

Local Usage

  1. Clone this repository using either SSH or HTTPS, or download it as a .zip file and extract it locally.
  2. Ensure you are in the root directory:
cd path/to/mach-iv-clustering
  3. Create a Python virtual environment in which to install the required dependencies:
python -m venv venv              # creates a virtual environment named 'venv'
  4. Activate your Python virtual environment:
./venv/Scripts/activate          # for Windows PowerShell
source venv/bin/activate         # for macOS or Linux
  5. Install the dependencies:
pip install -r requirements.txt  # installs all dependencies listed in requirements.txt
  6. Configuration:
  • Modify the global variables used throughout each algorithm (e.g., features, path to data) in <algorithm>/setup/config.py.
  • Adjust any necessary arguments in each main script, <algorithm>/run_<algorithm>.py.
  7. Run the algorithm (a time-stamped artifacts folder containing the program output will be generated in your current directory):
python <algorithm>/run_<algorithm>.py
  8. To run the cluster analysis (a time-stamped artifacts folder containing the program output will be generated in your current directory):
python cluster_analysis/analyze_clusters.py -i <algorithm>/<artifacts_folder>/data/<cluster_labels>.csv
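This README does not document the schema that analyze_clusters.py expects, so the following is only a hypothetical sketch of a typical first step in characterising clusters: group respondents by cluster label and average each feature. It assumes the labels CSV has a `cluster` column plus numeric feature columns; the column names and the `summarize_clusters` helper are illustrative.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_clusters(
    rows: Iterable[Dict[str, str]], label_col: str = "cluster"
) -> Dict[str, Tuple[int, Dict[str, float]]]:
    """Map each cluster label to (size, {column: mean}) over the non-label columns."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    sizes: Dict[str, int] = defaultdict(int)
    for row in rows:
        label = row[label_col]
        sizes[label] += 1
        totals = sums[label]  # create the entry even if the row has no features
        for col, value in row.items():
            if col != label_col:
                totals[col] += float(value)
    return {
        label: (sizes[label], {col: total / sizes[label] for col, total in sums[label].items()})
        for label in sorted(sizes)
    }


# Hypothetical labels file: one row per respondent, a cluster column plus features.
labels_csv = io.StringIO(
    "cluster,mach_total,extraversion\n"
    "0,2.0,5.5\n"
    "0,3.0,4.5\n"
    "1,4.5,2.0\n"
)
for label, (size, means) in summarize_clusters(csv.DictReader(labels_csv)).items():
    print(label, size, means)
# 0 2 {'mach_total': 2.5, 'extraversion': 5.0}
# 1 1 {'mach_total': 4.5, 'extraversion': 2.0}
```

Non-numeric columns such as gender or location would need to be dropped or encoded first, because the sketch converts every non-label value with float().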

HPC Usage

  1. Ensure you have access to an HPC. For this guide, we are assuming you are an NAU student with access to the Monsoon HPC. We are also assuming you have some basic understanding of the Linux command line and Monsoon.
  2. Clone this repository using either SSH or HTTPS, or download it as a .zip file and extract it locally.
  3. SCP the dataset to your scratch directory:
scp path/to/mach-iv-clustering/data/MACH_data/data.cleaned.csv \
<NAU_ID>@monsoon.hpc.nau.edu:/scratch/<NAU_ID>      # e.g., NAU ID: abc123
  4. SCP the Bash script and .pyz file for the algorithm you want to run, along with the environment.yml file, to your home directory:
scp path/to/mach-iv-clustering/<algorithm>/hpc/run_<algorithm>_clustering.pyz \
path/to/mach-iv-clustering/<algorithm>/hpc/run_<algorithm>.sh \
path/to/mach-iv-clustering/environment.yml \
<NAU_ID>@monsoon.hpc.nau.edu:~/
  5. Set up a Conda environment for your job to activate (update the .sh script with the environment name defined in the .yml file):
module load anaconda3
conda env create -f environment.yml # creates a Conda environment named 'mach-iv-clustering'
  6. Submit a new job (the script takes care of dependency installation):
sbatch run_<algorithm>.sh
  7. You can check the status of your job using the commands below. A time-stamped artifacts folder containing the program output will be generated in your home directory.
squeue --job <job_id>            # for GPU queue
jobstats -r                      # for any running jobs
  8. Once the job is complete, you can view the results in your /scratch/<NAU_ID>/ directory and SCP them back to your machine:
cd /scratch/$USER
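The HPC steps copy a run_<algorithm>_clustering.pyz file to Monsoon. Assuming these are standard Python zip applications (the .pyz format produced by the standard-library zipapp module), the sketch below shows how such an archive can be built from a directory containing __main__.py and then executed with the Python interpreter. The directory layout and the `build_pyz` helper are hypothetical; the repository's own build step may differ.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_pyz(source_dir: pathlib.Path, target: pathlib.Path) -> pathlib.Path:
    """Bundle a directory that contains __main__.py into a runnable .pyz archive."""
    # The interpreter argument writes a shebang line and, for a file target,
    # sets the executable bit, so the archive can also be run directly on Linux.
    zipapp.create_archive(source_dir, target=target, interpreter="/usr/bin/env python3")
    return target


with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "app")  # hypothetical source directory
    src.mkdir()
    (src / "__main__.py").write_text('print("hello from the archive")\n')
    pyz = build_pyz(src, pathlib.Path(tmp, "app.pyz"))
    # Run it the way a batch script could: python app.pyz
    result = subprocess.run(
        [sys.executable, str(pyz)], capture_output=True, text=True, check=True
    )

output = result.stdout.strip()
print(output)  # hello from the archive
```

Shipping a single archive keeps the copy step to one file per algorithm, while the Conda environment supplies the third-party dependencies at run time.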
