Using clustering algorithms to explore patterns in Machiavellianism (MACH-IV) personality data.
This project applies four clustering algorithms to the MACH-IV dataset (a measure of Machiavellianism personality traits) to explore latent grouping patterns in the data. The repository contains code for preprocessing, applying clustering (K‑Means, Gaussian Mixture Model, Spectral Clustering, and Hierarchical Agglomerative Clustering), and visualising the results. The goal is to identify whether distinct clusters of respondents emerge on the MACH-IV scale and to characterise those clusters in terms of personality traits and other variables.
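As a rough sketch of the kind of pipeline involved (the cluster count, parameters, and data below are illustrative assumptions, not the repository's actual code, which lives in each `<algorithm>/run_<algorithm>.py`), all four algorithms can be driven through scikit-learn:

```python
# Illustrative sketch only -- stand-in data and hypothetical parameters.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, SpectralClustering, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))         # stand-in for 20 MACH-IV item responses
X = StandardScaler().fit_transform(X)  # scale items before clustering

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "gmm": GaussianMixture(n_components=3, random_state=0),
    "spectral": SpectralClustering(n_clusters=3, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
}
# fit_predict gives one label per respondent for every algorithm
labels = {name: m.fit_predict(X) for name, m in models.items()}
for name, lab in labels.items():
    print(name, np.bincount(lab))
```

On the real data, the label arrays produced this way are what the downstream cluster-analysis step characterises.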
├── cluster_analysis/ # high-level analysis code
├── data/ # raw and processed datasets
├── docs/ # all code for the website
├── kmeans/ # code for k-means clustering
├── gmm/ # code for GMM clustering
├── hierarchical/ # code for hierarchical clustering
├── spectral/ # code for spectral clustering
├── requirements.txt # Python dependencies
├── LICENSE # we use the MIT license
└── README.md # you are here!
- The primary dataset used is a subset of the MACH-IV scale containing 20 Likert-style questions (1 = strongly disagree, 5 = strongly agree), obtained from the Open-Source Psychometrics Project.
- The data directory includes the raw responses along with various demographic and auxiliary variables (e.g., age, gender, location).
- We specifically identified relationships between clustered Machiavellianism levels and the personality traits measured by the Ten Item Personality Inventory (TIPI).
- We store the pre-processed data in a CSV file located at `data/MACH_data/data.cleaned.csv`.
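For orientation, the cleaned CSV can be split into item columns and auxiliary columns along these lines (the column names below are assumptions for illustration; check the CSV header for the real ones):

```python
import io
import pandas as pd

# Stand-in for data/MACH_data/data.cleaned.csv -- real columns may differ.
csv_text = "Q1,Q2,Q3,age,gender\n5,1,3,24,f\n2,4,4,31,m\n"
df = pd.read_csv(io.StringIO(csv_text))

item_cols = [c for c in df.columns if c.startswith("Q")]   # Likert items
demo_cols = [c for c in df.columns if c not in item_cols]  # auxiliary variables
print(item_cols, demo_cols)
```

In the real workflow you would pass the item columns to the clustering code and keep the auxiliary columns for characterising the resulting clusters.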
- Clone this repository using either SSH or HTTPS, or download this codebase as a .zip file and extract everything locally.
- Ensure you are in the root directory:

```shell
cd path/to/mach-iv-clustering
```

- Create a Python virtual environment to install all the required dependencies in:

```shell
python -m venv venv  # creates a virtual environment named 'venv'
```

- Activate your Python virtual environment:

```shell
./venv/Scripts/activate   # for Windows PowerShell
source venv/bin/activate  # for Mac or Linux
```

- Install the dependencies:

```shell
pip install -r requirements.txt  # installs all dependencies in the requirements.txt file
```

Configuration:
- Modify the global variables used throughout each algorithm at `<algorithm>/setup/config.py` (e.g., features, path to data).
- Modify any arguments necessary in each main script at `<algorithm>/run_<algorithm>.py`.
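For orientation, a config file of this kind might look roughly like the following (every variable name and value here is hypothetical; consult the actual `<algorithm>/setup/config.py` for the real ones):

```python
# Hypothetical sketch of a setup/config.py -- names and values are
# assumptions for illustration only.
DATA_PATH = "data/MACH_data/data.cleaned.csv"  # path to the cleaned dataset
FEATURES = [f"Q{i}" for i in range(1, 21)]     # the 20 MACH-IV items
N_CLUSTERS = 3                                 # number of clusters to fit
RANDOM_STATE = 42                              # for reproducible runs

print(len(FEATURES), N_CLUSTERS)
```

Keeping these in one module means each `run_<algorithm>.py` script can share the same data path and feature list without duplicating them.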
- Simply run the algorithm's main script (a time-stamped artifacts folder containing the program output will be generated in your current directory):

```shell
python <algorithm>/run_<algorithm>.py
```

- To run cluster analysis (again, a time-stamped artifacts folder containing the program output will be generated in your current directory):

```shell
python cluster_analysis/analyze_clusters.py -i <algorithm>/<artifacts_folder>/data/<cluster_labels>.csv
```

- Ensure you have access to an HPC. For this guide, we assume you are an NAU student with access to the Monsoon HPC, and that you have a basic understanding of the Linux command line and Monsoon.
- Clone this repository using either SSH or HTTPS, or download this codebase as a .zip file and extract everything locally.
- SCP the dataset to your scratch directory:

```shell
scp path/to/mach-iv-clustering/data/MACH_data/data.cleaned.csv \
    <NAU_ID>@monsoon.hpc.nau.edu:/scratch/<NAU_ID>  # ex. NAU ID: abc123
```

- SCP the Bash shell script, the .pyz file corresponding to the algorithm you want to run, and the `environment.yml` file to your home directory:
```shell
scp path/to/mach-iv-clustering/<algorithm>/hpc/run_<algorithm>_clustering.pyz \
    path/to/mach-iv-clustering/<algorithm>/hpc/run_<algorithm>.sh \
    path/to/mach-iv-clustering/environment.yml \
    <NAU_ID>@monsoon.hpc.nau.edu:~/
```

- Set up a Conda virtual environment your job can activate (update the .sh script with the environment name from the .yml file):
```shell
module load anaconda3
conda env create -f environment.yml  # creates a Conda environment named 'mach-iv-clustering'
```

- Submit a new job (the script takes care of dependency installation):
```shell
sbatch run_<algorithm>.sh
```

- You can view the status of your job using the commands below. A time-stamped artifacts folder containing the program output will be generated in your home directory.
```shell
squeue --job <job_id>  # for GPU queue
jobstats -r            # for any running jobs
```

- Once the job is complete, you can view the results in your `/scratch/<NAU_ID>/` directory and SCP them back to your machine:

```shell
cd /scratch/$USER
```