Using clustering algorithms to explore patterns in Machiavellianism (MACH-IV) personality data.
This project applies four clustering algorithms to the MACH-IV dataset (a measure of Machiavellianism personality traits) to explore latent grouping patterns in the data. The repository contains code for preprocessing, applying clustering (K‑Means, Gaussian Mixture Model, Spectral Clustering, and Hierarchical Agglomerative Clustering), and visualising the results. The goal is to identify whether distinct clusters of respondents emerge on the MACH-IV scale and to characterise those clusters in terms of personality traits and other variables.
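As a rough sketch of the kind of pipeline involved (the cluster count, parameters, and data below are illustrative assumptions, not the repository's actual code, which lives in each `<algorithm>/run_<algorithm>.py`), all four algorithms can be driven through scikit-learn:

```python
# Illustrative sketch only -- stand-in data and hypothetical parameters.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, SpectralClustering, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))         # stand-in for 20 MACH-IV item responses
X = StandardScaler().fit_transform(X)  # scale items before clustering

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "gmm": GaussianMixture(n_components=3, random_state=0),
    "spectral": SpectralClustering(n_clusters=3, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
}
# fit_predict gives one label per respondent for every algorithm
labels = {name: m.fit_predict(X) for name, m in models.items()}
for name, lab in labels.items():
    print(name, np.bincount(lab))
```

On the real data, the label arrays produced this way are what the downstream cluster-analysis step characterises.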
├── cluster_analysis/ # high-level analysis code
├── data/ # raw and processed datasets
├── docs/ # all code for the website
├── kmeans/ # code for k-means clustering
├── gmm/ # code for GMM clustering
├── hierarchical/ # code for hierarchical clustering
├── spectral/ # code for spectral clustering
├── requirements.txt # Python dependencies
├── LICENSE # we use the MIT license
└── README.md # you are here!
- The primary dataset used is a subset of the MACH-IV scale containing 20 Likert-style questions (1 = strongly disagree, 5 = strongly agree), obtained from the Open-Source Psychometrics Project.
- The data directory includes the raw responses along with various demographic and auxiliary variables (e.g., age, gender, location).
- We specifically identified relationships between clustered Machiavellianism levels and the personality traits measured by the Ten Item Personality Inventory (TIPI).
- We store the pre-processed data in a CSV file located at `data/MACH_data/data.cleaned.csv`.
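For orientation, the cleaned CSV can be split into item columns and auxiliary columns along these lines (the column names below are assumptions for illustration; check the CSV header for the real ones):

```python
import io
import pandas as pd

# Stand-in for data/MACH_data/data.cleaned.csv -- real columns may differ.
csv_text = "Q1,Q2,Q3,age,gender\n5,1,3,24,f\n2,4,4,31,m\n"
df = pd.read_csv(io.StringIO(csv_text))

item_cols = [c for c in df.columns if c.startswith("Q")]   # Likert items
demo_cols = [c for c in df.columns if c not in item_cols]  # auxiliary variables
print(item_cols, demo_cols)
```

In the real workflow you would pass the item columns to the clustering code and keep the auxiliary columns for characterising the resulting clusters.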
- Clone this repository using either SSH or HTTPS, or download this codebase as a .zip file and extract everything locally.
- Ensure you are in the root directory:

```shell
cd path/to/mach-iv-clustering
```

- Create a Python virtual environment to install all the required dependencies in:

```shell
python -m venv venv  # creates a virtual environment named 'venv'
```

- Activate your Python virtual environment:

```shell
./venv/Scripts/activate   # for Windows PowerShell
source venv/bin/activate  # for Mac or Linux
```

- Install the dependencies:

```shell
pip install -r requirements.txt  # installs all dependencies in the requirements.txt file
```

Configuration:
- Modify the global variables used throughout each algorithm at `<algorithm>/setup/config.py` (e.g., features, path to data).
- Modify any arguments necessary in each main script at `<algorithm>/run_<algorithm>.py`.
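For orientation, a config file of this kind might look roughly like the following (every variable name and value here is hypothetical; consult the actual `<algorithm>/setup/config.py` for the real ones):

```python
# Hypothetical sketch of a setup/config.py -- names and values are
# assumptions for illustration only.
DATA_PATH = "data/MACH_data/data.cleaned.csv"  # path to the cleaned dataset
FEATURES = [f"Q{i}" for i in range(1, 21)]     # the 20 MACH-IV items
N_CLUSTERS = 3                                 # number of clusters to fit
RANDOM_STATE = 42                              # for reproducible runs

print(len(FEATURES), N_CLUSTERS)
```

Keeping these in one module means each `run_<algorithm>.py` script can share the same data path and feature list without duplicating them.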
- Simply run the algorithm's main script (a time-stamped artifacts folder containing the program output will be generated in your current directory):

```shell
python <algorithm>/run_<algorithm>.py
```

- To run cluster analysis (again, a time-stamped artifacts folder containing the program output will be generated in your current directory):

```shell
python cluster_analysis/analyze_clusters.py -i <algorithm>/<artifacts_folder>/data/<cluster_labels>.csv
```

- Ensure you have access to an HPC. For this guide, we assume you are an NAU student with access to the Monsoon HPC, and that you have a basic understanding of the Linux command line and Monsoon.
- Clone this repository using either SSH or HTTPS, or download this codebase as a .zip file and extract everything locally.
- SCP the dataset to your scratch directory:

```shell
scp path/to/mach-iv-clustering/data/MACH_data/data.cleaned.csv \
    <NAU_ID>@monsoon.hpc.nau.edu:/scratch/<NAU_ID>  # ex. NAU ID: abc123
```

- SCP the Bash shell script, the .pyz file corresponding to the algorithm you want to run, and the `environment.yml` file to your home directory:
```shell
scp path/to/mach-iv-clustering/<algorithm>/hpc/run_<algorithm>_clustering.pyz \
    path/to/mach-iv-clustering/<algorithm>/hpc/run_<algorithm>.sh \
    path/to/mach-iv-clustering/environment.yml \
    <NAU_ID>@monsoon.hpc.nau.edu:~/
```

- Set up a Conda virtual environment your job can activate (update the .sh script with the environment name from the .yml file):
```shell
module load anaconda3
conda env create -f environment.yml  # creates a Conda environment named 'mach-iv-clustering'
```

- Submit a new job (the script takes care of dependency installation):
```shell
sbatch run_<algorithm>.sh
```

- You can view the status of your job using the commands below. A time-stamped artifacts folder containing the program output will be generated in your home directory.
```shell
squeue --job <job_id>  # for GPU queue
jobstats -r            # for any running jobs
```

- Once the job is complete, you can view the results in your `/scratch/<NAU_ID>/` directory and SCP them back to your machine:

```shell
cd /scratch/$USER
```