MBTI Co. Timeline Predictor

MBTI Co. is an analytical tool that predicts Myers-Briggs Type Indicator (MBTI) profiles from conversational data over time. It processes WhatsApp chat exports, letting users select individual participants and visualize how their predicted personality evolves on a timeline synced with their message history.
Hugging Face Model: mbti-co-model

Demo Screenshots

[Screenshots: Home Page, User Page, Timeline Page, Messages]

Performance Overview

The current model (v2) implements message aggregation and advanced loss functions to achieve high precision in personality classification.

Aggregate Metrics

Metric Score
Accuracy 82.8%
Precision 82.7%
Recall 82.8%
F1 Score 82.4%

Per-Class Performance

Type Precision Recall F1-Score
ENFJ 0.963 1.000 0.981
INFJ 0.987 0.974 0.981
INTJ 0.974 0.962 0.968
ENTP 0.886 0.933 0.909
ENFP 0.929 0.878 0.903
INTP 0.905 0.893 0.899
ISFP 0.798 0.986 0.882
ISTP 0.825 0.930 0.874
ESFJ 0.831 0.885 0.857
ISFJ 0.823 0.708 0.761
ESFP 0.705 0.796 0.748
INFP 0.662 0.818 0.732
ISTJ 0.729 0.683 0.705
ESTJ 0.712 0.675 0.693
ENTJ 0.672 0.558 0.610
ESTP 0.714 0.507 0.593

Evaluation Visualizations

[Figures: Overall Metrics, F1 Score Per Class, Confusion Matrix, Metrics Comparison, ROC Curves]

Technical Implementation

Core Architecture

  • Base Model: XLM-RoBERTa (Multilingual Transformer)
  • Framework: PyTorch and Hugging Face Transformers
  • Loss Function: Focal Loss with label smoothing (optimized for class imbalance)
  • Training Strategy: Message aggregation (8 messages per sample) with [SEP] delimiters
  • Optimization: AdamW with Cosine Learning Rate Decay and gradient clipping
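
The focal loss with label smoothing can be sketched in plain Python. This is a simplified per-sample illustration, not the project's actual PyTorch implementation; the `gamma` and `smoothing` defaults are assumptions:

```python
import math

def focal_loss(probs, target, gamma=2.0, smoothing=0.1):
    """Focal loss with label smoothing for one sample.

    probs: predicted class probabilities (summing to 1)
    target: index of the true class
    gamma: focusing parameter; larger values down-weight easy examples
    smoothing: probability mass moved off the one-hot target (noisy labels)
    """
    n = len(probs)
    # Smoothed target: true class keeps 1 - smoothing, rest share the remainder.
    smoothed = [smoothing / (n - 1)] * n
    smoothed[target] = 1.0 - smoothing
    # The (1 - p)^gamma factor shrinks the loss on well-classified samples,
    # shifting gradient toward hard (often minority-class) examples.
    return -sum(t * (1.0 - p) ** gamma * math.log(max(p, 1e-12))
                for t, p in zip(smoothed, probs))
```

The focal term is what addresses class imbalance: a confident correct prediction contributes far less loss than a confident wrong one.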

Documentation

Methodology Improvements

The transition from v1 to v2 resulted in an accuracy increase from 47.7% to 82.8% by implementing:

  • Message Aggregation: Context-aware samples providing richer linguistic signals.
  • Extended Context: Increased sequence length to 256 tokens.
  • Noisy Label Mitigation: Implementation of label smoothing to handle LLM-generated training data.
  • Standardized Cleaning: Unified preprocessing pipeline for training and inference.
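
The message-aggregation step can be sketched as follows; the chunk size of 8 and the [SEP] delimiter come from the training strategy above, while the function name is illustrative:

```python
def aggregate_messages(messages, chunk_size=8, sep=" [SEP] "):
    """Join consecutive messages into one context-rich training sample."""
    return [sep.join(messages[i:i + chunk_size])
            for i in range(0, len(messages), chunk_size)]
```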

Operational Workflow

1. Environment Configuration

Install required dependencies and configure the environment:

pip install -r requirements.txt

Create a .env file in the project root to store the API credentials:

GEMINI_API_KEY_1=your_api_key_here
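
A stdlib-only sketch of reading that key at runtime (whether the project loads `.env` via python-dotenv or reads the environment directly is an assumption):

```python
import os

def load_gemini_key(index=1):
    """Fetch a Gemini API key such as GEMINI_API_KEY_1 from the environment."""
    key = os.environ.get(f"GEMINI_API_KEY_{index}")
    if not key:
        raise RuntimeError(f"GEMINI_API_KEY_{index} is not set; check your .env")
    return key
```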

2. Data Preparation Pipeline

Before training, raw WhatsApp data must be translated, labeled, cleaned, and balanced.
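
For context, an exported WhatsApp line typically looks like `12/31/23, 9:41 PM - Ahmad: message text`. A hypothetical parser for that common format (export formats vary by locale and device, so this regex is an assumption, not the project's actual parser):

```python
import re

# One common Android export format: "12/31/23, 9:41 PM - Sender: text"
LINE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (.+?) - (.+?): (.*)$")

def parse_line(line):
    """Split one export line into date, time, sender, and message text."""
    m = LINE.match(line)
    if not m:
        return None  # system notices and multi-line continuations
    date, time, sender, text = m.groups()
    return {"date": date, "time": time, "sender": sender, "text": text}
```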

Translation and Labeling

Generate a labeled dataset using the Gemini API:

python -m model.data_prep.translate_msgs

This processes data/chatBoys.txt and generates data/translated_chat.csv.

Cleaning and Preprocessing

Clean the generated CSV and normalize text:

python -m model.data_prep.clean_csv
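
A minimal sketch of the kind of normalization such a step performs (the exact rules in `clean_csv` are not documented here, so these are illustrative):

```python
import re

def clean_message(text):
    """Lowercase, drop links and media placeholders, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)     # strip URLs
    text = re.sub(r"<media omitted>", "", text)  # WhatsApp media placeholder
    return re.sub(r"\s+", " ", text).strip()
```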

Dataset Balancing

Ensure even distribution across MBTI classes to prevent model bias:

python -m model.data_prep.balance_data
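
One way to balance is to downsample every class to the size of the rarest one. This is a sketch; `balance_data`'s actual strategy (which may instead oversample) is not documented here:

```python
import random
from collections import defaultdict

def balance_by_class(rows, label_key="mbti", seed=42):
    """Downsample each MBTI class to the rarest class's count."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[label_key]].append(row)
    n = min(len(b) for b in buckets.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for b in buckets.values():
        balanced.extend(rng.sample(b, n))
    return balanced
```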

3. Model Training

Execute the main training script. This script handles message aggregation (8 per sample), 70/30 train/test splitting, and model fine-tuning:

python -m model.training.train

The trained model weights and tokenizer will be saved to model/output/mbti_model/.
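
The 70/30 split the script performs can be sketched as follows (the seed and function name are illustrative):

```python
import random

def train_test_split(samples, test_ratio=0.3, seed=42):
    """Shuffle aggregated samples and split them 70/30."""
    rng = random.Random(seed)
    shuffled = samples[:]  # leave the caller's list untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]
```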

4. Model Evaluation

Generate comprehensive performance metrics and visualizations on the held-out test set:

python -m model.evaluation.evaluate

Visual results (Confusion Matrix, ROC Curves) are saved to model/evaluation/results/.
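
The per-class numbers above follow directly from confusion-matrix counts; as a sketch:

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from one class's confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, the ISFP row's precision (0.798) and recall (0.986) yield F1 ≈ 0.882, matching the table.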

5. Local Inference and Testing

Test model performance on specific inputs or full chat files from the command line.

Single Batch Prediction

Test a specific chunk of messages (separated by [SEP]) to see top-5 predictions:

python -m model.inference.predict
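
Top-5 selection over the 16 class probabilities amounts to the following (an illustrative helper, not the script's actual code):

```python
def top_k(probs, k=5):
    """Return the k most probable MBTI types, best first."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
```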

Full Chat Timeline Analysis

Run inference on a complete WhatsApp .txt export to generate a personality timeline:

python -m model.inference.infer
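
Bucketing messages into the 2-month intervals mentioned under Key Features can be sketched as:

```python
from datetime import date

def two_month_bucket(d):
    """Map a message date to the first day of its 2-month window
    (Jan-Feb, Mar-Apr, May-Jun, ...)."""
    start_month = ((d.month - 1) // 2) * 2 + 1
    return date(d.year, start_month, 1)
```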

6. Running the Full Application

Run the complete MBTI Co. application with its visual interface.

Backend Service

Start the Flask API to handle requests and manage inference state:

cd backend
python app.py

The backend initializes the model and serves endpoints at http://localhost:5000.

Frontend Interface

Start the React development server to visualize timelines and message contexts:

cd frontend
npm install
npm run dev

Access the dashboard at http://localhost:5173.

Key Features

  • Dynamic Timeline Generation: 2-month interval analysis of personality shifts.
  • Multilingual Support: Optimized for Roman Urdu and English code-switching.
  • Sentiment Filtering: Automatic removal of neutral or uninformative messages to reduce noise.
  • Detailed Metrics: Comprehensive evaluation suite including Confusion Matrices and ROC curves.

Academic Context

Developed as part of an Artificial Intelligence curriculum. Character assets and theoretical frameworks are used for educational research purposes.
