MBTI Co. is an analytical tool designed to predict Myers-Briggs Type Indicator (MBTI) profiles from conversational data over time. It specifically processes WhatsApp chat exports, allowing users to select individual participants and visualize their personality evolution through a temporal timeline synced with their message history.
Hugging Face Model: mbti-co-model
The current model (v2) combines message aggregation with a focal loss and label smoothing to achieve strong classification performance.

| Metric | Score |
|---|---|
| Accuracy | 82.8% |
| Precision | 82.7% |
| Recall | 82.8% |
| F1 Score | 82.4% |

| Type | Precision | Recall | F1-Score |
|---|---|---|---|
| ENFJ | 0.963 | 1.000 | 0.981 |
| INFJ | 0.987 | 0.974 | 0.981 |
| INTJ | 0.974 | 0.962 | 0.968 |
| ENTP | 0.886 | 0.933 | 0.909 |
| ENFP | 0.929 | 0.878 | 0.903 |
| INTP | 0.905 | 0.893 | 0.899 |
| ISFP | 0.798 | 0.986 | 0.882 |
| ISTP | 0.825 | 0.930 | 0.874 |
| ESFJ | 0.831 | 0.885 | 0.857 |
| ISFJ | 0.823 | 0.708 | 0.761 |
| ESFP | 0.705 | 0.796 | 0.748 |
| INFP | 0.662 | 0.818 | 0.732 |
| ISTJ | 0.729 | 0.683 | 0.705 |
| ESTJ | 0.712 | 0.675 | 0.693 |
| ENTJ | 0.672 | 0.558 | 0.610 |
| ESTP | 0.714 | 0.507 | 0.593 |
- Base Model: XLM-RoBERTa (Multilingual Transformer)
- Framework: PyTorch and Hugging Face Transformers
- Loss Function: Focal Loss with label smoothing (optimized for class imbalance)
- Training Strategy: Message aggregation (8 messages per sample) with [SEP] delimiters
- Optimization: AdamW with Cosine Learning Rate Decay and gradient clipping
- Technical Notes: Architecture (Notion)
- Visual Breakdown: Model Architecture Diagram (PDF)
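The focal loss with label smoothing listed above can be sketched as follows. This is a minimal PyTorch sketch, not the project's actual implementation; the `gamma` and `smoothing` defaults are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def focal_loss_with_smoothing(logits, targets, gamma=2.0, smoothing=0.1):
    """Focal loss over smoothed targets: down-weights easy examples so
    rare MBTI classes contribute more to the gradient. gamma and
    smoothing are illustrative defaults, not the project's settings."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Label smoothing: (1 - eps) on the true class plus eps/K everywhere.
    with torch.no_grad():
        smooth = torch.full_like(log_probs, smoothing / num_classes)
        smooth.scatter_(1, targets.unsqueeze(1),
                        1.0 - smoothing + smoothing / num_classes)
    # Focal modulation: (1 - p)^gamma shrinks the loss on confident classes.
    loss = -(smooth * (1 - probs) ** gamma * log_probs).sum(dim=-1)
    return loss.mean()
```

With `gamma=0` and `smoothing=0` this reduces to plain cross-entropy, which is a useful sanity check when tuning.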
The transition from v1 to v2 resulted in an accuracy increase from 47.7% to 82.8% by implementing:
- Message Aggregation: Context-aware samples providing richer linguistic signals.
- Extended Context: Increased sequence length to 256 tokens.
- Noisy Label Mitigation: Implementation of label smoothing to handle LLM-generated training data.
- Standardized Cleaning: Unified preprocessing pipeline for training and inference.
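As a concrete illustration, the message-aggregation step can be sketched as below. This is a minimal sketch; how the project handles a trailing group shorter than 8 messages is an assumption.

```python
def aggregate_messages(messages, group_size=8, sep=" [SEP] "):
    """Join consecutive messages from one participant into
    context-aware training samples of group_size messages each.
    A short trailing group is kept rather than dropped (assumption --
    the real pipeline may drop or pad it instead)."""
    samples = []
    for i in range(0, len(messages), group_size):
        samples.append(sep.join(messages[i:i + group_size]))
    return samples
```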
Install required dependencies and configure the environment:
```bash
pip install -r requirements.txt
```

Create a .env file in the project root to store the API credentials:
```
GEMINI_API_KEY_1=your_api_key_here
```

Before training, raw WhatsApp data must be translated, labeled, cleaned, and balanced.
Generate a labeled dataset using the Gemini API:
```bash
python -m model.data_prep.translate_msgs
```

This processes data/chatBoys.txt and generates data/translated_chat.csv.
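The raw file consumed above follows WhatsApp's plain-text export format. A minimal parser sketch for the common US-locale line shape is below; the exact pattern varies by locale and app version, so treat the regex as an assumption rather than the project's actual parser.

```python
import re

# WhatsApp exports lines like "12/31/23, 9:45 PM - Alice: hello".
# This pattern covers the common US-style export; other locales differ.
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}\s?[AP]M) - ([^:]+): (.*)$"
)

def parse_line(line):
    """Return (date, time, sender, text), or None for system
    notices and multi-line message continuations."""
    m = LINE_RE.match(line)
    return m.groups() if m else None
```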
Clean the generated CSV and normalize text:
```bash
python -m model.data_prep.clean_csv
```

Ensure even distribution across MBTI classes to prevent model bias:
```bash
python -m model.data_prep.balance_data
```

Execute the main training script. This script handles message aggregation (8 per sample), 70/30 train/test splitting, and model fine-tuning:
```bash
python -m model.training.train
```

The trained model weights and tokenizer will be saved to model/output/mbti_model/.
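The 70/30 split can be sketched as a stratified split so that every MBTI type keeps the same train/test ratio. Whether the project's script actually stratifies, and the seed value, are assumptions.

```python
import random

def stratified_split(labels, test_frac=0.3, seed=42):
    """Return (train_indices, test_indices) with a per-class
    test_frac split, so no MBTI type is over-represented in either
    set. The seed is illustrative."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = int(len(idx) * test_frac)
        test_idx.extend(idx[:cut])
        train_idx.extend(idx[cut:])
    return train_idx, test_idx
```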
Generate comprehensive performance metrics and visualizations on the held-out test set:
```bash
python -m model.evaluation.evaluate
```

Visual results (Confusion Matrix, ROC Curves) are saved to model/evaluation/results/.
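The per-class precision and recall figures reported in the table above are derived from a confusion matrix; a minimal sketch of that computation (not the project's evaluation code, which likely uses a library such as scikit-learn):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true class, columns the predicted class."""
    idx = {l: i for i, l in enumerate(labels)}
    n = len(labels)
    cm = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        cm[idx[t]][idx[p]] += 1
    return cm

def precision_recall(cm, i):
    """Per-class precision and recall for class i (0.0 when undefined)."""
    tp = cm[i][i]
    pred = sum(row[i] for row in cm)   # column sum: all predicted as i
    true = sum(cm[i])                  # row sum: all actually i
    return (tp / pred if pred else 0.0, tp / true if true else 0.0)
```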
Test model performance on specific inputs or full chat files from the command line.
Test a specific chunk of messages (separated by [SEP]) to see top-5 predictions:
```bash
python -m model.inference.predict
```

Run inference on a complete WhatsApp .txt export to generate a personality timeline:
```bash
python -m model.inference.infer
```

Deploy the complete MBTI Co. application with a visual interface.
Start the Flask API to handle requests and manage inference state:
```bash
cd backend
python app.py
```

The backend initializes the model and serves endpoints at http://localhost:5000.
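Internally, serving a prediction reduces to turning the model's 16 class logits into a ranked list of types; a minimal sketch, where the label ordering is an assumption:

```python
import math

# Label ordering is an assumption for illustration; the real project
# stores its own label-to-index mapping with the trained model.
MBTI_TYPES = ["INTJ", "INTP", "ENTJ", "ENTP", "INFJ", "INFP", "ENFJ", "ENFP",
              "ISTJ", "ISFJ", "ESTJ", "ESFJ", "ISTP", "ISFP", "ESTP", "ESFP"]

def top_k_types(logits, k=5):
    """Softmax over the 16 class logits and return the k most likely
    MBTI types with their probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(MBTI_TYPES, probs), key=lambda t: t[1], reverse=True)
    return ranked[:k]
```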
Start the React development server to visualize timelines and message contexts:
```bash
cd frontend
npm install
npm run dev
```

Access the dashboard at http://localhost:5173.
- Dynamic Timeline Generation: 2-month interval analysis of personality shifts.
- Multilingual Support: Optimized for Roman Urdu and English code-switching.
- Sentiment Filtering: Automatic removal of neutral or uninformative messages to reduce noise.
- Detailed Metrics: Comprehensive evaluation suite including Confusion Matrices and ROC curves.
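The 2-month interval analysis behind the timeline can be sketched as date bucketing, with each bucket classified independently. This is a minimal sketch; the real pipeline's keying and interval boundaries may differ.

```python
from datetime import date

def bucket_by_interval(messages, months=2):
    """Group (date, text) pairs into fixed-length intervals keyed by
    (year, period index), so each bucket yields one timeline point."""
    buckets = {}
    for d, text in messages:
        key = (d.year, (d.month - 1) // months)
        buckets.setdefault(key, []).append(text)
    return buckets
```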
Developed as part of an Artificial Intelligence curriculum. Character assets and theoretical frameworks are used for educational research purposes.