PersonaFlow is an MBTI timeline predictor that analyzes personal or group WhatsApp chat conversations to predict personality types over time. Users upload a WhatsApp chat export, select a participant from the conversation, and visualize that person's MBTI personality timeline alongside their messages.
- Base Model: XLM-RoBERTa (Encoder-only Transformer)
- Framework: Hugging Face Transformers
- Loss Function: Focal Loss (handles class imbalance)
- Optimizer: AdamW with gradient clipping
- Learning Rate: 2e-5 with linear decay
- Gradient Accumulation: Enabled for stable training
- Batch Size: 8 (with gradient accumulation)
- Epochs: 10
- Max Sequence Length: 128 tokens (chat messages are typically short phrases)
- Evaluation Metrics: Macro-averaged F1, Precision, Recall, and Accuracy
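The Focal Loss listed above down-weights easy examples so training concentrates on hard, minority-class messages. A minimal single-example sketch of the formula (the training code itself presumably uses a batched PyTorch implementation; `gamma` and `alpha` follow the values reported later in this README):

```python
import math

def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss for one example, given the predicted probability of the
    true class: FL(p) = -alpha * (1 - p)^gamma * log(p)."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

# A confident correct prediction contributes almost nothing...
easy = focal_loss(0.9)
# ...while a hard (low-probability) example dominates the loss.
hard = focal_loss(0.1)
print(easy, hard)
```

With `gamma=2.0` the `(1 - p)^gamma` factor shrinks the loss of well-classified examples by orders of magnitude, which is why it helps on imbalanced MBTI labels.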
- Framework: React + Vite
- Styling: Tailwind CSS
- Framework: Python Flask
- ML Integration: PyTorch with Transformers
- API: Gemini API
Based on evaluation on the validation set:

```
MBTI Model Classification Report
------------------------------------------------------
              precision    recall  f1-score   support

        ENFJ     0.6785    0.7367    0.7064      2100
        ENFP     0.5349    0.3610    0.4310      2100
        ENTJ     0.2772    0.3876    0.3232      2100
        ENTP     0.5822    0.4990    0.5374      2100
        ESFJ     0.5499    0.4933    0.5201      2100
        ESFP     0.4450    0.5071    0.4741      2100
        ESTJ     0.3504    0.4738    0.4028      2100
        ESTP     0.4338    0.4305    0.4321      2100
        INFJ     0.6821    0.7081    0.6949      2100
        INFP     0.5571    0.6314    0.5920      2100
        INTJ     0.7140    0.6133    0.6598      2100
        INTP     0.4911    0.5510    0.5193      2100
        ISFJ     0.4318    0.4614    0.4461      2100
        ISFP     0.5039    0.3671    0.4248      2100
        ISTJ     0.4508    0.4562    0.4535      2100
        ISTP     0.6238    0.3997    0.4872      2099

    accuracy                         0.5048     33599
   macro avg     0.5192    0.5048    0.5065     33599
weighted avg     0.5192    0.5048    0.5066     33599
```
Overall Performance:
- Accuracy: 50.5%
- Macro F1-Score: 0.51
- Macro Precision: 0.52
- Macro Recall: 0.50
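Macro averaging gives each of the 16 classes equal weight regardless of support, which is why it is the headline metric here. A minimal illustration of how macro F1 is computed from per-class precision/recall (hypothetical two-class example, not project code):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall for one class."""
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class: list[tuple[float, float]]) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = [f1(p, r) for p, r in per_class]
    return sum(scores) / len(scores)

# One easy class and one hard class pull the macro average down equally,
# no matter how many examples each class has.
print(macro_f1([(0.9, 0.9), (0.3, 0.3)]))
```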
Best Performing Types:
- ENFJ (Protagonist): F1-Score 0.71, Precision 0.68
- INFJ (Advocate): F1-Score 0.69, Precision 0.68
- INTJ (Architect): F1-Score 0.66, Precision 0.71
Challenging Types:
- ENTJ (Commander): F1-Score 0.32
- ISFP (Adventurer): F1-Score 0.42
- ESTP (Entrepreneur): F1-Score 0.43
Key Insights:
- The model covers all 16 MBTI types, though per-type F1 ranges from 0.32 (ENTJ) to 0.71 (ENFJ)
- Focal Loss (Gamma=2.0, Alpha=0.25) helps mitigate class imbalance
- The training dataset was balanced using oversampling
- Gradient-norm clipping at 1.0 keeps training stable
- The model performs best on Introverted Intuitive types (INTJ, INFJ)
- A CUDA-enabled GPU is required (tested on an RTX 4050 with 6GB VRAM)
- PyTorch must be installed with CUDA support
- Training takes approximately 10 hours on an RTX 4050
Create a .env file in the root directory with your Gemini API credentials:

```
GEMINI_API_KEY=your_gemini_api_key_here
```

Export a chat from WhatsApp (contact or group):
- Save the .txt file to the data/ folder
- Example path: data/xyz.txt
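If you prefer not to add a dependency such as python-dotenv, a minimal stdlib loader for the .env file could look like this (a sketch of the idea, not the project's actual loading code):

```python
import os

def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ.
    Blank lines and # comments are ignored; already-set variables win."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Usage:
# load_env()
# api_key = os.environ["GEMINI_API_KEY"]
```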
Navigate to model/Messages/ and run:

```
python msgs.py
```

Output: data/translated_chat.csv with format:

```
roman_urdu,english,mbti
"Bro kidr ho?","Bro, where are you?","ESTP"
"You deleted this message","You deleted this message","ISTJ"
```

Navigate to model/Train/ and run:

```
python cleanMBTI.py
```

Output: Creates train.csv in the model/Train/ directory
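The three-column CSVs produced by these steps (roman_urdu, english, mbti) can be inspected with the stdlib csv module; a sketch, using the column names from this README:

```python
import csv

def load_rows(path: str) -> list[dict]:
    """Load a roman_urdu,english,mbti CSV into a list of dicts,
    one dict per message row, keyed by the header names."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))

# Usage (path is the output of the previous step):
# rows = load_rows("data/translated_chat.csv")
# print(rows[0]["english"], rows[0]["mbti"])
```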
Run the balancing script to keep the model from overfitting to majority classes:

```
python create_balanced_train.py
```

Output: Balanced train.csv with equal representation of all MBTI types
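The balancing step is described as oversampling; one simple way to equalize class counts (a sketch of the technique, not necessarily the script's exact method) is to randomly duplicate minority-class rows up to the size of the largest class:

```python
import random
from collections import defaultdict

def oversample(rows: list[dict], label_key: str = "mbti", seed: int = 42) -> list[dict]:
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap to the largest class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced
```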
Clean up corrupted/multi-line entries:

```
python fix_csv_format.py
```

Output: A clean, consistently formatted train.csv ready for training
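Corrupted or multi-line entries can often be normalized by round-tripping the file through the csv module, which re-quotes every field and lets you drop malformed rows. This is a sketch of the idea; the actual fix_csv_format.py may work differently:

```python
import csv

def normalize_csv(src: str, dst: str, n_cols: int = 3) -> int:
    """Rewrite src to dst: drop rows without exactly n_cols fields,
    replace embedded newlines with spaces, and re-quote all fields.
    Returns the number of rows kept."""
    kept = 0
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout, quoting=csv.QUOTE_ALL)
        for row in csv.reader(fin):
            if len(row) != n_cols:
                continue
            writer.writerow([field.replace("\n", " ") for field in row])
            kept += 1
    return kept
```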
Start the fine-tuning process:

```
python trainMBTI.py
```

Expected training logs:

```
{'loss': 2.1942, 'grad_norm': 5.9785, 'learning_rate': 2.02e-05, 'epoch': 1.82}
{'loss': 2.6711, 'grad_norm': 3.9611, 'learning_rate': 2.00e-05, 'epoch': 1.83}
{'loss': 2.4304, 'grad_norm': 3.1066, 'learning_rate': 1.98e-05, 'epoch': 1.86}
...
{'eval_loss': 2.5381, 'eval_accuracy': 0.0708, 'eval_f1_macro': 0.0165, 'epoch': 3.0}
{'train_runtime': 6198.96s, 'train_samples_per_second': 17.628, 'epoch': 3.0}
100%|████████████████████████████| 13662/13662 [1:43:18<00:00, 2.20it/s]
Saving model to mbti_model...
Training complete.
```
Training Time: ~10 hours on RTX 4050 (6GB VRAM)
Output: Trained model saved to model/Train/mbti_model/
Test the trained model with a sample input:

```
cd model/Infer
python main.py
```

Adjust the input file path in main.py to point to your test chat file.
Start the backend:

```
cd backend
python app.py
```

Start the frontend:

```
cd frontend
npm install
npm run dev
```

Access the application at http://localhost:5173 (or the port shown in the terminal).
- Chat Upload: Support for WhatsApp .txt export files
- User Selection: Pick any participant from the conversation
- MBTI Timeline: Visualize personality changes over time
- Message Context: View messages alongside MBTI predictions
- Gender-Specific Avatars: 32 unique character designs (16 MBTI types × 2 genders)
```
12/7/25, 10:30 AM - Ahmed: Hey, how are you?
12/7/25, 10:31 AM - Hassan: I'm good! Working on a project.
```
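Lines in this export format can be pulled apart with a regular expression. A sketch (real WhatsApp exports vary by locale and 12/24-hour settings, so the pattern may need adjusting):

```python
import re

# Matches lines like "12/7/25, 10:30 AM - Ahmed: Hey, how are you?"
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}(?:\s?[APap][Mm])?) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)

def parse_line(line: str):
    """Return (date, time, sender, text), or None for system lines
    such as encryption notices that lack a sender."""
    m = LINE_RE.match(line)
    return m.groups() if m else None

print(parse_line("12/7/25, 10:30 AM - Ahmed: Hey, how are you?"))
# → ('12/7/25', '10:30 AM', 'Ahmed', 'Hey, how are you?')
```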
```
roman_urdu,english,mbti
"Bro kidr ho?","Bro, where are you?","ESTP"
"Theek hu","I'm fine","ISTJ"
```

© This project is developed as an academic exercise for the Natural Language Processing course.
Attribution:
- Character PNGs sourced from 16Personalities.com for educational purposes
- Some additional characters were generated using Gemini Nano
- Address class imbalance with advanced sampling techniques
- Implement ensemble methods for better accuracy
- Add support for more languages beyond Roman Urdu/English
- Add confidence scores for predictions
This is an academic project. Contributions, suggestions, and feedback are welcome!
Educational use only. See attribution section for third-party resources.
Enjoy exploring personality timelines with PersonaFlow!