Table of Contents:
- 🚀 Overview
- 📺 Demo
- ⚙️ Installation
- 📊 Dataset
- 🧹 Project Modules
- 🤖 AI Models
- 🧪 Experiments & Evaluation
- 🏅 Training Approach
- 📊 Ensemble Voting
- 📚 Notebooks & Further Reading
JudgerAI is an innovative NLP application that predicts legal case outcomes with impressive accuracy by analyzing past cases, precedents, and case facts. It empowers legal professionals to:
- 📈 Increase prediction accuracy
- ⏱️ Save valuable time on case research
- 🧠 Make informed, data-driven decisions
Watch JudgerAI in action:
JudgerAI.2.0.-.Project.Demo.-.Trim.mp4
1️⃣ Clone the repository
git clone https://github.com/MohammedAly22/JudgerAI.git
2️⃣ Download GloVe embeddings (50-dim) from Kaggle and save as:
./GloVe/glove.6B.50d.txt
3️⃣ Download pre-trained models (heavy files) and place them in models/.
Download Models from Here
4️⃣ Directory structure:
JudgerAI/
├── csvs/
├── dataset/
├── GloVe/
├── models/
├── src/
└── *.ipynb
5️⃣ Run the app:
streamlit run src/main.py
- Total cases: 3,464
- Key columns: ID, name, href, first/second_party, winning_party, winner_index (0/1), facts
- Input: facts → Output: winner_index
Here is the dataset summary:
| column | datatype | description |
|---|---|---|
| ID | int64 | Defines the case ID |
| name | string | Defines the case name |
| href | string | Defines the case hyper-reference |
| first_party | string | Defines the name of the first party (petitioner) of a case |
| second_party | string | Defines the name of the second party (respondent) of a case |
| winning_party | string | Defines the winning party name of a case |
| winner_index | int64 | Defines the winning index of a case, 0 => the first party wins, 1 => the second party wins |
| facts | string | Contains the case facts that are needed to determine who is the winner of a specific case |
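As a minimal sketch of the input/output mapping described above, the snippet below reads rows with Python's standard csv module and keeps only facts and winner_index. The two sample rows are placeholders, not real cases, and the actual file name under dataset/ is not stated here, so an in-memory string stands in for it. The helper name load_examples is hypothetical.

```python
import csv
import io

# Placeholder rows for illustration only; they are NOT taken from the real dataset.
SAMPLE_CSV = """ID,name,href,first_party,second_party,winning_party,winner_index,facts
1,A v. B,https://example.org/1,A,B,A,0,"A sued B over a contract."
2,C v. D,https://example.org/2,C,D,D,1,"C appealed a ruling in favor of D."
"""


def load_examples(handle):
    """Return (facts, winner_index) pairs from a CSV file-like object."""
    reader = csv.DictReader(handle)
    return [(row["facts"], int(row["winner_index"])) for row in reader]


# With a real file, pass open(path, newline="", encoding="utf-8") instead.
examples = load_examples(io.StringIO(SAMPLE_CSV))
facts, labels = zip(*examples)
```

Here facts is the model input and labels holds the 0/1 targets, matching the column descriptions in the table.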
Modular structure for maintainability and clarity:
| Module | Location | Responsibilities |
|---|---|---|
| Preprocessing | src/preprocessing.py | Tokenization, balancing, anonymization, vectorization |
| Plotting | src/plotting.py | Visualizing performance, confusion matrices, ROC-AUC, heatmaps |
| Utils | src/utils.py | Training helpers, k-fold CV, accuracy/loss summary builders |
| Streamlit App | src/main.py | Frontend UI for demo and deployment |
| Deployment Utils | src/deployment_utils.py | Model loader, sample picker, vectorizer generator, highlights words |
JudgerAI incorporates 7 different models:
- Doc2Vec – Documents as dense vectors
- 1D-CNN – Convolutional features over text
- TF-IDF + TextVectorization – Weighted bag-of-words embedding
- GloVe – Global co-occurrence embeddings
- FastText – Subword-enhanced embeddings
- LSTM – Memory-capable sequences
- BERT – Contextual pre-trained transformer
A mix of traditional and modern architectures to maximize coverage.
Three core preprocessing decisions were evaluated:
- Preprocessing steps – stopword removal, stemming, etc.
- Data anonymization – replacing party names with _PARTY_
- Label imbalance – strategies for balanced classes
Each decision is either applied or not, which gives 2³ = 8 combinations. Each combination is run with 4-fold cross-validation, for 8 × 4 = 32 training runs per model.
| Combination # | Preprocessing | Data Anonymization | Label Class Imbalance |
|---|---|---|---|
| 1 | No | No | No |
| 2 | No | No | Yes |
| 3 | No | Yes | No |
| 4 | No | Yes | Yes |
| 5 | Yes | No | No |
| 6 | Yes | No | Yes |
| 7 | Yes | Yes | No |
| 8 | Yes | Yes | Yes |
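The data anonymization option replaces party names in the facts with _PARTY_. A minimal sketch of that idea is shown below; the function name, the case-insensitive matching, and the longest-name-first ordering are assumptions for illustration, and the actual logic in src/preprocessing.py may differ.

```python
import re


def anonymize(facts: str, first_party: str, second_party: str,
              token: str = "_PARTY_") -> str:
    """Replace every occurrence of either party name with a placeholder token."""
    # Longest name first, so a name that contains the other is matched whole.
    names = sorted({first_party, second_party}, key=len, reverse=True)
    names = [name for name in names if name]  # ignore empty names
    if not names:
        return facts
    pattern = re.compile("|".join(re.escape(name) for name in names),
                         flags=re.IGNORECASE)
    # A function replacement avoids interpreting backslashes in the token.
    return pattern.sub(lambda _match: token, facts)


text = "Smith sued Jones Corp. after Jones Corp. refused."
anonymized = anonymize(text, "Smith", "Jones Corp.")
```

Masking the names is meant to keep a model from keying on who the parties are rather than on the facts themselves.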
- 80/20 train/test split
- 4-fold CV on training set
- Best combination selected per model based on accuracy
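The split and cross-validation scheme above can be sketched with the standard library alone. The shuffling, the seed, and the rounding of the test size are assumptions made for this illustration; the repository may instead rely on library helpers such as scikit-learn's train_test_split and KFold, and may or may not stratify by label.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle range(n) and return (train_indices, test_indices)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=4):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 3,464 is the total number of cases reported for the dataset.
train_idx, test_idx = train_test_split_indices(3464)
splits = list(k_fold(train_idx, k=4))
```

Each validation fold is drawn only from the training portion, so the held-out 20% is never seen during cross-validation.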
Workflow:
- Train each model × 8 preprocessing setups × 4 CV folds
- Evaluate and select the best-performing combination for each model
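The workflow amounts to a nested loop over models, preprocessing combinations, and folds. The sketch below only enumerates that grid; the option names and the tuple layout are hypothetical, and no training is performed.

```python
from itertools import product

MODELS = ["Doc2Vec", "1D-CNN", "TF-IDF", "GloVe", "FastText", "LSTM", "BERT"]
OPTIONS = ("preprocessing", "anonymization", "balancing")  # hypothetical names
N_FOLDS = 4


def experiment_grid():
    """Yield one dict of on/off flags per combination, in table order."""
    for flags in product([False, True], repeat=len(OPTIONS)):
        yield dict(zip(OPTIONS, flags))


combos = list(experiment_grid())

# One entry per (model, combination number, fold) training run.
runs = [
    (model, combo_id, fold)
    for model in MODELS
    for combo_id in range(1, len(combos) + 1)
    for fold in range(N_FOLDS)
]
runs_per_model = len(runs) // len(MODELS)
```

Because the last flag varies fastest, the combinations come out in the same order as the numbered rows of the combination table.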
Final predictions are generated through an ensemble voting method across all tuned models to ensure robustness.
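The exact voting rule is not spelled out here. Assuming simple hard (majority) voting over each model's 0/1 prediction, a minimal sketch looks like this; the function name and the tie-breaking rule are illustrative assumptions.

```python
def majority_vote(predictions):
    """Return the label (0 or 1) predicted by most models.

    predictions: sequence of 0/1 labels, one per model.
    Ties (possible only with an even number of models) go to 0,
    which is an arbitrary choice made for this sketch.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    ones = sum(predictions)
    return 1 if ones * 2 > len(predictions) else 0


# Seven models, four of which predict that the second party wins.
winner_index = majority_vote([0, 1, 1, 0, 1, 1, 0])
```

With seven voters and two classes a tie cannot occur, so the tie-breaking rule only matters if a model is dropped.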
In-depth exploration available in the following notebooks:
- BERT_experiments.ipynb
- cnn_experiments.ipynb
- doc2vec_experiments.ipynb
- FastText_experiments.ipynb
- glove_experiments.ipynb
- LSTM_experiments.ipynb
- tf_idf_experiments.ipynb
- voting_experiments.ipynb
Contributions welcome! Feel free to contribute to JudgerAI.
Thank you for exploring JudgerAI—ushering in a smarter future for legal decision-making with AI-powered precision.

