🎣 Phishing Detection Framework

Compare an LLM with a fine-tuned BERT model for phishing email detection, with a Flask web UI, batch analysis, and visualization tools.

🔥 Features

Feature	Description
Dual Model Support	Llama-Phishsense-1B (LLM) vs BERT-finetuned-phishing
Web GUI	Flask-based interface for real-time email analysis
Batch Analysis	Process JSONL datasets with full metrics
Visualization	ROC curves, confusion matrices, model comparisons
GPU Acceleration	CUDA support out of the box

📊 Model Performance

Kaggle Dataset (500 emails)

Metric	BERT	Llama
Accuracy	99.0%	TBD
Precision	100%	TBD
Recall	98.0%	TBD
F1-Score	98.9%	TBD
ROC-AUC	0.99	TBD

Real-World Data (40 emails)

Metric	BERT	Llama
Accuracy	55.0%	52.5%
Precision	52.6%	51.3%
Recall	100%	100%
F1-Score	68.9%	67.8%

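💡 Key Insight: Both models reach 100% recall on the real-world set, so no phishing email is missed, but precision is only about 50%: roughly half of the emails they flag are legitimate. The sample is small (40 emails), but the gap to BERT's Kaggle results suggests the models need more diverse training data.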
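
To make the relationship between the four metrics concrete, here is a minimal sketch that derives them from confusion-matrix counts. The counts in the example are an assumption, not taken from the repository: they suppose the 40 real-world emails split evenly into 20 phishing and 20 legitimate, which reproduces the BERT row above up to rounding. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# ASSUMED counts: 20 phishing + 20 legitimate emails (split not stated in this README).
# All 20 phishing emails are caught, but 18 of the 20 legitimate ones are flagged too.
bert_realworld = classification_metrics(tp=20, fp=18, tn=2, fn=0)
print({name: round(value, 3) for name, value in bert_realworld.items()})
```

Under that assumed split, perfect recall coexists with near-chance accuracy because almost every legitimate email is also flagged.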

🏗️ Project Structure

.
├── bert-finetuned-phishing/       # BERT model implementation
│   ├── full_analysis_with_bert_model.py
│   ├── webgui.py
│   └── results_*/                 # Evaluation results
│
├── llama_phish_demo/              # Llama model implementation
│   ├── full_analysis_with_llama_model.py
│   ├── webgui.py
│   └── results_*/                 # Evaluation results
│
├── examples/                      # Test datasets (JSONL format)
│   ├── phishing_mails.jsonl
│   ├── valid_mails.jsonl
│   └── kaggle_*.jsonl
│
├── comparisons/                   # Model comparison visualizations
│   ├── kaggle/
│   ├── realworld_data/
│   └── ai_generated/
│
├── comparison_visualizer.py       # Generate comparison charts
├── emailstojsonl.py               # CSV to JSONL converter
└── requirements.txt

🚀 Quick Start

1. Install

git clone https://github.com/YOUR_USERNAME/Ollama-Phishing-Framework.git
cd Ollama-Phishing-Framework
pip install -r requirements.txt

2. Run Web GUI

BERT Model (lighter, faster):

cd bert-finetuned-phishing
python webgui.py
# Open http://localhost:5000

Llama Model (requires HuggingFace token):

cd llama_phish_demo
python webgui.py
# Open http://localhost:5001

3. Run Batch Analysis

# BERT analysis
cd bert-finetuned-phishing
python full_analysis_with_bert_model.py

# Llama analysis (the previous step left you inside bert-finetuned-phishing)
cd ../llama_phish_demo
python full_analysis_with_llama_model.py
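
For orientation, a batch run boils down to classifying every record and tallying the outcomes. The sketch below is not the code in the analysis scripts; `confusion_counts` and `keyword_stub` are hypothetical names, and the keyword rule is only a stand-in for a real model call so the example runs without downloading anything.

```python
def confusion_counts(records, predict):
    """Tally TP/FP/TN/FN, treating "phishing" as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for record in records:
        actual = record["email_type"] == "phishing"
        predicted = predict(record["content"])
        if predicted and actual:
            counts["tp"] += 1
        elif predicted and not actual:
            counts["fp"] += 1
        elif not predicted and actual:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def keyword_stub(content: str) -> bool:
    """HYPOTHETICAL stand-in for a model: flags emails containing trigger words."""
    triggers = ("password", "verify", "compromised")
    return any(word in content.lower() for word in triggers)


# Made-up records in the JSONL schema used by this project.
demo_records = [
    {"email_type": "phishing", "content": "Your account has been compromised..."},
    {"email_type": "valid", "content": "Meeting reminder for tomorrow at 3pm"},
    {"email_type": "valid", "content": "Please verify the attached meeting notes"},
    {"email_type": "phishing", "content": "You won a prize, click here"},
]
demo_counts = confusion_counts(demo_records, keyword_stub)
print(demo_counts)
```

Passing the classifier in as a function keeps the tallying logic identical for both models, so only the `predict` callable would differ between a BERT run and a Llama run.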

4. Generate Comparison Charts

# Run from the repository root
python comparison_visualizer.py

📈 Sample Visualizations

After running comparison_visualizer.py, you'll get:

Model Performance Comparison

Confusion Matrix Comparison

ROC Curve Comparison
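
If you want to see what goes into a ROC curve, the following dependency-free sketch computes the curve points and the area under them. It is an illustration only and not what `comparison_visualizer.py` does internally; the function names and the labels and scores in the example are made up.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for phishing, 0 for legitimate.
    scores: model confidence that the email is phishing (higher = more suspicious).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for index, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        last_of_group = index + 1 == len(ranked) or ranked[index + 1][0] != score
        if last_of_group:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example: three phishing and three legitimate emails.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]
example_points = roc_points(example_labels, example_scores)
example_auc = auc(example_points)
print(example_points)
print(round(example_auc, 3))
```

The area equals the fraction of (phishing, legitimate) pairs in which the phishing email receives the higher score, which is 8 of 9 pairs in this example.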

📁 Data Format

JSONL with two fields:

{"email_type": "phishing", "content": "Your account has been compromised..."}
{"email_type": "valid", "content": "Meeting reminder for tomorrow at 3pm"}

Convert CSV to JSONL:

python emailstojsonl.py
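
For reference, a conversion of this kind can be sketched in a few lines. This is not the code in `emailstojsonl.py`; the function name `csv_to_jsonl` and the CSV column names `label` and `text` are assumptions chosen for illustration, so adjust them to match your own file.

```python
import csv
import io
import json


def csv_to_jsonl(csv_file, jsonl_file, label_column="label", text_column="text"):
    """Write one JSON object per CSV row using the email_type/content schema.

    The column names are ASSUMED defaults; pass your own if they differ.
    Returns the number of rows written.
    """
    reader = csv.DictReader(csv_file)
    written = 0
    for row in reader:
        record = {"email_type": row[label_column], "content": row[text_column]}
        jsonl_file.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written


# In-memory demo so the example runs without touching the filesystem.
demo_csv = io.StringIO(
    'label,text\nphishing,"Verify your password now"\nvalid,"Lunch at noon?"\n'
)
demo_out = io.StringIO()
rows_written = csv_to_jsonl(demo_csv, demo_out)
print(demo_out.getvalue())
```

When working with real files, open the CSV with `newline=""` and both files with `encoding="utf-8"`, as the `csv` module documentation recommends.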

🤝 Contributing

PRs welcome!

📚 Models Used

Model	Source	Author
Llama-Phishsense-1B	HuggingFace	AcuteShrewdSecurity
bert-finetuned-phishing	HuggingFace	E. Alvarado

⚠️ Disclaimer

Educational purposes only. Do not use for malicious activities.

📄 License

MIT

⭐ Star this repo if you find it useful!

jovicdev97/Ollama-Phishing-Framework
