Phishing URL Detection using Machine Learning

Overview

Phishing attacks remain one of the most common and effective cyber threats, often relying on deceptively crafted URLs to trick users into revealing sensitive information. This project implements an end-to-end phishing URL detection system using machine learning, exposed through a lightweight Flask web application.

Users can submit any URL and receive a real-time classification (legitimate or phishing) along with confidence scores.

Live Demo

Access the deployed application here:
https://phishing-url-detector-kn1b.onrender.com

Note: The application may take about a minute to load on first access because the free hosting tier cold-starts idle services.

Key Features

- Machine learning–based phishing detection using URL text only
- TF-IDF feature extraction at both word-level and character-level
- Linear Support Vector Classifier (LinearSVC) with calibrated probability outputs
- Confidence-aware predictions displayed through the web interface
- Flask-based web interface for real-time interaction
- Deployment-ready for platforms such as Render or GitHub-hosted environments

Dataset and Model

Dataset

Labeled dataset of legitimate and phishing URLs
Source: https://huggingface.co/datasets/pirocheto/phishing-url
The dataset contains a diverse mix of legitimate and phishing URLs and is commonly used as a benchmark for URL-based phishing detection tasks.

Model Pipeline

TF-IDF Vectorization
- Word-level n-grams
- Character-level n-grams (effective for obfuscated and shortened URLs)
Classifier
- Linear Support Vector Classifier (LinearSVC)
Calibration
- Probability calibration applied to enable meaningful confidence estimates
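
The pipeline above could be assembled in scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual training code: the n-gram ranges, the token pattern, the sigmoid calibration method, the 3-fold split, the function name build_pipeline, and the toy URLs are all chosen for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC


def build_pipeline() -> Pipeline:
    """Word- and character-level TF-IDF feeding a calibrated LinearSVC."""
    features = FeatureUnion([
        # Word-level n-grams. URLs contain no spaces, so tokens are taken
        # as runs of letters and digits (assumed token pattern).
        ("word", TfidfVectorizer(analyzer="word",
                                 token_pattern=r"[A-Za-z0-9]+",
                                 ngram_range=(1, 2))),
        # Character-level n-grams (assumed range of 3 to 5 characters).
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 5))),
    ])
    # LinearSVC has no predict_proba; the calibration wrapper adds one.
    classifier = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
    return Pipeline([("features", features), ("classifier", classifier)])


# Toy data invented for illustration only (not taken from the dataset).
urls = [
    "https://www.example.com/about",
    "https://docs.example.org/guide/install",
    "https://shop.example.com/cart",
    "https://blog.example.net/2023/05/release-notes",
    "http://secure-login.example-bank.top/verify?id=123",
    "http://example.com.account-update.xyz/signin",
    "http://192.0.2.10/paypal/login.php",
    "http://free-gift.example.click/claim-now",
]
labels = ["legitimate"] * 4 + ["phishing"] * 4

model = build_pipeline()
model.fit(urls, labels)

# One probability per class, in the order given by model.classes_.
probabilities = model.predict_proba(["http://verify-account.example.top/login"])[0]
for label, probability in zip(model.classes_, probabilities):
    print(f"{label}: {probability:.2f}")
```

Eight URLs are far too few to learn anything meaningful; the point is only to show how the two vectorizers, the classifier, and the calibration step fit together.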

The trained model is serialized and loaded at runtime for efficient inference.
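
A minimal sketch of that save and load cycle follows, assuming the model is stored with Python's pickle module (suggested by the .pkl extension, but not confirmed by the source; joblib would work similarly). The helper names save_model, load_model, and classify are hypothetical and are not taken from the repository.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Serialize a trained model to disk, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a model from disk. Only unpickle files from a trusted source."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def classify(model, url):
    """Return the most probable label and its probability for one URL.

    Assumes the model exposes predict_proba and classes_, as a calibrated
    scikit-learn classifier does.
    """
    probabilities = list(model.predict_proba([url])[0])
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return {
        "label": str(model.classes_[best]),
        "confidence": float(probabilities[best]),
    }
```

In a web app, load_model("model/model.pkl") would typically be called once at startup rather than per request, so each prediction only pays for vectorization and a linear decision function.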

Project Structure

phishing-url-detector/
├── app.py             # Main Flask server
├── templates/
│   └── index.html     # Frontend HTML
├── model/
│   └── model.pkl      # Trained machine learning model
├── helper.py          # Model training and utilities
├── requirements.txt   # Python dependencies
└── README.md          # Project documentation

Getting Started

Prerequisites

- Python 3.8 or higher
- pip for dependency installation

Installation and Running

Clone the repository and install dependencies:

git clone https://github.com/aarxshi/phishing-url-detector.git
cd phishing-url-detector
pip install -r requirements.txt

Run the web app:

python app.py

Then open http://127.0.0.1:5000 in your browser.
