A machine-learning–based phishing URL detector built using 87 handcrafted lexical, content-based, and external features. Includes full feature extraction, model training, evaluation, and a Flask API for real-time URL classification.

aarxshi/phishing-url-detector

Phishing URL Detection using Machine Learning

Overview

Phishing attacks remain one of the most common and effective cyber threats, often relying on deceptively crafted URLs to trick users into revealing sensitive information. This project implements an end-to-end phishing URL detection system using machine learning, exposed through a lightweight Flask web application.

Users can submit any URL and receive a real-time classification (legitimate or phishing) along with confidence scores.


Live Demo

Access the deployed application here:
https://phishing-url-detector-kn1b.onrender.com

Note: The application may take up to a minute to load on first access because the free hosting tier cold-starts the service.


Key Features

  • Machine learning–based phishing detection using URL text only
  • TF-IDF feature extraction at both word-level and character-level
  • Linear Support Vector Classifier (LinearSVC) with calibrated probability outputs
  • Confidence-aware predictions displayed through the web interface
  • Flask-based web interface for real-time interaction
  • Deployment-ready for platforms such as Render or GitHub-hosted environments
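
To illustrate why character-level n-grams complement word-level tokens on URL text, the sketch below compares a URL with a look-alike that swaps one character. The helper names (`word_tokens`, `char_ngrams`) and the two sample URLs are assumptions made for illustration; they are not taken from this repository, and a TF-IDF vectorizer would normally perform this tokenization internally.

```python
import re
from typing import List


def word_tokens(url: str) -> List[str]:
    """Split a URL into lowercase alphanumeric tokens (a rough word-level view)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def char_ngrams(url: str, n: int = 3) -> List[str]:
    """Return every overlapping character n-gram of the lowercased URL."""
    s = url.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


# Hypothetical sample strings: the second replaces the letter "l" with the digit "1".
legit = "paypal.com/login"
spoof = "paypa1.com/login"

# At word level the brand token no longer matches at all.
shared_words = set(word_tokens(legit)) & set(word_tokens(spoof))

# At character level most fragments of the brand name still overlap.
shared_grams = set(char_ngrams(legit)) & set(char_ngrams(spoof))

print(sorted(shared_words))
print(sorted(shared_grams))
```

Because fragments such as `pay` and `ypa` survive the substitution, a character-level representation can still relate the look-alike to the original string, while the word-level view only sees an unfamiliar token.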

Dataset and Model

Dataset

Model Pipeline

  • TF-IDF Vectorization
    • Word-level n-grams
    • Character-level n-grams (effective for obfuscated and shortened URLs)
  • Classifier
    • Linear Support Vector Machine (LinearSVC)
  • Calibration
    • Probability calibration applied to enable meaningful confidence estimates

The trained model is serialized and loaded at runtime for efficient inference.
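
The pipeline described above can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the code in `helpers.py`: the n-gram ranges, the `cv=2` setting, the use of `pickle`, and the eight sample URLs are all invented for illustration, and the real training data and hyperparameters are not documented here.

```python
import pickle

from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# Tiny made-up sample purely for illustration; NOT the project's dataset.
urls = [
    "https://www.wikipedia.org/wiki/Phishing",
    "https://github.com/explore",
    "https://docs.python.org/3/library/pickle.html",
    "https://www.example.com/about",
    "http://paypa1-secure-login.example/update",
    "http://192.0.2.10/bank/confirm-account.php",
    "http://free-gift-prize.example/claim?id=123",
    "http://secure-update-signin.example/verify",
]
labels = ["legitimate"] * 4 + ["phishing"] * 4

# Word-level and character-level TF-IDF features, concatenated side by side.
# The n-gram ranges are assumed values.
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char", ngram_range=(2, 5))),
])

# LinearSVC has no predict_proba of its own, so it is wrapped in a calibrator
# that maps its decision scores to probabilities.
model = Pipeline([
    ("tfidf", features),
    ("clf", CalibratedClassifierCV(LinearSVC(), cv=2)),
])
model.fit(urls, labels)

# Serialize and restore, mirroring the "load at runtime" step.
restored = pickle.loads(pickle.dumps(model))

probs = restored.predict_proba(["http://verify-account-login.example/confirm"])[0]
scores = dict(zip(restored.classes_, probs))
print(scores)
```

Wrapping the classifier in `CalibratedClassifierCV` is what makes confidence scores available, since a bare `LinearSVC` only returns signed distances from the decision boundary. With a sample this small the probabilities are not meaningful; the point is only the shape of the pipeline.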


Project Structure

phishing-url-detector/
├── app.py             # Main Flask server
├── templates/
│ └── index.html       # Frontend HTML
├── model/
│ └── model.pkl        # Trained machine learning model
├── helpers.py         # Model training and utilities
├── requirements.txt   # Python dependencies
└── README.md          # Project documentation
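
As a rough picture of how `app.py` and the serialized model could fit together, the sketch below wires a model object into a Flask app. Everything specific here is hypothetical: the `create_app` factory, the `/predict` route, the JSON request and response shape, and the `DummyModel` stand-in are assumptions for illustration, and the actual application may instead render `templates/index.html` from a form submission.

```python
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app around any object exposing predict_proba and classes_."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        payload = request.get_json(silent=True) or {}
        url = str(payload.get("url", "")).strip()
        if not url:
            return jsonify(error="missing 'url'"), 400

        # One probability per class, in the same order as model.classes_.
        probs = model.predict_proba([url])[0]
        scores = {str(c): float(p) for c, p in zip(model.classes_, probs)}
        label = max(scores, key=scores.get)
        return jsonify(url=url, label=label, confidence=scores[label], scores=scores)

    return app


class DummyModel:
    """Stand-in for the unpickled model; it simply flags URLs containing '@'."""

    classes_ = ["legitimate", "phishing"]

    def predict_proba(self, urls):
        return [[0.1, 0.9] if "@" in u else [0.8, 0.2] for u in urls]


# In a real deployment the model would be loaded from disk (for example from
# model/model.pkl) instead of using the dummy above.
app = create_app(DummyModel())
```

Passing the model into a factory function keeps the route logic testable without a trained model on disk; calling `app.run()` would then start the development server.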

Getting Started

Prerequisites

  • Python 3.8 or higher
  • pip for dependency installation

Installation and Running

Clone the repository and install dependencies:

git clone https://github.com/aarxshi/phishing-url-detector.git
cd phishing-url-detector
pip install -r requirements.txt

Run the web app:

python app.py

Then open http://127.0.0.1:5000 in your browser.
