A phishing URL detection application that uses machine learning, built with the Starlette framework.
- Features
- Prerequisites
- Installation
- Configuration
- Running the Application
- Project Structure
- API Endpoints
- Development
- License
- Support
- URL Analysis: Advanced phishing detection using machine learning
- Feature Extraction: Comprehensive URL feature analysis, including:
  - Address bar-based features
  - Domain-based features
  - Content-based features
- Modern API Framework: Built with Starlette for high performance and async support
- API Documentation: Automatic OpenAPI/Swagger documentation
- Internationalization: Multi-language support (English and Spanish)
- Web Interface: Clean and intuitive UI for URL analysis
- Real-time Analysis: Immediate feedback on URL legitimacy
- Detailed Reports: Comprehensive feature analysis for each URL check
- Python 3.10+
- pip (Python package manager)
- Clone the repository:

      git clone <repository-url>
      cd phishing-url-detector

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Copy the environment template:

      cp .env.example .env

Configure your .env file with appropriate values:

    # Server
    HTTP_SCHEMA=http
    HOST=localhost
    PROD=False
    PORT=8000
    # API Spec
    OPENAPI_TITLE=Phishing URL Detection API
    OPENAPI_DESCRIPTION=A phishing URL detection API using machine learning.
    OPENAPI_VERSION=0.0.1

- Start the development server:

      python main.py

Assuming the default configuration, the application will be available at:
- Web Interface: http://localhost:8000
- API Documentation: http://localhost:8000/docs
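As a rough illustration of how the server settings map to those addresses, here is a minimal sketch that parses `KEY=VALUE` lines and composes the base URL. The helper names `parse_env` and `base_url` are hypothetical and are not taken from the project; the application may load its configuration differently.

```python
# Hypothetical helpers, not part of the project: they only illustrate how
# HTTP_SCHEMA, HOST and PORT combine into the addresses listed above.


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def base_url(settings: dict[str, str]) -> str:
    """Compose the base URL, falling back to the template defaults."""
    scheme = settings.get("HTTP_SCHEMA", "http")
    host = settings.get("HOST", "localhost")
    port = settings.get("PORT", "8000")
    return f"{scheme}://{host}:{port}"


EXAMPLE_ENV = """\
# Server
HTTP_SCHEMA=http
HOST=localhost
PROD=False
PORT=8000
"""

settings = parse_env(EXAMPLE_ENV)
print(base_url(settings))            # http://localhost:8000
print(base_url(settings) + "/docs")  # http://localhost:8000/docs
```

Changing `PORT` or `HOST` in `.env` would change both addresses accordingly.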
[Screenshot: interactive web interface for URL analysis with real-time results]
[Screenshot: language selector]
[Screenshot: API documentation in Swagger UI]
├── core/ # Core functionality
├── data/ # Data files
├── dtos/ # Data Transfer Objects
├── extractors/ # URL feature extractors
├── lib/ # Libraries and utilities
├── locales/ # Translation files
├── middlewares/ # Middleware components
├── models/ # ML models and data structures
├── notebooks/ # Jupyter notebooks for ML training
├── routers/ # API routes
├── services/ # Business logic
├── static/ # Static files
├── templates/ # HTML templates
├── tests/ # Test suite
└── utils/ # Utility functions
- POST /predict: Analyzes a URL for phishing characteristics
  - Request body: {"url": "https://example.com"}
  - Response: Prediction results with detailed feature analysis
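A minimal client sketch for this endpoint, using only the Python standard library. It assumes the server is running at the default `http://localhost:8000` and that the response body is JSON; the response schema is not documented here, so the sketch returns it undecoded beyond JSON parsing. The function names `build_predict_request` and `predict` are illustrative, not part of the project.

```python
import json
import urllib.request


def build_predict_request(
    url_to_check: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST /predict request carrying the documented JSON body."""
    body = json.dumps({"url": url_to_check}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url_to_check: str) -> dict:
    """Send the request to a running server and parse the reply.

    Assumption: the reply is a JSON object. Its fields are not specified
    in this document, so they are returned as-is.
    """
    request = build_predict_request(url_to_check)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request("https://example.com")
print(request.get_method(), request.full_url)  # POST http://localhost:8000/predict
```

With the development server running, `predict("https://example.com")` would send the request and return the parsed response.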
The model is trained using various URL features, such as:
- URL length
- Domain characteristics
- Content analysis
Training notebooks are available in the notebooks/ directory.
Features are extracted using the URLFeaturesExtractor class, which analyzes:
- Address bar features
- Domain-based features
- Content-based features
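To give a sense of what address bar features can look like, here is a small standalone sketch. It is not the project's `URLFeaturesExtractor`; the function `address_bar_features` and the specific features it computes are assumptions chosen for illustration, based on commonly used phishing signals.

```python
import ipaddress
from urllib.parse import urlparse


def address_bar_features(url: str) -> dict:
    """Compute a few illustrative address bar features for a URL.

    Hypothetical example: the real extractor may use different features.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # An IP address used in place of a domain name is a common phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "host_dot_count": host.count("."),
    }


print(address_bar_features("https://example.com"))
print(address_bar_features("http://192.168.0.1/login@secure"))
```

Each feature becomes one numeric or boolean input to a classifier; domain-based and content-based features would need additional lookups (for example, DNS records or page HTML) that this sketch leaves out.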
Supports multiple languages through JSON locale files:
- English (en.json)
- Spanish (es.json)
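One common way to use JSON locale files is a key lookup with a fallback to a default language. The sketch below shows that pattern with in-memory stand-ins for `en.json` and `es.json`; the translation keys, the `translate` function, and the fallback behavior are assumptions, not the project's actual implementation.

```python
import json

# Stand-ins for the contents of en.json and es.json; the keys are invented.
EN_JSON = '{"title": "Phishing URL Detector", "submit": "Analyze"}'
ES_JSON = '{"title": "Detector de URL de phishing"}'

catalogs = {"en": json.loads(EN_JSON), "es": json.loads(ES_JSON)}


def translate(key: str, lang: str, default_lang: str = "en") -> str:
    """Look up key in the requested language, then in the default language.

    If neither catalog has the key, return the key itself so that missing
    translations stay visible instead of raising an error.
    """
    for code in (lang, default_lang):
        value = catalogs.get(code, {}).get(key)
        if value is not None:
            return value
    return key


print(translate("title", "es"))   # Detector de URL de phishing
print(translate("submit", "es"))  # Analyze (falls back to English)
```

In a real setup the catalogs would be read from the files in the locales/ directory rather than from inline strings.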
Run the test suite:

    pytest

Coverage reports are automatically generated through GitHub Actions.
This project is licensed under the MIT License. See the LICENSE file for details.
If you find this project useful, give it a ⭐ on GitHub!








