Real-Time AI/ML-Based Phishing Detection and Prevention System

A comprehensive, enterprise-grade phishing detection and prevention platform leveraging advanced machine learning models, real-time analysis, and threat intelligence integration to protect organizations from sophisticated phishing attacks.

Solution Overview

The Real-Time AI/ML-Based Phishing Detection and Prevention System is a microservices-based platform designed to detect and prevent phishing attacks through multi-layered analysis. The system combines natural language processing, graph neural networks, computer vision, and threat intelligence to provide comprehensive protection against evolving phishing threats.

The platform analyzes emails, URLs, and web content with an ensemble of machine learning models to identify phishing attempts. It is optimized for real-time detection with a sub-50ms latency target, integrates with threat intelligence feeds, and retrains on new data so the models can adapt to new attack patterns.
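
As a rough illustration of how an ensemble can turn per-model scores into one verdict, the sketch below uses a weighted average. The weights, the 0.5 threshold, and the function names are assumptions made for this example; the README does not state how the platform actually combines model outputs.

```python
# Hypothetical weights per model; not taken from the project.
WEIGHTS = {"nlp": 0.4, "url": 0.4, "visual": 0.2}


def ensemble_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average over the models that returned a score.

    Weights are renormalised so a missing model (for example, no
    screenshot available) does not drag the result toward zero.
    """
    available = {name: s for name, s in scores.items() if name in weights}
    if not available:
        raise ValueError("no recognised model scores supplied")
    total_weight = sum(weights[name] for name in available)
    return sum(weights[name] * s for name, s in available.items()) / total_weight


def verdict(score: float, threshold: float = 0.5) -> str:
    """Map a combined score to a label; the threshold is an assumption."""
    return "phishing" if score >= threshold else "benign"


if __name__ == "__main__":
    combined = ensemble_score({"nlp": 0.9, "url": 0.8, "visual": 0.1})
    print(round(combined, 2), verdict(combined))
```

Renormalising over the available models is one design choice among several; a production system might instead learn the combination with a meta-classifier.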

Key Use Cases:

Enterprise email security and filtering
Browser-based threat protection
API-based threat analysis for security tools
Real-time threat monitoring and alerting
Security research and threat intelligence

Features

Core Detection Capabilities

Text Analysis (NLP)

Phishing detection using fine-tuned BERT/RoBERTa models
AI-generated content detection
Urgency and sentiment analysis
Social engineering indicator identification
Email parsing and header analysis
Multi-language support

URL and Domain Analysis

Graph Neural Network-based domain relationship analysis
Redirect chain tracking and analysis
Homoglyph and typosquatting detection
WHOIS and DNS record analysis
SSL certificate validation
Domain reputation scoring
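
Homoglyph and typosquatting checks can be approximated without any ML. The sketch below maps look-alike characters to a "skeleton" and then measures edit distance against a list of protected brand names. The confusables table is a tiny illustrative subset, and `check_label` and its return values are hypothetical names, not the URL Service's API.

```python
import unicodedata
from typing import Optional, Tuple

# Tiny illustrative subset of look-alike characters (assumption).
CONFUSABLES = {
    "0": "o", "1": "l", "3": "e", "5": "s",
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0456": "i",  # Cyrillic/Ukrainian i
}


def skeleton(label: str) -> str:
    """Normalise a domain label and replace known look-alike characters."""
    normalised = unicodedata.normalize("NFKC", label).lower()
    return "".join(CONFUSABLES.get(ch, ch) for ch in normalised)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def check_label(label: str, brands: list, max_distance: int = 1) -> Optional[Tuple[str, str]]:
    """Return (kind, brand) for a suspicious label, or None if it looks clean."""
    if label.lower() in brands:
        return None  # exact match with a protected brand
    skel = skeleton(label)
    if skel in brands:
        return ("homoglyph", skel)
    for brand in brands:
        if levenshtein(skel, brand) <= max_distance:
            return ("typosquat", brand)
    return None
```

For example, `check_label("paypa1", ["paypal"])` flags a homoglyph because the digit `1` maps to `l`, while `check_label("paypall", ["paypal"])` is one edit away and is flagged as a typosquat.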

Visual Analysis

CNN-based brand impersonation detection
DOM structure analysis
Visual similarity matching
Logo and form field detection
Screenshot-based comparison
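
To give a feel for screenshot-based similarity matching, the sketch below uses an average hash, a much simpler technique than the CNN-based detection listed above, and is not the Visual Service's method. It assumes the screenshot has already been converted to a small grayscale grid (values 0 to 255); the function names are hypothetical.

```python
def average_hash(pixels: list) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes (lower means more similar)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    reference = [[0, 255], [255, 0]]       # stand-in for a known brand page
    lookalike = [[10, 250], [240, 5]]      # same layout with slight noise
    inverted = [[255, 0], [0, 255]]        # different layout
    print(hamming(average_hash(reference), average_hash(lookalike)))
    print(hamming(average_hash(reference), average_hash(inverted)))
```

A small Hamming distance between a page under analysis and a known brand page served from an unrelated domain is the kind of signal a brand impersonation detector looks for.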

Real-Time Features

Sub-50ms Detection Latency: Optimized for real-time threat detection
WebSocket Support: Real-time event streaming to clients
Parallel Processing: Simultaneous analysis across multiple ML services
Caching Strategy: Redis-based caching for frequently accessed data
Async Processing: Background jobs for heavy analysis tasks
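
Parallel processing under a latency budget can be sketched with `asyncio`: all analyses start together, and anything that has not finished when the budget expires is cancelled. `call_service` below is a simulated stand-in for a call to an ML service, and the 50 ms default budget simply mirrors the latency figure above; none of these names come from the project.

```python
import asyncio


async def call_service(name: str, delay_s: float, score: float):
    """Stand-in for a request to an ML service (simulated with a sleep)."""
    await asyncio.sleep(delay_s)
    return name, score


async def analyze_parallel(calls, budget_s: float = 0.05) -> dict:
    """Run all calls concurrently and keep only results that arrive within the budget."""
    tasks = [asyncio.create_task(c) for c in calls]
    done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for task in pending:
        task.cancel()  # give up on slow services
    # Let cancellations finish so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    return dict(t.result() for t in done if t.exception() is None)


if __name__ == "__main__":
    results = asyncio.run(
        analyze_parallel(
            [
                call_service("nlp", 0.01, 0.9),
                call_service("url", 0.01, 0.7),
                call_service("visual", 1.0, 0.2),  # too slow, gets dropped
            ],
            budget_s=0.3,
        )
    )
    print(results)
```

Dropping slow results rather than waiting keeps the response time bounded; the heavier analysis can then be handed to a background job, as described in the Async Processing item.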

Threat Intelligence

IOC Management: Comprehensive Indicators of Compromise database
Feed Integration: MISP and AlienVault OTX synchronization
Bloom Filter Lookups: Fast IOC matching with probabilistic data structures
Custom Feeds: Support for custom threat intelligence sources
IOC Enrichment: Automatic enrichment with metadata and context
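
A Bloom filter answers "definitely not present" or "possibly present", which makes it a cheap pre-check before a database lookup. The following is a minimal sketch using only the standard library; the sizing parameters and class name are assumptions, not the Threat Intelligence Service's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    iocs = BloomFilter()
    iocs.add("evil.example")
    print("evil.example" in iocs)   # an added item is always reported as present
    print("good.example" in iocs)   # almost certainly False at this fill level
```

Because a positive answer can be a false positive, a hit should still be confirmed against the IOC database; only a negative answer lets the lookup be skipped.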

Continuous Learning

Automated Training Pipeline: Scheduled model retraining from new data
Drift Detection: Automatic detection of model performance degradation
A/B Testing: Model version comparison and gradual rollout
Feedback Loop: User feedback integration for model improvement
Feature Store: Centralized feature management for training
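
One common way to detect drift is to compare the distribution of recent model scores against a baseline with the Population Stability Index (PSI). The sketch below assumes scores in the range 0 to 1 and uses the conventional rule-of-thumb threshold of 0.25; whether the Learning Pipeline uses PSI at all is not stated in this README.

```python
import math


def _bin_shares(values: list, bins: int, eps: float) -> list:
    """Share of values per equal-width bin on [0, 1], floored at eps to avoid log(0)."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * bins
    for v in values:
        idx = min(max(int(v * bins), 0), bins - 1)
        counts[idx] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(expected: list, actual: list, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a recent sample."""
    e = _bin_shares(expected, bins, eps)
    a = _bin_shares(actual, bins, eps)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def has_drifted(expected: list, actual: list, threshold: float = 0.25) -> bool:
    """Rule of thumb: PSI above 0.25 is usually treated as significant drift."""
    return psi(expected, actual) > threshold


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [min(v + 0.3, 0.99) for v in baseline]
    print(has_drifted(baseline, baseline), has_drifted(baseline, shifted))
```

A drift signal like this could be what triggers the scheduled retraining described above, although the trigger logic here is purely illustrative.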

Browser Extension

Real-Time URL Checking: Instant threat detection while browsing
Email Scanning: Integration with email clients for message analysis
Privacy-Preserving: Local caching and minimal data transmission
Blocking Mechanisms: Automatic blocking of confirmed threats
User Reporting: Easy reporting of suspicious content

Sandbox Analysis

Dynamic Analysis: Behavioral analysis of links and attachments
Multi-Engine Support: Integration with Cuckoo, Any.run, and custom sandboxes
Result Correlation: Integration with detection signals
Automated Submission: Queue-based processing for sandbox jobs
Comprehensive Reporting: Detailed behavioral analysis reports

Technology Stack

Frontend

Framework: Next.js 16.0.10 (App Router)
UI Library: React 19.2.0
Language: TypeScript 5.x
Styling: Tailwind CSS 4.1.9
Components: Radix UI
Charts: Recharts 2.15.4
State Management: React Hooks
Theming: next-themes (dark/light mode support)

Backend Services

API Gateway & Core Services:

Runtime: Node.js 20+
Framework: Express.js 4.18.2
Language: TypeScript 5.3.3
Authentication: API key-based, with keys hashed using bcrypt
Rate Limiting: Custom middleware with Redis
WebSocket: Socket.io 4.5.4
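
The idea behind per-key rate limiting can be sketched as a sliding-window counter. The real gateway is Node.js middleware backed by Redis; the Python version below keeps its state in process memory purely for illustration, and the class name, limits, and injectable clock are all assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each key."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        self.hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        timestamps = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_s:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=2, window_s=1.0)
    print([limiter.allow("api-key-1") for _ in range(3)])
```

With Redis, the same logic is typically expressed with a sorted set or an atomic counter per key so that every gateway instance shares one view of the limits.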

ML Services:

Runtime: Python 3.11+
Framework: FastAPI 0.104.1
Server: Uvicorn 0.24.0
Validation: Pydantic 2.5.0

Machine Learning & AI

NLP: Transformers 4.35.0, Sentence Transformers 2.2.2
Deep Learning: PyTorch 2.1.0
Graph Networks: PyTorch Geometric 2.4.0
Computer Vision: Torchvision 0.16.0, OpenCV 4.8.1
ML Utilities: scikit-learn 1.3.2, NumPy 1.24.3
NLP Tools: NLTK 3.8.1, spaCy 3.7.2

Databases

PostgreSQL 15: Primary relational database (20 tables)
MongoDB 7: Document store for ML analysis results
Redis 7: Caching, queues (BullMQ), rate limiting

Infrastructure

Cloud Provider: AWS (ap-south-1)
Infrastructure as Code: Terraform 1.5.0
Containerization: Docker, Docker Compose
Orchestration: AWS ECS Fargate
Load Balancing: Application Load Balancer
Service Discovery: AWS Cloud Map
Storage: AWS S3 (models, training data, logs)
CI/CD: GitHub Actions

DevOps & Monitoring

CI/CD: GitHub Actions workflows
Logging: CloudWatch Logs, Winston
Monitoring: CloudWatch Metrics
Security Scanning: Trivy
Version Control: Git

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Frontend (Next.js 16)                        │
│  Threat Dashboard | Real-time Monitor | Threat Intelligence     │
└────────────────────────────┬────────────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────────┐
│                    API Gateway (Node.js/TypeScript)              │
│  Routing | Authentication | Rate Limiting | Request Logging     │
└───────┬───────────────┬───────────────┬──────────────────────────┘
        │               │               │
┌───────▼──────┐  ┌─────▼──────┐  ┌───▼──────────────┐
│ Detection API│  │Threat Intel│  │  Extension API   │
│(Orchestrator)│  │  Service   │  │   Service        │
└───────┬──────┘  └────────────┘  └──────────────────┘
        │
    ┌───┴───┬──────────┬──────────┬──────────┐
    │       │          │          │          │
┌───▼───┐ ┌─▼───┐ ┌───▼───┐ ┌────▼────┐ ┌────▼────┐
│ NLP   │ │URL  │ │Visual │ │Sandbox  │ │Learning │
│Service│ │Svc  │ │Svc    │ │Service  │ │Pipeline │
└───┬───┘ └─┬───┘ └───┬───┘ └────┬────┘ └────┬────┘
    │       │          │          │          │
    └───────┴──────────┴──────────┴──────────┘
            │          │          │
    ┌───────▼──────────▼──────────▼──────────┐
    │     PostgreSQL | MongoDB | Redis      │
    └────────────────────────────────────────┘

Microservices

ML Services (Python/FastAPI):

NLP Service: Text analysis using transformer models
URL Service: Domain and URL analysis using Graph Neural Networks
Visual Service: Visual pattern recognition using CNNs

Core Services (Node.js/TypeScript):

API Gateway: Entry point with authentication and routing
Detection API: Orchestrates ML services for threat detection
Threat Intelligence Service: IOC management and feed synchronization
Extension API: Browser extension backend integration
Sandbox Service: Dynamic analysis of suspicious content
Learning Pipeline: Automated model training and deployment

Quick Start (Docker Compose) — run locally

Run the full stack (frontend + backend + ML services + databases) locally with Docker Compose:

# 1. Copy environment file and set required variables
cp .env.example .env
# Edit .env and set POSTGRES_PASSWORD to a secure value

# 2. Set up ML models (REQUIRED before first run)
./scripts/setup-ml-models.sh

# 3. Build and start all services
./scripts/start-local.sh
# Or: docker compose up --build
# Background: docker compose up --build -d

# Access:
# - Frontend:    http://localhost:3080
# - API Gateway: http://localhost:3000
# - Detection:   http://localhost:3001
# - NLP:         http://localhost:8000
# - URL:         http://localhost:8001
# - Visual:      http://localhost:8002

To check service status, run docker compose ps. To stop the stack, run docker compose down.
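
For a quick check that the containers are actually listening, the sketch below tries a TCP connection to each port from the access list above. It only tests whether a port accepts connections, not whether the service behind it is healthy; the script and its function names are illustrative and not part of the repository.

```python
import socket

# Ports taken from the access list in this Quick Start.
SERVICES = {
    "frontend": 3080,
    "api-gateway": 3000,
    "detection": 3001,
    "nlp": 8000,
    "url": 8001,
    "visual": 8002,
}


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host: str = "localhost") -> dict:
    """Map each service name to whether its port is accepting connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_stack().items():
        print(f"{name:12s} {'up' if up else 'down'}")
```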

API key for smoke/integration tests: On first run, the database seeds a test API key. Use TEST_API_KEY=testkey_smoke_test_12345 when running ./scripts/smoke-test.sh or integration tests. See docs/DEPLOYMENT_RUNBOOK.md for details.

The root docker-compose.yml includes the backend stack and adds the Next.js frontend. Backend services (API gateway, detection API, threat intel, ML services) run on their standard ports. See docs/DEPLOYMENT_RUNBOOK.md for detailed configuration and environment variables.

Project Status

See docs/PROJECT_COMPLETION_STATUS.md for a full completion checklist.

Quick status: Backend services, frontend pages, browser extension, Docker setup, CI/CD, and documentation are complete. E2E smoke tests (npm run test:e2e) and sandbox disabled-state UX are implemented.

License

Made by harshdeep.

