HarshdeepAthawale/Real-Time-AI-ML-Based-Phishing-Detection-and-Prevention-System

Real-Time AI/ML-Based Phishing Detection and Prevention System

A comprehensive, enterprise-grade phishing detection and prevention platform leveraging advanced machine learning models, real-time analysis, and threat intelligence integration to protect organizations from sophisticated phishing attacks.

Solution Overview

The Real-Time AI/ML-Based Phishing Detection and Prevention System is a microservices-based platform designed to detect and prevent phishing attacks through multi-layered analysis. The system combines natural language processing, graph neural networks, computer vision, and threat intelligence to provide comprehensive protection against evolving phishing threats.

The platform analyzes emails, URLs, and web content using an ensemble of machine learning models to identify phishing attempts with high accuracy. It provides real-time threat detection with sub-50ms latency, integrates with threat intelligence feeds, and offers continuous learning capabilities to adapt to new attack patterns.

Key Use Cases:

  • Enterprise email security and filtering
  • Browser-based threat protection
  • API-based threat analysis for security tools
  • Real-time threat monitoring and alerting
  • Security research and threat intelligence

Features

Core Detection Capabilities

Text Analysis (NLP)

  • Phishing detection using fine-tuned BERT/RoBERTa models
  • AI-generated content detection
  • Urgency and sentiment analysis
  • Social engineering indicator identification
  • Email parsing and header analysis
  • Multi-language support
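As a rough illustration of the urgency and social engineering signals listed above, the sketch below scores a message against a few hand-written patterns. It is not the project's fine-tuned BERT/RoBERTa model: the pattern list and the `urgency_score` function are hypothetical stand-ins that only show the kind of signal involved.

```python
import re

# Hypothetical indicator patterns (assumption, not taken from the project).
# The real services are described as using fine-tuned transformer models.
URGENCY_PATTERNS = [
    r"\burgent(ly)?\b",
    r"\bimmediately\b",
    r"\bwithin \d+ (hours?|minutes?|days?)\b",
    r"\baccount (will be|has been) (suspended|locked|closed)\b",
    r"\bverify your (account|identity|password)\b",
]


def urgency_score(text: str) -> float:
    """Return the fraction of indicator patterns found in the text, in [0, 1]."""
    lowered = text.lower()
    hits = sum(1 for pattern in URGENCY_PATTERNS if re.search(pattern, lowered))
    return hits / len(URGENCY_PATTERNS)
```

A score like this could serve as one extra feature next to the model output; on its own it is easy to evade.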

URL and Domain Analysis

  • Graph Neural Network-based domain relationship analysis
  • Redirect chain tracking and analysis
  • Homoglyph and typosquatting detection
  • WHOIS and DNS record analysis
  • SSL certificate validation
  • Domain reputation scoring
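To make the homoglyph and typosquatting item more concrete, here is a minimal sketch that compares a domain label against a list of protected brand names. The confusables map, the brand list, and the function names are assumptions for illustration; a production system would more likely rely on the full Unicode confusables data and on the GNN-based analysis described above.

```python
from typing import List, Optional, Tuple

# Hypothetical, deliberately tiny confusables map (assumption).
HOMOGLYPHS = {
    "0": "o", "1": "l", "3": "e", "5": "s",
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic e
    "\u043e": "o",  # Cyrillic o
}


def skeleton(label: str) -> str:
    """Map look-alike characters to their plain ASCII counterparts."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in label.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def classify(label: str, brands: List[str], max_distance: int = 1) -> Tuple[str, Optional[str]]:
    """Return (verdict, matched_brand) for a single domain label."""
    lowered = label.lower()
    if lowered in brands:
        return ("exact", lowered)
    skel = skeleton(lowered)
    for brand in brands:
        if skel == brand:
            return ("homoglyph", brand)
    for brand in brands:
        if edit_distance(skel, brand) <= max_distance:
            return ("typosquat", brand)
    return ("clean", None)
```

For example, `classify("paypa1", ["paypal", "google"])` flags a homoglyph of `paypal`, while a label one edit away from a brand is reported as a typosquat.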

Visual Analysis

  • CNN-based brand impersonation detection
  • DOM structure analysis
  • Visual similarity matching
  • Logo and form field detection
  • Screenshot-based comparison

Real-Time Features

  • Sub-50ms Detection Latency: Optimized for real-time threat detection
  • WebSocket Support: Real-time event streaming to clients
  • Parallel Processing: Simultaneous analysis across multiple ML services
  • Caching Strategy: Redis-based caching for frequently accessed data
  • Async Processing: Background jobs for heavy analysis tasks
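The parallel processing idea can be sketched with `asyncio`: fan out to every ML service at once, wait up to a latency budget, and keep whatever finished in time. The `fake_service` coroutine is a hypothetical stand-in for an HTTP call, and the 50 ms default only mirrors the latency target stated above; how the Detection API actually handles slow services is not documented here.

```python
import asyncio
from typing import Awaitable, Dict


async def fake_service(delay: float, score: float) -> float:
    """Hypothetical stand-in for an HTTP call to one ML service."""
    await asyncio.sleep(delay)
    return score


async def analyze_in_parallel(
    calls: Dict[str, Awaitable[float]], timeout: float = 0.05
) -> Dict[str, float]:
    """Run all calls concurrently and return the scores that arrive in time."""
    tasks = {name: asyncio.ensure_future(call) for name, call in calls.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout)

    # Cancel stragglers so they do not outlive the request.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    return {
        name: task.result()
        for name, task in tasks.items()
        if task in done and not task.cancelled() and task.exception() is None
    }
```

Calling `asyncio.run(analyze_in_parallel({...}))` with one slow call returns only the fast services' scores, so a single slow model does not block the verdict.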

Threat Intelligence

  • IOC Management: Comprehensive Indicators of Compromise database
  • Feed Integration: MISP and AlienVault OTX synchronization
  • Bloom Filter Lookups: Fast IOC matching with probabilistic data structures
  • Custom Feeds: Support for custom threat intelligence sources
  • IOC Enrichment: Automatic enrichment with metadata and context
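For readers unfamiliar with Bloom filters, the following is a minimal sketch of the lookup idea: a bit array plus several hash positions per item gives fast membership checks with no false negatives and a small false-positive rate. The class name, sizes, and hashing scheme are assumptions for illustration, not the project's implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Tiny Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

A positive answer means "possibly a known IOC" and would normally be confirmed against the IOC database; a negative answer is definitive, which lets most benign lookups skip the database entirely.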

Continuous Learning

  • Automated Training Pipeline: Scheduled model retraining from new data
  • Drift Detection: Automatic detection of model performance degradation
  • A/B Testing: Model version comparison and gradual rollout
  • Feedback Loop: User feedback integration for model improvement
  • Feature Store: Centralized feature management for training
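One simple way to picture drift detection fed by user feedback is a rolling accuracy check: compare recent labelled outcomes with the accuracy measured at deployment and flag the model when the gap exceeds a tolerance. The `DriftMonitor` class, its window size, and its tolerance are hypothetical; the project may use a different statistic altogether.

```python
from collections import deque


class DriftMonitor:
    """Flag drift when rolling accuracy falls below baseline minus tolerance."""

    def __init__(self, baseline_accuracy: float, window: int = 100, tolerance: float = 0.05) -> None:
        self.baseline = baseline_accuracy
        self.window = window
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True when the prediction was right

    def record(self, predicted: bool, actual: bool) -> None:
        """Store one feedback item, e.g. a user-confirmed verdict."""
        self.outcomes.append(predicted == actual)

    def drifted(self) -> bool:
        # Wait for a full window so a few early mistakes do not trigger retraining.
        if len(self.outcomes) < self.window:
            return False
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.baseline - self.tolerance
```

A positive `drifted()` result could then be used to schedule a retraining job or to hold back a rollout.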

Browser Extension

  • Real-Time URL Checking: Instant threat detection while browsing
  • Email Scanning: Integration with email clients for message analysis
  • Privacy-Preserving: Local caching and minimal data transmission
  • Blocking Mechanisms: Automatic blocking of confirmed threats
  • User Reporting: Easy reporting of suspicious content

Sandbox Analysis

  • Dynamic Analysis: Behavioral analysis of links and attachments
  • Multi-Engine Support: Integration with Cuckoo, Any.run, and custom sandboxes
  • Result Correlation: Integration with detection signals
  • Automated Submission: Queue-based processing for sandbox jobs
  • Comprehensive Reporting: Detailed behavioral analysis reports

Technology Stack

Frontend

  • Framework: Next.js 16.0.10 (App Router)
  • UI Library: React 19.2.0
  • Language: TypeScript 5.x
  • Styling: Tailwind CSS 4.1.9
  • Components: Radix UI
  • Charts: Recharts 2.15.4
  • State Management: React Hooks
  • Theming: next-themes (dark/light mode support)

Backend Services

API Gateway & Core Services:

  • Runtime: Node.js 20+
  • Framework: Express.js 4.18.2
  • Language: TypeScript 5.3.3
  • Authentication: API key-based with bcrypt
  • Rate Limiting: Custom middleware with Redis
  • WebSocket: Socket.io 4.5.4

ML Services:

  • Runtime: Python 3.11+
  • Framework: FastAPI 0.104.1
  • Server: Uvicorn 0.24.0
  • Validation: Pydantic 2.5.0

Machine Learning & AI

  • NLP: Transformers 4.35.0, Sentence Transformers 2.2.2
  • Deep Learning: PyTorch 2.1.0
  • Graph Networks: PyTorch Geometric 2.4.0
  • Computer Vision: Torchvision 0.16.0, OpenCV 4.8.1
  • ML Utilities: scikit-learn 1.3.2, NumPy 1.24.3
  • NLP Tools: NLTK 3.8.1, spaCy 3.7.2

Databases

  • PostgreSQL 15: Primary relational database (20 tables)
  • MongoDB 7: Document store for ML analysis results
  • Redis 7: Caching, queues (BullMQ), rate limiting

Infrastructure

  • Cloud Provider: AWS (ap-south-1)
  • Infrastructure as Code: Terraform 1.5.0
  • Containerization: Docker, Docker Compose
  • Orchestration: AWS ECS Fargate
  • Load Balancing: Application Load Balancer
  • Service Discovery: AWS Cloud Map
  • Storage: AWS S3 (models, training data, logs)
  • CI/CD: GitHub Actions

DevOps & Monitoring

  • CI/CD: GitHub Actions workflows
  • Logging: CloudWatch Logs, Winston
  • Monitoring: CloudWatch Metrics
  • Security Scanning: Trivy
  • Version Control: Git

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Frontend (Next.js 16)                        │
│  Threat Dashboard | Real-time Monitor | Threat Intelligence     │
└────────────────────────────┬────────────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────────┐
│                    API Gateway (Node.js/TypeScript)              │
│  Routing | Authentication | Rate Limiting | Request Logging     │
└───────┬───────────────┬───────────────┬──────────────────────────┘
        │               │               │
┌───────▼──────┐  ┌─────▼──────┐  ┌───▼──────────────┐
│ Detection API│  │Threat Intel│  │  Extension API   │
│(Orchestrator)│  │  Service   │  │   Service        │
└───────┬──────┘  └────────────┘  └──────────────────┘
        │
    ┌───┴───┬──────────┬──────────┬──────────┐
    │       │          │          │          │
┌───▼───┐ ┌─▼───┐ ┌───▼───┐ ┌────▼────┐ ┌────▼────┐
│ NLP   │ │URL  │ │Visual │ │Sandbox  │ │Learning │
│Service│ │Svc  │ │Svc    │ │Service  │ │Pipeline │
└───┬───┘ └─┬───┘ └───┬───┘ └────┬────┘ └────┬────┘
    │       │          │          │          │
    └───────┴──────────┴──────────┴──────────┘
            │          │          │
    ┌───────▼──────────▼──────────▼──────────┐
    │     PostgreSQL | MongoDB | Redis      │
    └────────────────────────────────────────┘

Microservices

ML Services (Python/FastAPI):

  • NLP Service: Text analysis using transformer models
  • URL Service: Domain and URL analysis using Graph Neural Networks
  • Visual Service: Visual pattern recognition using CNNs

Core Services (Node.js/TypeScript):

  • API Gateway: Entry point with authentication and routing
  • Detection API: Orchestrates ML services for threat detection
  • Threat Intelligence Service: IOC management and feed synchronization
  • Extension API: Browser extension backend integration
  • Sandbox Service: Dynamic analysis of suspicious content
  • Learning Pipeline: Automated model training and deployment
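To illustrate what orchestrating the ML services might involve, here is a minimal sketch that merges per-service scores into one verdict with a weighted average. The weights, the threshold, and the `combine_scores` name are hypothetical; the actual ensemble logic in the Detection API is not documented in this README.

```python
from typing import Dict, Tuple


def combine_scores(
    scores: Dict[str, float],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Return (weighted score, is_phishing) using only services that responded."""
    available = {name: s for name, s in scores.items() if name in weights}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        raise ValueError("no weighted service scores available")
    # Renormalise so a missing service does not drag the score toward zero.
    score = sum(weights[name] * s for name, s in available.items()) / total_weight
    return score, score >= threshold
```

With weights of 0.5 (NLP), 0.3 (URL), and 0.2 (Visual), a request where only NLP and URL respond is scored over the remaining 0.8 of weight rather than being penalised for the missing service.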

Quick Start (Docker Compose) — run locally

Run the full stack (frontend + backend + ML services + databases) locally with Docker Compose:

# 1. Copy environment file and set required variables
cp .env.example .env
# Edit .env and set POSTGRES_PASSWORD to a secure value

# 2. Set up ML models (REQUIRED before first run)
./scripts/setup-ml-models.sh

# 3. Build and start all services
./scripts/start-local.sh
# Or: docker compose up --build
# Background: docker compose up --build -d

# Access:
# - Frontend:    http://localhost:3080
# - API Gateway: http://localhost:3000
# - Detection:   http://localhost:3001
# - NLP:         http://localhost:8000
# - URL:         http://localhost:8001
# - Visual:      http://localhost:8002

To check service status, run docker compose ps. To stop the stack, run docker compose down.
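Beyond `docker compose ps`, a quick way to confirm the stack is reachable is to probe the ports listed above. The script below is a small sketch that only checks whether each TCP port accepts a connection; it does not call any health endpoint, since none is documented here, and the function names are illustrative.

```python
import socket
from typing import Callable, Dict

# Ports taken from the access list in the quick start above.
SERVICES = {
    "Frontend": 3080,
    "API Gateway": 3000,
    "Detection": 3001,
    "NLP": 8000,
    "URL": 8001,
    "Visual": 8002,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(
    host: str = "localhost",
    probe: Callable[[str, int], bool] = port_open,
) -> Dict[str, bool]:
    """Probe every service port and report which ones are reachable."""
    return {name: probe(host, port) for name, port in SERVICES.items()}
```

Printing `check_stack()` after startup gives a per-service reachable/unreachable map; an open port does not guarantee that the ML models have finished loading.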

API key for smoke/integration tests: On first run, the database seeds a test API key. Use TEST_API_KEY=testkey_smoke_test_12345 when running ./scripts/smoke-test.sh or integration tests. See docs/DEPLOYMENT_RUNBOOK.md for details.

The root docker-compose.yml includes the backend stack and adds the Next.js frontend. Backend services (API gateway, detection API, threat intel, ML services) run on their standard ports. See docs/DEPLOYMENT_RUNBOOK.md for detailed configuration and environment variables.

Project Status

See docs/PROJECT_COMPLETION_STATUS.md for a full completion checklist.

Quick status: Backend services, frontend pages, browser extension, Docker setup, CI/CD, and documentation are complete. E2E smoke tests (npm run test:e2e) and the UX shown when the sandbox is disabled are also implemented.

License

This project is proprietary. All rights reserved. See the LICENSE file for details.


made by harshdeep:/
