An enterprise phishing detection and prevention platform that combines machine learning models, real-time analysis, and threat intelligence integration to protect organizations from sophisticated phishing attacks.
The Real-Time AI/ML-Based Phishing Detection and Prevention System is a microservices-based platform designed to detect and prevent phishing attacks through multi-layered analysis. The system combines natural language processing, graph neural networks, computer vision, and threat intelligence to provide comprehensive protection against evolving phishing threats.
The platform analyzes emails, URLs, and web content using an ensemble of machine learning models to identify phishing attempts with high accuracy. It provides real-time threat detection with sub-50ms latency, integrates with threat intelligence feeds, and offers continuous learning capabilities to adapt to new attack patterns.
Key Use Cases:
- Enterprise email security and filtering
- Browser-based threat protection
- API-based threat analysis for security tools
- Real-time threat monitoring and alerting
- Security research and threat intelligence
Text Analysis (NLP)
- Phishing detection using fine-tuned BERT/RoBERTa models
- AI-generated content detection
- Urgency and sentiment analysis
- Social engineering indicator identification
- Email parsing and header analysis
- Multi-language support
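As a rough illustration of the urgency and social engineering indicators listed above, the sketch below scores a message with a handful of regular expressions. The patterns and weights are hypothetical and chosen only for this example; the platform itself relies on fine-tuned transformer models, so treat this as a simplified stand-in rather than the actual detection logic.

```python
import re

# Hypothetical patterns and weights, for illustration only.
# A trained model would learn such signals from data instead.
URGENCY_PATTERNS = {
    r"\burgent(ly)?\b": 0.3,
    r"\bimmediately\b": 0.3,
    r"\bact now\b": 0.3,
    r"\bwithin \d+ (hours?|minutes?|days?)\b": 0.25,
    r"\baccount (will be )?(suspended|locked|closed)\b": 0.4,
    r"\bverify your (account|identity|password)\b": 0.35,
}


def urgency_score(text: str) -> float:
    """Return a score in [0, 1]; higher means more urgency cues were found."""
    lowered = text.lower()
    score = sum(
        weight
        for pattern, weight in URGENCY_PATTERNS.items()
        if re.search(pattern, lowered)
    )
    # Cap the sum so the result stays comparable across messages.
    return min(score, 1.0)
```

A heuristic like this could serve as one input feature next to the model output, not as a replacement for it.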
URL and Domain Analysis
- Graph Neural Network-based domain relationship analysis
- Redirect chain tracking and analysis
- Homoglyph and typosquatting detection
- WHOIS and DNS record analysis
- SSL certificate validation
- Domain reputation scoring
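Homoglyph and typosquatting detection can be sketched with character normalization followed by an edit-distance check against a list of protected domains. The brand list, the confusable-character map, and the distance threshold below are all assumptions made for this example; the URL service may well use a different approach.

```python
import unicodedata

# Hypothetical protected domains and confusable map, for illustration only.
PROTECTED_DOMAINS = ["paypal.com", "microsoft.com", "google.com"]
CONFUSABLES = {
    "0": "o", "1": "l", "3": "e", "5": "s",
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic e
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er, looks like Latin p
}


def normalize(domain: str) -> str:
    """Lowercase, apply NFKC folding, and map look-alike characters to ASCII."""
    folded = unicodedata.normalize("NFKC", domain.lower())
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded)


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def lookalike_of(domain: str, max_distance: int = 1):
    """Return the protected domain this one imitates, or None."""
    if domain.lower() in PROTECTED_DOMAINS:
        return None  # the genuine domain is not a look-alike of itself
    candidate = normalize(domain)
    for brand in PROTECTED_DOMAINS:
        if edit_distance(candidate, brand) <= max_distance:
            return brand
    return None
```

For example, `lookalike_of("paypa1.com")` maps the digit `1` to `l` and matches `paypal.com`, while an unrelated domain such as `example.org` returns `None`.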
Visual Analysis
- CNN-based brand impersonation detection
- DOM structure analysis
- Visual similarity matching
- Logo and form field detection
- Screenshot-based comparison
Real-Time Performance
- Sub-50ms Detection Latency: Optimized for real-time threat detection
- WebSocket Support: Real-time event streaming to clients
- Parallel Processing: Simultaneous analysis across multiple ML services
- Caching Strategy: Redis-based caching for frequently accessed data
- Async Processing: Background jobs for heavy analysis tasks
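The parallel processing item above can be illustrated with a small `asyncio` fan-out that queries several analyzers at once and applies a per-call timeout so one slow service does not stall the verdict. The `call_service` coroutine is a hypothetical stand-in for an HTTP request to an ML service; service names, delays, and scores are made up for the example.

```python
import asyncio


async def call_service(name: str, delay_s: float, score: float) -> float:
    """Hypothetical stand-in for a network call to an ML service."""
    await asyncio.sleep(delay_s)  # simulates network and inference time
    return score


async def analyze_in_parallel(services, timeout_s: float) -> dict:
    """Run all service calls concurrently; a timed-out call yields None."""

    async def guarded(name, delay_s, score):
        try:
            result = await asyncio.wait_for(
                call_service(name, delay_s, score), timeout_s
            )
            return name, result
        except asyncio.TimeoutError:
            return name, None

    pairs = await asyncio.gather(*(guarded(*spec) for spec in services))
    return dict(pairs)
```

Because the calls run concurrently, total latency is bounded by the slowest call or the timeout, whichever is smaller, rather than by the sum of all calls.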
Threat Intelligence
- IOC Management: Comprehensive Indicators of Compromise database
- Feed Integration: MISP and AlienVault OTX synchronization
- Bloom Filter Lookups: Fast IOC matching with probabilistic data structures
- Custom Feeds: Support for custom threat intelligence sources
- IOC Enrichment: Automatic enrichment with metadata and context
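To show how Bloom filter lookups keep IOC matching fast, here is a minimal, self-contained Bloom filter. The sizes and hash scheme (double hashing derived from SHA-256) are assumptions for this sketch and not necessarily what the threat intelligence service uses.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step odd
        return [(h1 + i * h2) % self.size_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

A `False` result is definitive, so most benign lookups can skip the database entirely; a `True` result only means "possibly present" and would still need confirmation against the IOC store.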
Continuous Learning
- Automated Training Pipeline: Scheduled model retraining from new data
- Drift Detection: Automatic detection of model performance degradation
- A/B Testing: Model version comparison and gradual rollout
- Feedback Loop: User feedback integration for model improvement
- Feature Store: Centralized feature management for training
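One common way to approach drift detection is the Population Stability Index (PSI), which compares the distribution of model scores on recent traffic with a baseline. This document does not say which method the learning pipeline uses, so the sketch below is only one possible approach; it assumes scores lie in [0, 1].

```python
import math


def population_stability_index(baseline, current, bins: int = 10) -> float:
    """PSI between two samples of scores in [0, 1]; 0.0 means identical histograms."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    def proportions(scores):
        counts = [0] * bins
        for s in scores:
            # Clamp so that a score of exactly 1.0 falls into the last bin.
            counts[min(int(s * bins), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(scores), 1e-6) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, though any alerting threshold should be tuned on real data.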
Browser Extension
- Real-Time URL Checking: Instant threat detection while browsing
- Email Scanning: Integration with email clients for message analysis
- Privacy-Preserving: Local caching and minimal data transmission
- Blocking Mechanisms: Automatic blocking of confirmed threats
- User Reporting: Easy reporting of suspicious content
Sandbox Analysis
- Dynamic Analysis: Behavioral analysis of links and attachments
- Multi-Engine Support: Integration with Cuckoo, Any.run, and custom sandboxes
- Result Correlation: Integration with detection signals
- Automated Submission: Queue-based processing for sandbox jobs
- Comprehensive Reporting: Detailed behavioral analysis reports
Frontend:
- Framework: Next.js 16.0.10 (App Router)
- UI Library: React 19.2.0
- Language: TypeScript 5.x
- Styling: Tailwind CSS 4.1.9
- Components: Radix UI
- Charts: Recharts 2.15.4
- State Management: React Hooks
- Theming: next-themes (dark/light mode support)
API Gateway & Core Services:
- Runtime: Node.js 20+
- Framework: Express.js 4.18.2
- Language: TypeScript 5.3.3
- Authentication: API key-based with bcrypt
- Rate Limiting: Custom middleware with Redis
- WebSocket: Socket.io 4.5.4
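The rate limiting middleware mentioned above is part of the Node.js gateway; the following Python sketch only illustrates the underlying idea with a fixed-window counter per API key. The in-memory dictionary stands in for Redis (where the same pattern is commonly built on `INCR` with `EXPIRE`), and the class name and limits are hypothetical.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per API key in each window of `window_s` seconds."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._state = {}  # api_key -> (window_index, request_count)

    def allow(self, api_key: str) -> bool:
        window = int(self.clock() // self.window_s)
        stored_window, count = self._state.get(api_key, (window, 0))
        if stored_window != window:
            count = 0  # a new window has started, so reset the counter
        count += 1
        self._state[api_key] = (window, count)
        return count <= self.limit
```

Fixed windows are simple but allow short bursts at window boundaries; a sliding window or token bucket trades a little complexity for smoother limits.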
ML Services:
- Runtime: Python 3.11+
- Framework: FastAPI 0.104.1
- Server: Uvicorn 0.24.0
- Validation: Pydantic 2.5.0
- NLP: Transformers 4.35.0, Sentence Transformers 2.2.2
- Deep Learning: PyTorch 2.1.0
- Graph Networks: PyTorch Geometric 2.4.0
- Computer Vision: Torchvision 0.16.0, OpenCV 4.8.1
- ML Utilities: scikit-learn 1.3.2, NumPy 1.24.3
- NLP Tools: NLTK 3.8.1, spaCy 3.7.2
Databases:
- PostgreSQL 15: Primary relational database (20 tables)
- MongoDB 7: Document store for ML analysis results
- Redis 7: Caching, queues (BullMQ), rate limiting
Infrastructure:
- Cloud Provider: AWS (ap-south-1)
- Infrastructure as Code: Terraform 1.5.0
- Containerization: Docker, Docker Compose
- Orchestration: AWS ECS Fargate
- Load Balancing: Application Load Balancer
- Service Discovery: AWS Cloud Map
- Storage: AWS S3 (models, training data, logs)
- CI/CD: GitHub Actions
DevOps:
- CI/CD: GitHub Actions workflows
- Logging: CloudWatch Logs, Winston
- Monitoring: CloudWatch Metrics
- Security Scanning: Trivy
- Version Control: Git
┌─────────────────────────────────────────────────────────────────┐
│                      Frontend (Next.js 16)                      │
│   Threat Dashboard | Real-time Monitor | Threat Intelligence    │
└────────────────────────────────┬────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│                API Gateway (Node.js/TypeScript)                 │
│   Routing | Authentication | Rate Limiting | Request Logging    │
└───────┬───────────────────┬─────────────────┬───────────────────┘
        │                   │                 │
┌───────▼────────┐  ┌───────▼──────┐  ┌───────▼────────┐
│ Detection API  │  │ Threat Intel │  │ Extension API  │
│ (Orchestrator) │  │   Service    │  │    Service     │
└───────┬────────┘  └──────────────┘  └────────────────┘
        │
     ┌──┴────────┬───────────┬───────────┬───────────┐
     │           │           │           │           │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│   NLP   │ │   URL   │ │ Visual  │ │ Sandbox │ │ Learning│
│ Service │ │   Svc   │ │   Svc   │ │ Service │ │ Pipeline│
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
     │           │           │           │           │
     └───────────┼───────────┼───────────┼───────────┘
                 │           │           │
        ┌────────▼───────────▼───────────▼────────┐
        │      PostgreSQL | MongoDB | Redis       │
        └─────────────────────────────────────────┘
ML Services (Python/FastAPI):
- NLP Service: Text analysis using transformer models
- URL Service: Domain and URL analysis using Graph Neural Networks
- Visual Service: Visual pattern recognition using CNNs
Core Services (Node.js/TypeScript):
- API Gateway: Entry point with authentication and routing
- Detection API: Orchestrates ML services for threat detection
- Threat Intelligence Service: IOC management and feed synchronization
- Extension API: Browser extension backend integration
- Sandbox Service: Dynamic analysis of suspicious content
- Learning Pipeline: Automated model training and deployment
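To make the Detection API's orchestration role more concrete, the sketch below combines per-service scores into one verdict using a weighted average that skips services which returned no result. The weights, thresholds, and verdict labels are hypothetical; the actual ensemble logic is not described in this document.

```python
def combine_scores(scores: dict, weights: dict):
    """Weighted average over services that returned a score; None if none did."""
    available = {
        name: score
        for name, score in scores.items()
        if score is not None and name in weights
    }
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        return None
    # Renormalize by the weights actually used so missing services do not
    # drag the combined score toward zero.
    return sum(weights[name] * score for name, score in available.items()) / total_weight


def verdict(score, block_at: float = 0.8, warn_at: float = 0.5) -> str:
    """Map a combined score to an action; thresholds are illustrative."""
    if score is None:
        return "unknown"
    if score >= block_at:
        return "block"
    if score >= warn_at:
        return "warn"
    return "allow"
```

Renormalizing over the available services lets the orchestrator still produce a verdict when one analyzer times out, at the cost of relying more heavily on the remaining signals.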
Run the full stack (frontend + backend + ML services + databases) locally with Docker Compose:
# 1. Copy environment file and set required variables
cp .env.example .env
# Edit .env and set POSTGRES_PASSWORD to a secure value
# 2. Set up ML models (REQUIRED before first run)
./scripts/setup-ml-models.sh
# 3. Build and start all services
./scripts/start-local.sh
# Or: docker compose up --build
# Background: docker compose up --build -d
# Access:
# - Frontend: http://localhost:3080
# - API Gateway: http://localhost:3000
# - Detection: http://localhost:3001
# - NLP: http://localhost:8000
# - URL: http://localhost:8001
# - Visual: http://localhost:8002
Check status: docker compose ps. Stop: docker compose down.
API key for smoke/integration tests: On first run, the database seeds a test API key. Use TEST_API_KEY=testkey_smoke_test_12345 when running ./scripts/smoke-test.sh or integration tests. See docs/DEPLOYMENT_RUNBOOK.md for details.
The root docker-compose.yml includes the backend stack and adds the Next.js frontend. Backend services (API gateway, detection API, threat intel, ML services) run on their standard ports. See docs/DEPLOYMENT_RUNBOOK.md for detailed configuration and environment variables.
See docs/PROJECT_COMPLETION_STATUS.md for a full completion checklist.
Quick status: Backend services, frontend pages, browser extension, Docker setup, CI/CD, and documentation are complete. E2E smoke tests (npm run test:e2e) and sandbox disabled-state UX are implemented.
This project is proprietary. All rights reserved. See the LICENSE file for details.
made by harshdeep:/