Email Security Analysis Pipeline

A self-hosted, containerized email security analysis system that monitors IMAP folders for suspicious messages using multi-layered threat detection.

Features

Multi-Layer Threat Detection

Layer 1: Traditional Spam Detection
- Header analysis (SPF, DKIM validation)
- Content pattern matching
- URL reputation checking
- Sender verification
Layer 2: NLP-Based Threat Detection
- Social engineering pattern recognition
- Urgency and psychological trigger detection
- Authority impersonation identification
- Linguistic manipulation analysis
Layer 3: Media Authenticity Verification
- Attachment file type validation
- Magic byte verification
- Size anomaly detection
- Deepfake indicator analysis (placeholder for ML models)

Alert & Response System

Console notifications
Webhook integration
Slack notifications
Customizable threat thresholds
Actionable threat recommendations

Observability & Monitoring

Structured Logging: JSON format for log aggregation tools (Splunk, ELK, CloudWatch)
Metrics Collection: Track emails processed, threats detected, and processing performance
Log Rotation: Automatic log file rotation to prevent disk space issues
Configurable Formats: Switch between human-readable text and machine-parseable JSON

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Email Accounts                          │
│          (Gmail, Outlook, Proton Mail via IMAP)            │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│              Email Ingestion Module                         │
│         (IMAP Client + Rate Limiting)                       │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│              Analysis Engine Stack                          │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ Layer 1: Spam Detection (Headers, URLs, Patterns)   │  │
│  ├──────────────────────────────────────────────────────┤  │
│  │ Layer 2: NLP Threat Detection (Social Engineering)  │  │
│  ├──────────────────────────────────────────────────────┤  │
│  │ Layer 3: Media Authenticity (Attachments, Deepfake) │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────┬────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────┐
│           Alert & Response System                           │
│     (Console, Webhooks, Slack, Threat Reports)             │
└─────────────────────────────────────────────────────────────┘

Quick Start

Prerequisites

Python 3.11+ or Docker
Email account(s) with IMAP access
App passwords for email accounts (if 2FA enabled)

Option 1: Docker Deployment (Recommended)

Clone the repository

cd ~/Documents/dev
git clone <repository-url> email-security-pipeline
cd email-security-pipeline

Configure environment

cp .env.example .env
nano .env  # Edit with your credentials

Build and run with Docker Compose
```
docker compose up -d
```
View logs
```
docker compose logs -f
```

Option 3: Background Execution (macOS/Darwin)

If you are on macOS, you can run the pipeline as a background service (launchd daemon) that starts automatically on login.

Install the daemon

chmod +x install_daemon.sh
./install_daemon.sh

Manage the service

# View logs
tail -f ~/Library/Logs/email-security-pipeline/pipeline.out

# Restart service
launchctl stop com.abhimehrotra.email-security-pipeline
launchctl start com.abhimehrotra.email-security-pipeline

# Uninstall daemon
chmod +x uninstall_daemon.sh
./uninstall_daemon.sh

Configuration

Email Account Setup

Gmail

Enable IMAP: Settings → Forwarding and POP/IMAP → Enable IMAP
Generate App Password: Google Account → Security → 2-Step Verification → App passwords

Update .env:

GMAIL_ENABLED=true
GMAIL_EMAIL=your-email@gmail.com
GMAIL_APP_PASSWORD=your-16-char-app-password

Outlook/Hotmail

⚠️ CRITICAL: Personal Outlook accounts (outlook.com, hotmail.com, live.com, msn.com) NO LONGER support app passwords as of October 1, 2024.

Personal Accounts: App passwords do NOT work. Requires OAuth2 (not currently supported).
Microsoft 365 Business: May still work with app passwords (depends on tenant configuration).

For Microsoft 365 Business accounts only:

Enable IMAP: Settings → Sync email → Let devices use IMAP
Generate App Password: Account security → Advanced security options → App passwords

Update .env:

OUTLOOK_ENABLED=true
OUTLOOK_EMAIL=your-email@outlook.com
OUTLOOK_APP_PASSWORD=your-app-password

Recommendation: Use Gmail or Proton Mail instead of personal Outlook accounts.

See OUTLOOK_TROUBLESHOOTING.md for more details.

Proton Mail

Install Proton Mail Bridge: https://proton.me/mail/bridge
Configure Bridge and get credentials

Update .env (Bridge ports as currently configured):

PROTON_ENABLED=true
PROTON_EMAIL=abhimehro@pm.me
PROTON_IMAP_SERVER=127.0.0.1
PROTON_IMAP_PORT=143      # Bridge local IMAP; use 1143 if your Bridge expects STARTTLS
PROTON_APP_PASSWORD=your-bridge-password
PROTON_FOLDERS=INBOX

Note: Proton Mail requires the Bridge application running locally; adjust the IMAP port if your Bridge uses 1143/STARTTLS instead of 143/SSL.

Connectivity sanity checks (Bridge + Gmail)

# Proton IMAP (adjust port if needed)
openssl s_client -connect 127.0.0.1:143 -quiet </dev/null | head -5

# Gmail IMAP
openssl s_client -connect imap.gmail.com:993 -quiet </dev/null | head -5

IMAP/SMTP quick check CLI

Script: scripts/check_mail_connectivity.py
Prereq: .env populated (GMAIL_* / PROTON_* vars).

Run:

python scripts/check_mail_connectivity.py

It issues NOOP on IMAP/SMTP for Gmail and Proton Bridge using your .env settings; no messages are fetched or sent.

Analysis Configuration

Adjust detection sensitivity in .env:

# Layer 1: Spam Detection
SPAM_THRESHOLD=5.0              # Lower = more sensitive
SPAM_CHECK_HEADERS=true
SPAM_CHECK_URLS=true

# Layer 2: NLP Threat Detection
NLP_THRESHOLD=0.7               # 0.0 to 1.0
CHECK_SOCIAL_ENGINEERING=true
CHECK_URGENCY_MARKERS=true
CHECK_AUTHORITY_IMPERSONATION=true

# Layer 3: Media Authenticity
CHECK_MEDIA_ATTACHMENTS=true
DEEPFAKE_DETECTION_ENABLED=true

Alert Configuration

# Console alerts
ALERT_CONSOLE=true

# Webhook alerts
ALERT_WEBHOOK_ENABLED=false
ALERT_WEBHOOK_URL=https://your-webhook-url.com/alerts

# Slack alerts
ALERT_SLACK_ENABLED=false
ALERT_SLACK_WEBHOOK=https://hooks.slack.com/services/YOUR/WEBHOOK/URL

# Threat score thresholds
THREAT_LOW=30
THREAT_MEDIUM=60
THREAT_HIGH=80

System Configuration

LOG_LEVEL=INFO                  # DEBUG, INFO, WARNING, ERROR
CHECK_INTERVAL=300              # Seconds between email checks
MAX_EMAILS_PER_BATCH=50        # Max emails to process per cycle
RATE_LIMIT_DELAY=1              # Delay between IMAP operations (seconds)

Logging Configuration

The pipeline supports both text and JSON logging formats:

# Logging format
LOG_FORMAT=text                 # Options: "text" (colored console) or "json" (structured)
LOG_ROTATION_SIZE_MB=10        # Max size per log file before rotation
LOG_ROTATION_KEEP_FILES=5      # Number of rotated log files to keep
LOG_FILE=logs/email_security.log

Text Format (Default)

Best for local development and human reading. Includes colored output for better visibility:

2024-02-15 10:30:45 - EmailSecurityPipeline - INFO - Analyzing email: Important update...
2024-02-15 10:30:45 - EmailSecurityPipeline - INFO - Analysis complete: overall_score=25.50, risk=LOW, time=150ms

JSON Format

Best for production and log aggregation tools (Splunk, ELK, CloudWatch):

{"timestamp":"2024-02-15 10:30:45","level":"INFO","logger":"EmailSecurityPipeline","message":"Analysis complete: overall_score=25.50, risk=LOW, time=150ms","module":"main","function":"_analyze_email","line":291}

Querying JSON logs:

# Find all ERROR level logs
grep '"level":"ERROR"' logs/email_security.log | jq .

# Find logs from specific module
grep '"module":"main"' logs/email_security.log | jq .

# Find all analysis completion messages
grep 'Analysis complete' logs/email_security.log | jq .

# Count log messages by level
grep -o '"level":"[A-Z]*"' logs/email_security.log | sort | uniq -c

# View recent logs (Linux: use -d, macOS: use -v)
# Linux:
grep "$(date -u -d '1 hour ago' '+%Y-%m-%d %H')" logs/email_security.log | jq .
# macOS:
grep "$(date -u -v-1H '+%Y-%m-%d %H')" logs/email_security.log | jq .

Metrics Collection

The pipeline tracks operational metrics when ENABLE_METRICS=true:

ENABLE_METRICS=true

Metrics tracked:

Emails processed: Total count since startup
Threats detected: Breakdown by type (spam, phishing, malware) and severity
Processing time: Min, max, average, p50, p95, p99 percentiles
Errors: Categorized error counts

Viewing metrics: Metrics are logged every 10 monitoring cycles:

INFO - Metrics Summary: 150 emails processed, 12 threat types detected, avg processing time: 145ms

Use cases:

Performance monitoring: Identify slow processing
Threat patterns: Track threat types over time
Capacity planning: Understand email volume
Alerting: Set up monitoring for metric thresholds (e.g., "alert if avg processing time > 1000ms")

Project Structure

email-security-pipeline/
├── src/
│   ├── main.py                 # Main orchestrator
│   ├── modules/
│   │   ├── email_ingestion.py  # IMAP client & email parsing
│   │   ├── spam_analyzer.py    # Layer 1: Spam detection
│   │   ├── nlp_analyzer.py     # Layer 2: NLP threat detection
│   │   ├── media_analyzer.py   # Layer 3: Media authenticity
│   │   └── alert_system.py     # Alert & notification system
│   └── utils/
│       ├── config.py            # Configuration management
│       ├── structured_logging.py  # JSON logging formatter
│       ├── metrics.py           # Metrics collection
│       └── logging_utils.py     # Colored console logging
├── logs/                        # Application logs
├── data/                        # Optional database
├── .env                         # Configuration (create from .env.example)
├── .env.example                 # Configuration template
├── requirements.txt             # Python dependencies
├── Dockerfile                   # Multi-stage Docker build
├── docker-compose.yml           # Docker Compose configuration
└── README.md                    # This file

Security Considerations

Credential Management

Never commit .env file to version control
Use app-specific passwords (not account passwords)
Store credentials securely using environment variables
Consider using secret management tools (e.g., HashiCorp Vault)

Docker Security

Runs as non-root user (emailsec)
Read-only root filesystem
Resource limits enforced
Security options enabled (no-new-privileges)

Rate Limiting

Configurable delays between IMAP operations
Prevents server throttling/blocking
Respects email provider limits

Monitoring & Logs

View Logs

Docker:

docker compose logs -f email-security-pipeline

Local:

tail -f logs/email_security.log

Log Levels

DEBUG: Detailed analysis scores and decisions
INFO: Email processing and alerts
WARNING: Configuration issues, connection problems
ERROR: Critical failures

Troubleshooting

Connection Issues

Problem: Can't connect to IMAP server

Solutions:

Verify credentials in .env
Check IMAP is enabled in email account settings
For Gmail/Outlook: Ensure app password is correctly generated
For Proton Mail: Verify Bridge is running
Check firewall/network settings

Authentication Errors

Problem: Login failed

Solutions:

Regenerate app password
Verify 2FA is properly configured
Check for account security alerts
Ensure IMAP access is enabled

No Emails Detected

Problem: Pipeline runs but finds no emails

Solutions:

Check folder names in config (case-sensitive)
Verify emails exist in monitored folders
Check email is marked as "unread" (pipeline monitors UNSEEN)
Increase MAX_EMAILS_PER_BATCH if needed

Development

Setting Up Development Environment

Clone and install dependencies

git clone <repository-url> email-security-pipeline
cd email-security-pipeline
python -m pip install -r requirements-ci.txt

Install pre-commit hooks
```
pre-commit install
```
This will automatically run code quality checks before each commit, including:
- Trailing whitespace removal
- End-of-file fixing
- YAML/JSON validation
- Security scanning (bandit)
- Python syntax checks
- Large file detection

Run pre-commit manually

# Run on all files
pre-commit run --all-files

# Run on specific files
pre-commit run --files src/main.py

Bypass hooks (use sparingly)

git commit --no-verify -m "Emergency fix"

Running Tests

# Run all tests
python -m pytest

# Run with coverage (requires pytest-cov, not included in requirements-ci.txt)
python -m pip install pytest-cov
python -m pytest --cov=src --cov-report=html

# Run specific test file
python -m pytest tests/test_config.py

Code Quality Tools

The project uses the following tools (configured in .pre-commit-config.yaml):

bandit: Security linting (.bandit config)
pre-commit-hooks: Basic file quality checks
pygrep-hooks: Python-specific linting

Troubleshooting Pre-commit

Hooks failing?

Update hooks: pre-commit autoupdate
Clear cache: pre-commit clean
Reinstall: pre-commit uninstall && pre-commit install

Need to skip a specific hook?

SKIP=bandit git commit -m "Message"

Advanced Features

Adding Custom Analysis Rules

Extend spam_analyzer.py, nlp_analyzer.py, or media_analyzer.py with custom patterns:

# Example: Add custom spam keyword
SPAM_KEYWORDS = [
    # ... existing patterns
    r'\b(your-custom-pattern)\b',
]

Integrating ML Models

Uncomment transformer dependencies in requirements.txt:

transformers==4.35.0
torch==2.1.0
sentencepiece==0.1.99

Update nlp_analyzer.py to load models in _initialize_model().

Custom Alert Channels

Extend alert_system.py with additional notification methods:

def _custom_alert(self, report: ThreatReport):
    # Your custom alert logic
    pass

Performance Tuning

Configuration Parameters

Adjust these settings in .env to optimize performance for your environment:

Check Interval: Increase CHECK_INTERVAL for less frequent checks to reduce load
Rate Limit Delay: Increase RATE_LIMIT_DELAY if experiencing throttling
Batch Size: Reduce MAX_EMAILS_PER_BATCH if processing is slow
Analysis Layers: Disable unused layers for faster processing:
- Set CHECK_MEDIA_ATTACHMENTS=false to skip media analysis
- Set CHECK_SOCIAL_ENGINEERING=false to skip NLP analysis
- Set SPAM_CHECK_URLS=false to skip URL extraction

Recent Performance Optimizations

The pipeline has undergone significant performance improvements:

Email Parsing Efficiency (Feb 2026)

String Concatenation: Replaced O(N²) string concatenation with O(N) list accumulation
- Uses list.append() + "".join() instead of += for building email bodies
- Prevents DoS vulnerabilities from large multipart emails
- Reduces memory allocation overhead

Media Analysis Performance (Feb 2026)

Frequency Domain Analysis: cv2.dft is ~2x faster than numpy FFT for deepfake detection
- Switched from np.fft.fft2() to cv2.dft() for compression artifact analysis
Frame Sampling: Reduced from 20 to 10 frames for video analysis
- Statistical sampling provides equivalent detection with 50% fewer frames
- Combined impact: Video processing reduced from ~20s to ~3.5s (5.7x speedup)

NLP Pattern Matching (2025)

Regex Optimization: Combined multiple pattern searches into single-pass operations
- Pre-compiled regex patterns at module scope
- Hybrid approach: fast detection pass + detailed identification pass
- Memory-efficient match counting with generators instead of findall()

Input Truncation (2025)

LRU Cache Efficiency: Truncate large text to processing limits before caching
- Transformer models typically process first 512 tokens only
- Truncating before caching achieves ~300x speedup on repeated large inputs

Performance Monitoring

To measure performance in your environment:

# Email parsing benchmark (requires test emails)
python -m pytest tests/test_ingestion_optimization.py -v

# Media analysis benchmark (requires test videos)
python tests/benchmark_media.py  # If available

# Full analysis with timing
LOG_LEVEL=DEBUG python src/main.py

Optimization Guidelines

When optimizing for your use case:

Profile First: Use LOG_LEVEL=DEBUG to identify bottlenecks
Disable Unused Features: Turn off unnecessary analysis layers
Adjust Limits: Reduce MAX_BODY_SIZE and attachment limits for faster processing
Batch Processing: Balance MAX_EMAILS_PER_BATCH vs processing time
Resource Allocation: Ensure adequate CPU/memory for media analysis if enabled

Limitations

Personal Outlook Accounts: Not supported due to Microsoft's discontinuation of app password authentication (October 2024). Only OAuth2 is supported, which is not yet implemented.
Proton Mail: Requires Bridge application running locally (Bridge provides localhost IMAP server)
Deepfake Detection: Basic heuristics only; ML models not included
Transformer Models: Not enabled by default (requires additional dependencies)
IMAP Only: Does not support POP3 or proprietary APIs (e.g., Microsoft Graph)

Future Enhancements

OAuth2 Authentication for personal Outlook/Microsoft accounts (high priority)
Full transformer model integration for NLP analysis
Advanced deepfake detection using specialized ML models
Database persistence for threat history
Web dashboard for monitoring and configuration
Email quarantine and automatic remediation
Integration with SIEM systems
Support for Microsoft Graph API (Exchange/O365)
Sandboxed attachment analysis

GitHub Actions & Agentic Workflows

This repository includes automated workflows powered by GitHub Actions and AI agents:

Daily Perf Improver - Identifies and implements performance optimizations
Daily QA - Performs quality assurance checks and suggests improvements
Daily Backlog Burner - Helps work through backlog items systematically

Prerequisites for Agentic Workflows

Some workflows require GitHub Discussions to be enabled for coordination and planning.

⚠️ If you see workflow failures about "Failed to create discussion", you need to enable Discussions:

Go to your repository Settings
Scroll to Features section
Enable Discussions

For detailed setup instructions, see Agentic Workflows Setup Guide.

License

MIT License - See LICENSE file for details

Support

For issues, questions, or contributions, please open an issue on GitHub.

Acknowledgments

Built with security and privacy in mind for self-hosted email threat detection.

Name		Name	Last commit message	Last commit date
Latest commit History 652 Commits
.Jules		.Jules
.cursor		.cursor
.github		.github
.jules		.jules
.trunk		.trunk
launchd		launchd
scripts		scripts
src		src
tests		tests
.bandit		.bandit
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
.nvimlog		.nvimlog
.pre-commit-config.yaml		.pre-commit-config.yaml
ANALYSIS_REPORT.md		ANALYSIS_REPORT.md
BACKLOG_PHASE_2A_SUMMARY.md		BACKLOG_PHASE_2A_SUMMARY.md
COMPREHENSIVE_REVIEW_2025-11-08.md		COMPREHENSIVE_REVIEW_2025-11-08.md
CONSOLIDATION_NEXT_STEPS.md		CONSOLIDATION_NEXT_STEPS.md
CONSOLIDATION_QUICK_REFERENCE.md		CONSOLIDATION_QUICK_REFERENCE.md
CONSOLIDATION_REPORT.md		CONSOLIDATION_REPORT.md
CONSOLIDATION_SUMMARY.md		CONSOLIDATION_SUMMARY.md
DOCUMENTATION_UPDATE_SUMMARY.md		DOCUMENTATION_UPDATE_SUMMARY.md
Dockerfile		Dockerfile
ENV_SETUP.md		ENV_SETUP.md
FIX_COMPLETE.md		FIX_COMPLETE.md
FUTURE_ENHANCEMENTS.md		FUTURE_ENHANCEMENTS.md
LICENSE		LICENSE
OTHER_WORKFLOWS_DISCUSSION_ISSUE.md		OTHER_WORKFLOWS_DISCUSSION_ISSUE.md
OUTLOOK_TROUBLESHOOTING.md		OUTLOOK_TROUBLESHOOTING.md
PR Triage and Integration Plan — email-security-pipeline.md		PR Triage and Integration Plan — email-security-pipeline.md
QUICKSTART.md		QUICKSTART.md
QUICK_REFERENCE.md		QUICK_REFERENCE.md
README.md		README.md
SECURITY.md		SECURITY.md
TEST_RESULTS.md		TEST_RESULTS.md
VERIFICATION_CHECKLIST.md		VERIFICATION_CHECKLIST.md
WARP.md		WARP.md
WORKFLOW_FIX_SUMMARY.md		WORKFLOW_FIX_SUMMARY.md
diff_bolt_regex.txt		diff_bolt_regex.txt
diff_bolt_url.txt		diff_bolt_url.txt
diff_dos.txt		diff_dos.txt
diff_header.txt		diff_header.txt
diff_log.txt		diff_log.txt
diff_palette.txt		diff_palette.txt
docker-compose.yml		docker-compose.yml
install_daemon.sh		install_daemon.sh
pytest.ini		pytest.ini
requirements-ci.txt		requirements-ci.txt
requirements.txt		requirements.txt
setup.sh		setup.sh
test_config.py		test_config.py
uninstall_daemon.sh		uninstall_daemon.sh

License

abhimehro/email-security-pipeline

Folders and files

Latest commit

History

Repository files navigation

Email Security Analysis Pipeline

Features

Multi-Layer Threat Detection

Alert & Response System

Observability & Monitoring

Architecture

Quick Start

Prerequisites

Option 1: Docker Deployment (Recommended)

Option 3: Background Execution (macOS/Darwin)

Configuration

Email Account Setup

Gmail

Outlook/Hotmail

Proton Mail

Connectivity sanity checks (Bridge + Gmail)

IMAP/SMTP quick check CLI

Analysis Configuration

Alert Configuration

System Configuration

Logging Configuration

Text Format (Default)

JSON Format

Metrics Collection

Project Structure

Security Considerations

Credential Management

Docker Security

Rate Limiting

Monitoring & Logs

View Logs

Log Levels

Troubleshooting

Connection Issues

Authentication Errors

No Emails Detected

Development

Setting Up Development Environment

Running Tests

Code Quality Tools

Troubleshooting Pre-commit

Advanced Features

Adding Custom Analysis Rules

Integrating ML Models

Custom Alert Channels

Performance Tuning

Configuration Parameters

Recent Performance Optimizations

Email Parsing Efficiency (Feb 2026)

Media Analysis Performance (Feb 2026)

NLP Pattern Matching (2025)

Input Truncation (2025)

Performance Monitoring

Optimization Guidelines

Limitations

Future Enhancements

GitHub Actions & Agentic Workflows

Prerequisites for Agentic Workflows

License

Support

Acknowledgments

About

Topics

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 10

Packages