M4 Storage & MongoDB Setup Guide

Quick Start

1. Add MongoDB URI to Encrypted Secrets

Run the encryption script to add your MongoDB credentials:

python encrypt_secrets.py

When prompted, enter these secrets:

GEMINI_API_KEY=<your-gemini-key>
MONGO_URI=mongodb+srv://<db-username>:<db-password>@cluster0.fzdc8nh.mongodb.net/
SIGNING_SECRET=<generate-a-strong-secret-key>

Note: The SIGNING_SECRET should be a strong random string. You can generate one with:

import secrets
print(secrets.token_hex(32))  # Generates a 64-character hex string
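Because the secrets are entered as KEY=VALUE lines, it can help to sanity-check them before encrypting. The sketch below is hypothetical: parse_secrets, check_secrets, REQUIRED_KEYS and the 32-character minimum for SIGNING_SECRET are assumptions for illustration, not functions or rules taken from encrypt_secrets.py.

```python
# Hypothetical helpers (not part of encrypt_secrets.py): parse KEY=VALUE lines
# and check that the keys this guide lists are present and plausible.
REQUIRED_KEYS = ("GEMINI_API_KEY", "MONGO_URI", "SIGNING_SECRET")


def parse_secrets(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    parsed: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected KEY=VALUE, got: {line!r}")
        # partition splits on the first '=' only, so URI query strings survive.
        parsed[key.strip()] = value.strip()
    return parsed


def check_secrets(parsed: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the secrets look usable."""
    problems = [f"{k} is missing" for k in REQUIRED_KEYS if not parsed.get(k)]
    uri = parsed.get("MONGO_URI", "")
    if uri and not uri.startswith(("mongodb://", "mongodb+srv://")):
        problems.append("MONGO_URI must start with mongodb:// or mongodb+srv://")
    # Assumed minimum length; secrets.token_hex(32) yields 64 characters.
    if 0 < len(parsed.get("SIGNING_SECRET", "")) < 32:
        problems.append("SIGNING_SECRET looks short; use secrets.token_hex(32)")
    return problems
```

Splitting on the first = only matters for MONGO_URI, since Atlas connection strings often carry query parameters such as ?retryWrites=true.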

2. Load Secrets in the App

  1. Run the Streamlit app:

    streamlit run app.py
  2. Navigate to the 🔐 Security tab

  3. Enter your master passphrase (the one you used in encrypt_secrets.py)

  4. Click 🔓 Load Secrets

  5. You should see: ✅ Secrets Loaded

3. Start Using MongoDB History

  1. Go to the 📧 Email Analysis tab

  2. Fetch and scan some emails

  3. Analyses will automatically be saved to MongoDB

  4. Switch to the 📚 History tab to view stored analyses

MongoDB Atlas Setup (Already Done! ✅)

Your MongoDB cluster is ready:

  • Cluster: cluster0.fzdc8nh.mongodb.net
  • Database: phishguard (auto-created)
  • Collection: email_analyses (auto-created)

Features Available

Automatic Storage

  • Every analyzed email is automatically saved to MongoDB
  • Only emails from the current app session are stored
  • Duplicates are prevented (unique gmail_id)
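One way to get "save every analysis, skip duplicates" with PyMongo is an upsert keyed on gmail_id that only writes fields on insert. This is a sketch under that assumption; storage.py may instead call insert_one and rely on the unique index raising DuplicateKeyError. The function name save_analysis is hypothetical.

```python
from typing import Any


def save_analysis(collection: Any, doc: dict[str, Any]) -> bool:
    """Store doc unless an analysis with the same gmail_id already exists.

    collection is expected to be a pymongo Collection (or anything with a
    compatible update_one). Returns True if a new document was inserted,
    False if one with this gmail_id was already stored.
    """
    # On insert, gmail_id is taken from the filter, so leave it out of $setOnInsert.
    fields = {k: v for k, v in doc.items() if k != "gmail_id"}
    result = collection.update_one(
        {"gmail_id": doc["gmail_id"]},  # match on the unique key
        {"$setOnInsert": fields},       # applied only when no document matches
        upsert=True,
    )
    # UpdateResult.upserted_id is None when an existing document matched.
    return result.upserted_id is not None
```

Because $setOnInsert does nothing when a match exists, re-scanning the same email leaves the first stored analysis (and its signature) untouched.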

History Tab Features

  • View all stored analyses
  • Filter by risk level (HIGH_RISK, REVIEW, SAFE)
  • See signature verification status
  • Expandable cards with full details
  • Tamper detection warnings
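The risk-level filter maps naturally onto a MongoDB query on risk_label, sorted newest first by processed_at. The helper below is a hypothetical sketch of building that query; it only returns the filter and sort spec, so it does not assume anything about how the History tab renders results.

```python
from typing import Any, Optional

# Risk labels listed in this guide.
RISK_LABELS = {"HIGH_RISK", "REVIEW", "SAFE"}


def history_query(risk_label: Optional[str] = None) -> tuple[dict[str, Any], list[tuple[str, int]]]:
    """Build the (filter, sort) pair for a newest-first History listing."""
    if risk_label is None:
        flt: dict[str, Any] = {}  # no filter: all stored analyses
    elif risk_label in RISK_LABELS:
        flt = {"risk_label": risk_label}
    else:
        raise ValueError(f"Unknown risk label: {risk_label!r}")
    # -1 is pymongo.DESCENDING: newest processed_at first.
    return flt, [("processed_at", -1)]
```

With PyMongo this would be used as `flt, sort = history_query("HIGH_RISK")` followed by `collection.find(flt).sort(sort)`, which the risk_label and processed_at indexes can serve.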

Data Integrity

  • All records signed with HMAC-SHA256
  • Signatures verified on every load
  • Invalid signatures clearly flagged
  • Tampering is detected: a modified record fails signature verification
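A minimal HMAC-SHA256 sign/verify sketch using only the standard library. The exact payload storage.py signs is not documented here, so the canonical JSON form, the excluded fields (signature, signature_valid, _id), and the function names are assumptions; signatures produced by this sketch will not necessarily match the ones PhishGuard stores.

```python
import base64
import hashlib
import hmac
import json
from typing import Any

# Assumed: fields that are not part of the signed payload.
UNSIGNED_FIELDS = {"signature", "signature_valid", "_id"}


def _canonical_bytes(doc: dict[str, Any]) -> bytes:
    payload = {k: v for k, v in doc.items() if k not in UNSIGNED_FIELDS}
    # Sorted keys and compact separators give a stable byte string for equal data;
    # default=str covers values such as datetime.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), default=str).encode("utf-8")


def sign_document(doc: dict[str, Any], key: bytes) -> str:
    """Return a base64-encoded HMAC-SHA256 of the document's signed fields."""
    mac = hmac.new(key, _canonical_bytes(doc), hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")


def verify_document(doc: dict[str, Any], key: bytes) -> bool:
    """Recompute the HMAC and compare it with the stored signature."""
    expected = sign_document(doc, key).encode("ascii")
    stored = str(doc.get("signature", "")).encode("utf-8")
    # compare_digest takes time independent of where the values first differ.
    return hmac.compare_digest(expected, stored)
```

The key would be derived from SIGNING_SECRET (for example `SIGNING_SECRET.encode()`, an assumption). Anyone holding that key can produce valid signatures, which is why it lives in the encrypted secrets rather than in the database.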

Database Indexes

The following indexes are automatically created:

  1. gmail_id (unique) - Prevents duplicate entries
  2. processed_at (descending) - Fast time-based queries
  3. risk_label (ascending) - Efficient filtering
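The three indexes can be expressed as PyMongo create_index calls. This is a sketch: the index names and the ensure_indexes helper are hypothetical, and storage.py may name them differently. PyMongo's create_index returns the index name, and MongoDB treats re-creating an identical index as a no-op, so calling this on every startup is safe.

```python
from typing import Any

# Numeric directions as used by PyMongo (pymongo.ASCENDING == 1, pymongo.DESCENDING == -1).
ASC, DESC = 1, -1

# Hypothetical index names; storage.py may use different ones.
INDEX_SPECS = [
    ([("gmail_id", ASC)], {"unique": True, "name": "gmail_id_unique"}),
    ([("processed_at", DESC)], {"name": "processed_at_desc"}),
    ([("risk_label", ASC)], {"name": "risk_label_asc"}),
]


def ensure_indexes(collection: Any) -> list[str]:
    """Create the three indexes on a pymongo Collection and return their names."""
    return [collection.create_index(keys, **opts) for keys, opts in INDEX_SPECS]
```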

Security Best Practices

✅ MongoDB URI encrypted in secrets.enc
✅ TLS/SSL connection enforced
✅ Digital signatures on all documents
✅ No plaintext credentials in code
✅ Session-based validation

Troubleshooting

"Secrets Not Loaded"

  • Go to Security tab
  • Enter your passphrase
  • Click "Load Secrets"

"MONGO_URI not found"

  • Run python encrypt_secrets.py again
  • Make sure to include MONGO_URI field
  • Reload secrets in the app

Connection Errors

  • Check your internet connection
  • Verify MongoDB Atlas cluster is running
  • Check the IP access list in MongoDB Atlas (allowing all addresses with 0.0.0.0/0 is acceptable for testing only)
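PyMongo's MongoClient connects lazily, so a bad URI or a blocked IP often only surfaces on the first real operation. Running the server's ping command forces a round trip and gives a clear error. The check_connection helper is a hypothetical sketch; it takes an already-constructed client.

```python
from typing import Any


def check_connection(client: Any) -> tuple[bool, str]:
    """Run MongoDB's 'ping' command and return (ok, message) for display."""
    try:
        client.admin.command("ping")
        return True, "Connected to MongoDB"
    except Exception as exc:  # e.g. pymongo.errors.ServerSelectionTimeoutError
        return False, f"{type(exc).__name__}: {exc}"
```

With PyMongo the client would be created as `MongoClient(mongo_uri, serverSelectionTimeoutMS=5000)` so that an unreachable cluster fails after about five seconds instead of the default 30.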

No History Showing

  • Make sure you've scanned some emails first
  • Check that secrets are loaded
  • Look for error messages in History tab

Storage Schema

Each analysis document contains:

{
  "gmail_id": "string",
  "sender": "email@example.com",
  "subject": "Email subject",
  "date": "Date string",
  "risk_score": 85.5,
  "risk_label": "HIGH_RISK",
  "ai_risk_score": 90,
  "heuristic_risk_score": 70,
  "intents": ["phishing", "credential_theft"],
  "indicators": ["urgent_language", "suspicious_link"],
  "ai_summary": "AI analysis summary",
  "recommendations": ["Do not click links", "Report as phishing"],
  "processed_at": "2025-11-02T10:30:00",
  "mock_mode": false,
  "signature": "base64_signature_string",
  "signature_valid": true
}
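The schema above can double as a lightweight check before documents are written. This sketch derives expected types from the example document; the schema_errors name is hypothetical, signature_valid is left out on the assumption that it is computed at load time, and processed_at accepts either a string or a datetime.

```python
from datetime import datetime
from typing import Any

RISK_LABELS = {"HIGH_RISK", "REVIEW", "SAFE"}

# Expected Python types per field, following the example document above.
FIELD_TYPES: dict[str, tuple[type, ...]] = {
    "gmail_id": (str,),
    "sender": (str,),
    "subject": (str,),
    "date": (str,),
    "risk_score": (int, float),
    "risk_label": (str,),
    "ai_risk_score": (int, float),
    "heuristic_risk_score": (int, float),
    "intents": (list,),
    "indicators": (list,),
    "ai_summary": (str,),
    "recommendations": (list,),
    "processed_at": (str, datetime),
    "mock_mode": (bool,),
    "signature": (str,),
}


def schema_errors(doc: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the document fits the schema."""
    errors = []
    for field, types in FIELD_TYPES.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], types):
            errors.append(f"{field}: expected {'/'.join(t.__name__ for t in types)}")
    if doc.get("risk_label") not in RISK_LABELS:
        errors.append(f"risk_label must be one of {sorted(RISK_LABELS)}")
    return errors
```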

Optional: TTL Index for Auto-Cleanup

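To automatically delete analyses older than 7 days, uncomment the TTL index in storage.py. MongoDB only expires documents whose processed_at holds a BSON date (a Python datetime); documents where it is stored as an ISO-format string are never deleted.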

# In _get_collection(), uncomment (requires: from pymongo import ASCENDING):
collection.create_index(
    [("processed_at", ASCENDING)],
    expireAfterSeconds=7 * 24 * 60 * 60,  # 7 days
    name="processed_at_ttl"
)
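Because TTL expiry only applies to date-typed fields, processed_at needs to reach PyMongo as a datetime. A sketch of normalizing it before insert, assuming naive timestamps like "2025-11-02T10:30:00" are already UTC (PyMongo makes the same assumption for naive datetimes); to_ttl_date is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Union

TTL_SECONDS = 7 * 24 * 60 * 60  # 604800 seconds = 7 days


def to_ttl_date(value: Union[str, datetime]) -> datetime:
    """Return a timezone-aware UTC datetime suitable for a TTL-indexed field.

    Naive values are assumed to already be in UTC.
    """
    dt = datetime.fromisoformat(value) if isinstance(value, str) else value
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

A document processed at 2025-11-02T10:30:00 UTC becomes eligible for deletion 604800 seconds later, at 2025-11-09T10:30:00 UTC. MongoDB's TTL monitor runs roughly once a minute, so removal happens shortly after that rather than at the exact second.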

Support

For MongoDB Atlas issues:

For PhishGuard issues:

  • Check logs in terminal
  • Review error messages in History tab
  • Verify that dependencies are installed, e.g. pip list | grep pymongo