HED-BOT Security Best Practices

This document outlines the security measures implemented in HED-BOT to ensure compliance with security audits and best practices.

Table of Contents

Overview
Authentication & Authorization
CORS & Origin Validation
Audit Logging
Security Headers
Rate Limiting
HTTPS & Encryption
Environment Variables
Audit Compliance

Overview

HED-BOT implements defense-in-depth security with multiple layers:

API Key Authentication - Prevents unauthorized access
CORS Validation - Allows only approved origins
Audit Logging - Complete request/response trail
Security Headers - Protects against common attacks
Rate Limiting - Prevents abuse and DoS
HTTPS Only - Encrypted communication
Input Validation - Prevents injection attacks

Authentication & Authorization

API Key Authentication

Implementation: FastAPI dependency injection with custom middleware

Location: src/api/security.py

How It Works

Client includes API key in X-API-Key header
FastAPI validates key before processing request
Invalid/missing keys return 401 Unauthorized
All authentication events are audit logged
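
The steps above can be sketched without any web framework. This is a minimal illustration, not the code in src/api/security.py: the helper name `check_api_key` and the use of plain integer status codes are assumptions, and only the X-API-Key header and the 401 response come from this document. `hmac.compare_digest` is used so that comparison time does not reveal how much of a key matched.

```python
import hmac
from typing import Iterable, Mapping

API_KEY_HEADER = "X-API-Key"  # header name taken from this document


def check_api_key(headers: Mapping[str, str], valid_keys: Iterable[str]) -> int:
    """Return 200 if the request carries a known key, otherwise 401.

    Hypothetical helper for illustration only.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    presented = normalized.get(API_KEY_HEADER.lower(), "")
    if not presented:
        return 401
    # Check every configured key, using a constant-time comparison for each.
    matched = False
    for key in valid_keys:
        if hmac.compare_digest(presented.encode(), key.encode()):
            matched = True
    return 200 if matched else 401
```

In a FastAPI application the same check would typically live in a dependency that raises an HTTP 401 error instead of returning a status code.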

Generating API Keys

# Generate a secure random API key
python scripts/generate_api_key.py

# Output: API Key: a1b2c3d4e5f6...  (64 characters)
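
The script itself is not reproduced in this document. As a rough sketch of how a 64-character key could be produced, assuming hexadecimal output (the function name `generate_api_key` is hypothetical), `secrets.token_hex(32)` draws 32 bytes from the operating system's secure random source and encodes them as 64 hex characters:

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a random key; 32 bytes encode to 64 hexadecimal characters."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(f"API Key: {generate_api_key()}")
```

The `secrets` module is preferable to `random` here because `random` is not designed for security-sensitive values.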

Configuring API Keys

Option 1: Environment Variable (Recommended)

# .env file
API_KEYS=key1_64_chars,key2_64_chars,key3_64_chars

Option 2: Individual Keys

# .env file
API_KEY_1=first_key_64_chars
API_KEY_2=second_key_64_chars
API_KEY_3=third_key_64_chars
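
A loader for both options might look like the sketch below. It is an assumption about behavior, not the actual implementation: the helper name `load_api_keys` is hypothetical, and whether the real code accepts both options at once, or stops at the first missing numbered variable, is not stated in this document.

```python
import os
from typing import Mapping, Optional, Set


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Set[str]:
    """Collect keys from API_KEYS and from API_KEY_1, API_KEY_2, ...

    Hypothetical helper; the real loader may differ.
    """
    env = os.environ if env is None else env
    keys: Set[str] = set()
    # Option 1: comma-separated list; ignore blanks and stray whitespace.
    for item in env.get("API_KEYS", "").split(","):
        if item.strip():
            keys.add(item.strip())
    # Option 2: numbered variables, read until the first missing index
    # (an assumption made for this sketch).
    index = 1
    while env.get(f"API_KEY_{index}", "").strip():
        keys.add(env[f"API_KEY_{index}"].strip())
        index += 1
    return keys
```

Passing the environment as an argument keeps the function easy to test, for example `load_api_keys({"API_KEYS": "key1,key2"})`.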

Disabling Authentication (Development Only)

# .env file
REQUIRE_API_AUTH=false  # NOT recommended for production

⚠️ Warning: Never disable authentication in production!

Using API Keys

Frontend (JavaScript):

fetch('https://hedtools.ucsd.edu/hed-bot-api/annotate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'your_api_key_here',
  },
  body: JSON.stringify({ description: '...' }),
});

cURL:

curl -X POST https://hedtools.ucsd.edu/hed-bot-api/annotate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key_here" \
  -d '{"description": "person sees red circle"}'

Python:

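import requests

response = requests.post(
    'https://hedtools.ucsd.edu/hed-bot-api/annotate',
    headers={'X-API-Key': 'your_api_key_here'},
    json={'description': 'person sees red circle'},
    timeout=30,  # avoid hanging indefinitely if the server does not respond
)
response.raise_for_status()  # raises on 401 and other error statuses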

Protected vs Public Endpoints

Protected (Require API Key):

POST /annotate - Generate annotations
POST /annotate-from-image - Image annotations
POST /annotate/stream - Streaming annotations
POST /validate - Validate HED strings

Public (No API Key):

GET /health - Health checks (for monitoring)
GET /version - Version information
GET / - API documentation

CORS & Origin Validation

Allowed Origins

Production: Only https://hed-bot.pages.dev

Development: http://localhost:5173, http://localhost:3000

Configuration

# src/api/main.py
allowed_origins = [
    "https://hed-bot.pages.dev",  # Production
    "http://localhost:5173",       # Dev
]

Adding Extra Origins

# .env file
EXTRA_CORS_ORIGINS=https://staging.hed-bot.pages.dev,https://dev.hed-bot.pages.dev
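
One way the extra origins could be merged into the base list is sketched below. The helper names `build_allowed_origins` and `is_origin_allowed` are hypothetical, and the exact-match rule is an assumption chosen because it also rejects look-alike hosts such as `https://hed-bot.pages.dev.evil.com`.

```python
from typing import List, Mapping

BASE_ORIGINS = [
    "https://hed-bot.pages.dev",  # Production
    "http://localhost:5173",      # Dev
]


def build_allowed_origins(env: Mapping[str, str]) -> List[str]:
    """Append EXTRA_CORS_ORIGINS entries to the base list, skipping duplicates."""
    origins = list(BASE_ORIGINS)
    for raw in env.get("EXTRA_CORS_ORIGINS", "").split(","):
        origin = raw.strip().rstrip("/")  # origins carry no trailing slash
        if origin and origin not in origins:
            origins.append(origin)
    return origins


def is_origin_allowed(origin: str, allowed: List[str]) -> bool:
    """Exact string match; no wildcard or suffix matching."""
    return origin in allowed
```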

Two-Layer CORS Protection

Nginx Layer: Validates Origin header
FastAPI Layer: Enforces CORS policy

Even if Nginx is bypassed, FastAPI will reject invalid origins.

Audit Logging

What Gets Logged

Every Request:

Timestamp (ISO 8601 format)
Client IP address
HTTP method and path
API key hash (first 8 characters)
User identifier (if available)

Every Response:

HTTP status code
Processing time (milliseconds)
Response size (if applicable)

Errors:

Error type and message
Stack trace (in debug mode)
Associated request details

Log Format

2025-12-02T15:30:45.123Z - hed_bot.audit - INFO - [AUDIT] REQUEST - timestamp=2025-12-02T15:30:45.123Z, ip=1.2.3.4, method=POST, path=/annotate, api_key=a1b2c3d4..., user=anonymous
2025-12-02T15:30:47.456Z - hed_bot.audit - INFO - [AUDIT] RESPONSE - timestamp=2025-12-02T15:30:47.456Z, ip=1.2.3.4, method=POST, path=/annotate, status=200, duration_ms=2333.45
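
The message part of a REQUEST entry could be assembled roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and the document does not say which hash function produces the logged key value, so SHA-256 truncated to 8 hex characters is assumed here.

```python
import hashlib
from datetime import datetime, timezone


def key_fingerprint(api_key: str) -> str:
    """First 8 hex characters of the key's SHA-256 digest (assumed scheme)."""
    return hashlib.sha256(api_key.encode()).hexdigest()[:8]


def format_request_entry(when: datetime, ip: str, method: str, path: str,
                         api_key: str, user: str = "anonymous") -> str:
    """Build the message part of an [AUDIT] REQUEST line."""
    # ISO 8601 in UTC with millisecond precision and a trailing "Z".
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return (f"[AUDIT] REQUEST - timestamp={stamp}, ip={ip}, method={method}, "
            f"path={path}, api_key={key_fingerprint(api_key)}..., user={user}")
```

Logging a fingerprint rather than the key keeps the audit log useful for correlation without turning it into a store of credentials.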

Log Locations

Audit Log: /var/log/hed-bot/audit.log
Application Log: Docker container logs (via docker logs)
Nginx Access Log: /var/log/nginx/access.log
Nginx Error Log: /var/log/nginx/error.log

Configuration

# .env file
ENABLE_AUDIT_LOG=true  # Enable/disable audit logging
AUDIT_LOG_FILE=/var/log/hed-bot/audit.log  # Log file location

Log Retention

Recommended retention policies:

Audit logs: 90 days minimum (compliance requirement)
Application logs: 30 days
Access logs: 30 days

Logrotate Configuration:

# /etc/logrotate.d/hed-bot
/var/log/hed-bot/*.log {
    daily
    rotate 90
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        docker kill -s USR1 hed-bot
    endscript
}

Security Headers

All responses include security headers to protect against common attacks:

Headers Implemented

X-Content-Type-Options: nosniff          # Prevent MIME type sniffing
X-Frame-Options: DENY                     # Prevent clickjacking
X-XSS-Protection: 1; mode=block           # Enable XSS filter (honored only by legacy browsers)
Strict-Transport-Security: max-age=31536000; includeSubDomains  # Force HTTPS
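
As an illustration of how these headers might be applied to every response, the sketch below merges them into a plain header dictionary. The helper name `with_security_headers` is hypothetical, and where the real middleware lives is not shown in this document.

```python
from typing import Dict

SECURITY_HEADERS: Dict[str, str] = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "X-XSS-Protection": "1; mode=block",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
}


def with_security_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the response headers with the security headers set."""
    merged = dict(headers)
    merged.update(SECURITY_HEADERS)  # security values win over handler values
    return merged
```

Applying the headers last means a route handler cannot accidentally weaken them.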

Content Security Policy (CSP)

For API responses (added in Nginx):

add_header Content-Security-Policy "default-src 'self'" always;

Rate Limiting

Nginx-Based Rate Limiting

Configuration:

# In http block
limit_req_zone $binary_remote_addr zone=hed_bot_limit:10m rate=60r/m;

# In location block
limit_req zone=hed_bot_limit burst=10 nodelay;
limit_req_status 429;

Limits:

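60 requests per minute per IP address (Nginx enforces this as one request per second)
Bursts of up to 10 additional requests are served immediately (nodelay)
Requests beyond the burst receive a 429 status code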
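
The interaction between the rate and the burst can be easier to see in a small simulation. The class below is a simplified model of leaky-bucket accounting, the method Nginx documents for limit_req, and is not Nginx's actual code. The class name and the use of explicit timestamps are assumptions made for the sketch.

```python
class LeakyBucket:
    """Simplified model of limit_req with nodelay for a single client."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.excess = 0.0   # requests currently above the steady rate
        self.last = None    # timestamp of the last accepted request

    def allow(self, now: float) -> bool:
        if self.last is None:
            # The first request from a client is always accepted.
            self.last = now
            return True
        # Drain the bucket for the elapsed time, then add this request.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            return False  # would map to HTTP 429; state is left unchanged
        self.excess = excess
        self.last = now
        return True
```

With `LeakyBucket(rate_per_sec=1.0, burst=10)`, which corresponds to 60r/m and burst=10, this model accepts 11 requests arriving at the same instant (the first plus a burst of 10) and rejects the twelfth.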

Per-Endpoint Limits (Future)

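Per-endpoint limits could be added with the slowapi library for FastAPI:

from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter  # slowapi looks the limiter up on app.state
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/annotate")
@limiter.limit("10/minute")
async def annotate(request: Request):  # slowapi requires the Request argument
    ...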

HTTPS & Encryption

Requirements

HTTPS Only: All production traffic must use HTTPS
TLS 1.2+: Minimum TLS version 1.2 (TLS 1.3 recommended)
Strong Ciphers: Use modern cipher suites

Nginx TLS Configuration

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256...';
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;

Certificate Management

Use Let's Encrypt for free TLS certificates
Auto-renewal via certbot
HSTS header enforces HTTPS

Environment Variables

Secret Management

Never commit secrets to Git!

✅ Correct:

# .env file (gitignored)
OPENROUTER_API_KEY=sk-or-v1-abc123...
API_KEYS=key1,key2,key3

❌ Incorrect:

# NEVER hardcode secrets in code
api_key = "sk-or-v1-abc123..."  # BAD!

Environment File Template

# .env.example (committed to Git, no real values)
OPENROUTER_API_KEY=your_openrouter_key_here
API_KEYS=your_api_key_1,your_api_key_2
REQUIRE_API_AUTH=true
ENABLE_AUDIT_LOG=true
AUDIT_LOG_FILE=/var/log/hed-bot/audit.log
EXTRA_CORS_ORIGINS=
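
The boolean and path variables in the template could be read into a typed settings object, as in the sketch below. The names `Settings`, `load_settings`, and `parse_bool` are hypothetical, and the choice to default both security switches to enabled when a variable is missing is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}


def parse_bool(value: str, default: bool) -> bool:
    """Interpret common truthy spellings; empty strings fall back to default."""
    text = value.strip().lower()
    return default if not text else text in _TRUE_VALUES


@dataclass(frozen=True)
class Settings:
    require_api_auth: bool
    enable_audit_log: bool
    audit_log_file: str


def load_settings(env: Mapping[str, str]) -> Settings:
    # Assumed defaults: security features stay on when a variable is missing.
    return Settings(
        require_api_auth=parse_bool(env.get("REQUIRE_API_AUTH", ""), True),
        enable_audit_log=parse_bool(env.get("ENABLE_AUDIT_LOG", ""), True),
        audit_log_file=env.get("AUDIT_LOG_FILE", "/var/log/hed-bot/audit.log"),
    )
```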

File Permissions

# Restrict .env file permissions
chmod 600 .env
chown hed-bot:hed-bot .env

# Verify
ls -la .env
# Output: -rw------- 1 hed-bot hed-bot 256 Dec 02 15:30 .env
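
The same check can be automated, for example at application startup. The sketch below is one possible approach; the function name `is_owner_only` is hypothetical and nothing in this document says the application performs such a check.

```python
import os
import stat


def is_owner_only(path: str) -> bool:
    """True when the file mode is exactly 0600 (read/write for owner only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600
```

A startup routine could call `is_owner_only(".env")` and log a warning or refuse to start when it returns False.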

Audit Compliance

For Security Auditors

This section provides information for security auditors reviewing HED-BOT.

Security Controls Implemented

| Control | Implementation | Evidence |
|---|---|---|
| Authentication | API Key (64-char random) | `src/api/security.py` |
| Authorization | Endpoint-level auth required | `src/api/main.py` (Depends) |
| Audit Logging | All requests/responses logged | `src/api/security.py`, logs |
| CORS | Whitelist-based origin validation | `src/api/main.py` |
| Encryption | HTTPS/TLS 1.2+ required | Nginx config |
| Input Validation | Pydantic models | `src/api/models.py` |
| Rate Limiting | 60 req/min per IP | Nginx config |
| Security Headers | HSTS, X-Frame, CSP, etc. | Middleware |
| Secret Management | Environment variables only | `.env` (not in Git) |

Compliance Standards

OWASP Top 10 (2021): Addressed

A01:2021-Broken Access Control ✅
- API key authentication required
- Audit logging tracks all access
A02:2021-Cryptographic Failures ✅
- HTTPS only (enforced via HSTS)
- Secrets in environment variables
A03:2021-Injection ✅
- Input validation via Pydantic
- Parameterized queries (no SQL injection)
A04:2021-Insecure Design ✅
- Defense in depth (multiple layers)
- Principle of least privilege
A05:2021-Security Misconfiguration ✅
- Security headers configured
- Debug mode disabled in production
A06:2021-Vulnerable Components ✅
- Regular dependency updates
- Automated security scanning (future)
A07:2021-Authentication Failures ✅
- Strong API keys (64 characters)
- Failed auth attempts logged
A08:2021-Software & Data Integrity ✅
- Audit logs for all changes
- Version control (Git)
A09:2021-Logging & Monitoring ✅
- Comprehensive audit logs
- Health checks and monitoring
A10:2021-SSRF ✅
- No user-controlled URLs
- LLM calls are to whitelisted APIs only

Audit Trail Access

Viewing Audit Logs:

# View recent audit logs
sudo tail -f /var/log/hed-bot/audit.log

# Search for specific API key usage
grep "api_key=a1b2c3d4" /var/log/hed-bot/audit.log

# View all requests from an IP (-F matches the dots literally; the trailing comma avoids matching 1.2.3.40)
grep -F "ip=1.2.3.4," /var/log/hed-bot/audit.log

# View errors
grep "ERROR" /var/log/hed-bot/audit.log
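
For analysis beyond grep, the audit lines can be parsed into dictionaries. The sketch below assumes the line layout shown under Log Format; the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator

_ENTRY = re.compile(r"\[AUDIT\] (?P<kind>REQUEST|RESPONSE) - (?P<fields>.*)$")


def parse_audit_line(line: str) -> Dict[str, str]:
    """Split one audit line into key=value fields; {} if it is not an audit entry."""
    match = _ENTRY.search(line.strip())
    if match is None:
        return {}
    entry = {"kind": match.group("kind")}
    for pair in match.group("fields").split(", "):
        key, _, value = pair.partition("=")
        entry[key] = value
    return entry


def entries_from_ip(lines: Iterable[str], ip: str) -> Iterator[Dict[str, str]]:
    """Yield parsed entries whose ip field equals `ip` exactly."""
    for line in lines:
        entry = parse_audit_line(line)
        if entry.get("ip") == ip:
            yield entry
```

Because the comparison is on the parsed field, a search for 1.2.3.4 does not also return entries for addresses such as 1.2.3.45.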

Testing Security Controls

Authentication Test:

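# Should fail with 401 (no API key); /annotate only accepts POST
curl -X POST https://hedtools.ucsd.edu/hed-bot-api/annotate \
  -H "Content-Type: application/json" \
  -d '{"description": "person sees red circle"}'

# Should succeed (valid API key)
curl -X POST https://hedtools.ucsd.edu/hed-bot-api/annotate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: valid_key" \
  -d '{"description": "person sees red circle"}'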

CORS Test:

# Should include CORS headers
curl -H "Origin: https://hed-bot.pages.dev" \
     -I https://hedtools.ucsd.edu/hed-bot-api/health

# Should reject invalid origin
curl -H "Origin: https://evil.com" \
     -I https://hedtools.ucsd.edu/hed-bot-api/health

Rate Limiting Test:

# Rapid requests should trigger 429 once the burst allowance is used up
for i in {1..70}; do
  curl -s -o /dev/null -w "%{http_code}\n" https://hedtools.ucsd.edu/hed-bot-api/health
done

Security Checklist

Use this checklist for deployment and audits:

Pre-Deployment

API keys generated and stored securely
.env file has correct permissions (600)
HTTPS certificate installed and valid
Nginx configured with security headers
Rate limiting enabled
Audit logging enabled and tested
CORS origins configured correctly
Debug mode disabled (DEBUG=false)

Post-Deployment

Health check accessible: https://hedtools.ucsd.edu/hed-bot-api/health
Authentication working (401 without API key)
CORS headers present in responses
Audit logs being written to /var/log/hed-bot/audit.log
Security headers present in responses
Rate limiting triggering after 60 req/min
HTTPS redirect working (HTTP → HTTPS)

Ongoing Maintenance

Review audit logs weekly
Rotate API keys quarterly
Update dependencies monthly
Review access logs for suspicious activity
Test backup/restore procedures
Update TLS certificates before expiry

Incident Response

If API Key is Compromised

Immediately remove compromised key from .env
Restart Docker container: docker restart hed-bot
Generate new API key: python scripts/generate_api_key.py
Update frontend with new key
Review audit logs for unauthorized access
Document incident for audit trail

If Breach is Suspected

Check audit logs for suspicious activity
Review nginx access logs
Check for unauthorized API keys
Verify CORS origins haven't been modified
Review recent code changes
Contact security team

Contact & Support

Security Issues: Report to security team immediately
Audit Questions: Contact project lead
Implementation Questions: See deploy/README.md

Last Updated: December 2, 2025
