Performance and Scaling
This guide covers performance tuning, scaling strategies, capacity planning, and optimization techniques for TMI deployments.
TMI performance optimization and scaling involves:
- Application performance tuning
- Database optimization and scaling
- Cache performance optimization
- Horizontal and vertical scaling strategies
- Load balancing and high availability
- Capacity planning and monitoring
# Check response times
curl -w "@curl-format.txt" -o /dev/null -s https://api.tmi.example.com/version
# curl-format.txt:
time_total: %{time_total}\n
time_connect: %{time_connect}\n
time_starttransfer: %{time_starttransfer}\n
size_download: %{size_download}\n
# Load test with Apache Bench
ab -n 1000 -c 10 https://api.tmi.example.com/version
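Beyond ab's built-in percentile table, you can compute percentiles yourself from per-request timings (ab can export them with `-g`). A sketch using stand-in sample data; `latencies.txt` is a hypothetical file of one millisecond value per line:

```shell
# Stand-in latencies (1..100 ms); real data can come from ab's -g output
seq 1 100 > latencies.txt
# P95: sort numerically and index at 95% of the sample count
sort -n latencies.txt | awk '{a[NR]=$1} END {print "P95:", a[int(NR*0.95)]}'
# → P95: 95
```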
# WebSocket connection test
wscat -c "wss://api.tmi.example.com/ws/diagrams/{id}" \
  -H "Authorization: Bearer $TOKEN"
-- Check slow queries (PostgreSQL 13+: use mean_exec_time / total_exec_time)
SELECT
query,
mean_time,
calls,
total_time
FROM pg_stat_statements
WHERE mean_time > 100
ORDER BY mean_time DESC
LIMIT 10;
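The query above requires the pg_stat_statements extension. If it is not already enabled, a minimal setup sketch (host and credentials follow the examples used elsewhere in this guide; the config path varies by distribution):

```shell
# postgresql.conf (requires a full server restart to take effect):
#   shared_preload_libraries = 'pg_stat_statements'
# Then create the extension in the TMI database:
psql -h postgres-host -U tmi_user -d tmi \
  -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"
```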
-- Check database size and growth
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database;
-- Check connection count
SELECT count(*) FROM pg_stat_activity;
# Redis hit rate
redis-cli -h redis-host -a password info stats | \
awk '/keyspace_hits|keyspace_misses/ {
split($0,a,":");
if ($1 ~ /hits/) hits=a[2];
if ($1 ~ /misses/) misses=a[2]
}
END {
total=hits+misses;
rate=(hits/total)*100;
printf "Hit Rate: %.2f%%\n", rate
}'
# Memory usage
redis-cli -h redis-host -a password info memory | grep used_memory_human
Optimize HTTP timeouts for your workload:
# config-production.yml
server:
read_timeout: 5s # Time to read request
write_timeout: 10s # Time to write response
  idle_timeout: 60s # Idle connection timeout
For high-latency clients or large payloads:
server:
read_timeout: 15s
write_timeout: 30s
  idle_timeout: 120s
Via environment:
SERVER_READ_TIMEOUT=15s
SERVER_WRITE_TIMEOUT=30s
SERVER_IDLE_TIMEOUT=120s
# WebSocket inactivity timeout
WEBSOCKET_INACTIVITY_TIMEOUT_SECONDS=300 # 5 minutes
# For high-activity collaboration
WEBSOCKET_INACTIVITY_TIMEOUT_SECONDS=600 # 10 minutes
# Set maximum Go processes (default: number of CPU cores)
GOMAXPROCS=8
# Garbage collection tuning
GOGC=100 # Default - adjust based on memory patterns
# For memory-constrained environments
GOGC=80 # More frequent GC, lower memory usage
# For CPU-constrained environments
GOGC=200 # Less frequent GC, higher memory usage
For systemd service:
# /etc/systemd/system/tmi.service
[Service]
# Maximum processes
LimitNPROC=512
# Maximum open files
LimitNOFILE=65536
# Memory limit (MemoryMax is the cgroup v2 setting; MemoryLimit is the legacy name)
MemoryMax=1G
# CPU limit (100% of one core)
CPUQuota=100%
For Docker:
docker run -d \
--name tmi-server \
--memory="1g" \
--cpus="2.0" \
--ulimit nofile=65536:65536 \
  tmi/tmi-server:latest
For Kubernetes:
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
    cpu: "2000m"
Optimize logging for performance:
logging:
level: "info" # Use 'warn' or 'error' for production
log_api_requests: false # Disable in high-traffic production
log_api_responses: false # Disable to reduce I/O
log_websocket_messages: false # Disable for performance
redact_auth_tokens: true # Security
  suppress_unauth_logs: true # Reduce noise
For high-performance production:
LOGGING_LEVEL=warn
LOGGING_LOG_API_REQUESTS=false
LOGGING_LOG_API_RESPONSES=false
LOGGING_LOG_WEBSOCKET_MESSAGES=false
Configure connection pooling:
database:
postgres:
max_open_conns: 25 # Max concurrent connections
max_idle_conns: 5 # Idle connections to maintain
    conn_max_lifetime: 5m # Connection lifetime
Sizing guidelines:
- Small deployment (< 100 users):
  - max_open_conns: 10
  - max_idle_conns: 2
- Medium deployment (100-1000 users):
  - max_open_conns: 25
  - max_idle_conns: 5
- Large deployment (1000+ users):
  - max_open_conns: 50
  - max_idle_conns: 10
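A common starting heuristic for sizing the pool (from the PostgreSQL community wiki, not TMI-specific) is cores times two plus effective spindle count; a sketch:

```shell
# Rule-of-thumb starting point for max_open_conns: (cores * 2) + spindles
cores=$(nproc)
spindles=1   # treat SSD/cloud volumes as a single spindle
echo "Suggested max_open_conns: $(( cores * 2 + spindles ))"
```

Treat the result as a baseline and tune against observed connection wait times.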
Edit /etc/postgresql/*/main/postgresql.conf:
# Memory Settings
shared_buffers = 256MB # 25% of RAM (for dedicated server)
effective_cache_size = 1GB # 50-75% of RAM
work_mem = 16MB # Per-operation memory
maintenance_work_mem = 64MB # For VACUUM, CREATE INDEX
# Connection Settings
max_connections = 100 # Adjust based on connection pool
# Query Planner
random_page_cost = 1.1 # For SSD (default 4.0 for HDD)
effective_io_concurrency = 200 # For SSD (default 1)
# Write Performance
wal_buffers = 16MB
checkpoint_completion_target = 0.9
For production with 4GB RAM:
shared_buffers = 1GB
effective_cache_size = 3GB
work_mem = 32MB
maintenance_work_mem = 256MB
Restart PostgreSQL after changes:
sudo systemctl restart postgresql
Check for missing indexes:
-- Tables with high sequential scan counts
SELECT
schemaname,
tablename,
seq_scan,
idx_scan,
seq_tup_read,
CASE
WHEN seq_scan > 0 THEN seq_tup_read / seq_scan
ELSE 0
END AS avg_seq_tup_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
AND schemaname = 'public'
ORDER BY seq_tup_read DESC
LIMIT 20;
TMI's key indexes (already created by migrations):
-- Primary key indexes (automatic)
-- Foreign key indexes
CREATE INDEX idx_threats_threat_model_id ON threats(threat_model_id);
CREATE INDEX idx_diagrams_threat_model_id ON diagrams(threat_model_id);
-- Query optimization indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_threats_threat_model_id_created_at ON threats(threat_model_id, created_at);
Check index usage:
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;
-- Find unused indexes
SELECT
schemaname,
tablename,
indexname
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexname NOT LIKE '%_pkey'
AND schemaname = 'public';
Analyze slow queries:
-- Enable query timing
\timing on
-- Example query analysis
EXPLAIN ANALYZE
SELECT * FROM threats
WHERE threat_model_id = 'uuid-here'
ORDER BY created_at DESC
LIMIT 50;
Optimize query patterns:
-- Use LIMIT for large result sets
SELECT * FROM threats LIMIT 50;
-- Use appropriate indexes
-- Good: Uses index
SELECT * FROM threats WHERE threat_model_id = 'uuid';
-- Bad: leading-wildcard search forces a full table scan
SELECT * FROM threats WHERE lower(title) LIKE '%search%';
-- Better: use a trigram index for substring search
-- (a plain index on lower(title) only helps prefix matches like 'search%')
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_threats_title_trgm ON threats USING gin (lower(title) gin_trgm_ops);
Regular maintenance:
# Manual vacuum and analyze
psql -h postgres-host -U tmi_user -d tmi -c "VACUUM ANALYZE;"
# Check last vacuum/analyze
psql -h postgres-host -U tmi_user -d tmi -c "
SELECT
schemaname,
tablename,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze,
n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC"
Configure autovacuum in postgresql.conf:
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
For read-heavy workloads, add read replicas:
# Configure read replica
database:
postgres:
primary:
host: "postgres-primary"
port: 5432
replicas:
- host: "postgres-replica-1"
port: 5432
- host: "postgres-replica-2"
        port: 5432
Replication setup:
# On primary server (postgresql.conf)
wal_level = replica
max_wal_senders = 3
wal_keep_size = 1GB
# Create replication user
psql -U postgres -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'password';"
# On replica server
# Use pg_basebackup to initialize replica
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U replicator -P -v
For high-connection environments:
# Install PgBouncer
sudo apt-get install pgbouncer
# Configure /etc/pgbouncer/pgbouncer.ini
[databases]
tmi = host=postgres-host port=5432 dbname=tmi
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
# Start PgBouncer
systemctl start pgbouncer
# Update TMI to use PgBouncer
POSTGRES_HOST=localhost
POSTGRES_PORT=6432
# Edit /etc/redis/redis.conf
# Set memory limit
maxmemory 1gb
# Eviction policy
maxmemory-policy allkeys-lru # Evict least recently used keys
# Or: volatile-lru (only evict keys with TTL)
# Memory optimization (Redis 7+ renames these settings to hash-max-listpack-*)
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
Balance performance vs durability:
# For performance (may lose data on crash)
appendonly no
save ""
# Balanced (recommended)
appendonly yes
appendfsync everysec
save 900 1
save 300 10
# For durability (slower writes)
appendonly yes
appendfsync always
# Disable slow commands
rename-command KEYS ""
rename-command FLUSHALL ""
# TCP backlog
tcp-backlog 511
# TCP keepalive
tcp-keepalive 300
# Lazy freeing
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
TMI's cache TTL configuration:
| Cache Type | TTL | Justification |
|---|---|---|
| Threat Models | 10 minutes | Core entities, moderate updates |
| Diagrams | 2 minutes | High collaboration, real-time |
| Sub-resources | 5 minutes | Threats, documents, sources |
| Authorization | 15 minutes | Security-critical, infrequent changes |
| Metadata | 7 minutes | Flexible data, moderate updates |
| Lists | 5 minutes | Paginated results |
Adjust based on your usage patterns:
// For high-collaboration environments (reduce TTL)
cache.Set("threat_model:"+id, data, 5*time.Minute)
// For read-heavy environments (increase TTL)
cache.Set("threat_model:"+id, data, 15*time.Minute)
Nginx load balancer:
# /etc/nginx/conf.d/tmi-upstream.conf
upstream tmi_backend {
least_conn; # Or: ip_hash for sticky sessions
server tmi-server-1:8080 max_fails=3 fail_timeout=30s;
server tmi-server-2:8080 max_fails=3 fail_timeout=30s;
server tmi-server-3:8080 max_fails=3 fail_timeout=30s;
}
server {
listen 443 ssl http2;
server_name api.tmi.example.com;
location / {
proxy_pass http://tmi_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
HAProxy load balancer:
# /etc/haproxy/haproxy.cfg
frontend tmi_front
bind *:443 ssl crt /etc/ssl/certs/tmi.pem
default_backend tmi_back
backend tmi_back
balance leastconn
option httpchk GET /version
http-check expect status 200
server tmi1 tmi-server-1:8080 check
server tmi2 tmi-server-2:8080 check
    server tmi3 tmi-server-3:8080 check
# Scale to 3 instances
docker-compose up -d --scale tmi-server=3
# With explicit configuration
docker-compose -f docker-compose.yml -f docker-compose.scale.yml up -d
# docker-compose.scale.yml
version: "3.8"
services:
tmi-server:
deploy:
      replicas: 3
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: tmi-server-hpa
namespace: tmi
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: tmi-server
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
        averageUtilization: 80
Increase Docker container resources:
docker update tmi-server --memory="2g" --cpus="4.0"
Kubernetes resource increase:
resources:
requests:
memory: "1Gi"
cpu: "1000m"
limits:
memory: "2Gi"
    cpu: "4000m"
Heroku dyno scaling:
# Scale to larger dyno type
heroku ps:resize web=standard-2x --app tmi-server
# Or Performance tier
heroku ps:resize web=performance-m --app tmi-server
PostgreSQL vertical scaling:
-- Increase shared_buffers (persisted by ALTER SYSTEM, but takes effect
-- only after a full restart; pg_reload_conf() is not enough)
ALTER SYSTEM SET shared_buffers = '2GB';
-- Restart PostgreSQL to apply, e.g.: sudo systemctl restart postgresql
-- Increase work_mem (a reload is sufficient, no restart)
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();
Redis vertical scaling:
# Increase memory limit
redis-cli CONFIG SET maxmemory 2gb
# Make permanent in redis.conf
echo "maxmemory 2gb" >> /etc/redis/redis.conf
For global deployments:
┌─────────────────────┐
│ Global Load Balancer│
└──────────┬──────────┘
           │
     ┌─────┴─────┐
     │           │
┌────▼───┐  ┌────▼───┐
│ US-East│  │ EU-West│
│ Region │  │ Region │
└────────┘  └────────┘
     │           │
TMI+DB+Cache TMI+DB+Cache
Consider:
- Regional deployments
- Database replication across regions
- CDN for static assets
- DNS-based routing
Track key metrics for capacity planning:
-- Database growth rate
SELECT
date_trunc('month', created_at) AS month,
count(*) AS records
FROM threat_models
GROUP BY month
ORDER BY month;
-- User growth
SELECT
date_trunc('week', created_at) AS week,
count(*) AS new_users
FROM users
GROUP BY week
ORDER BY week;
Set alerts for capacity thresholds:
- CPU: Alert at 70%, critical at 85%
- Memory: Alert at 75%, critical at 90%
- Disk: Alert at 75%, critical at 90%
- Database connections: Alert at 70% of max
- Redis memory: Alert at 80%, critical at 95%
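A minimal sketch of the disk threshold check above, assuming GNU df and the root filesystem; a real deployment would wire this into its monitoring/alerting system rather than run it ad hoc:

```shell
# Alert at 75% disk usage, critical at 90% (root filesystem)
usage=$(df --output=pcent / | tail -1 | tr -dc '0-9')
if   [ "$usage" -ge 90 ]; then echo "CRITICAL: disk ${usage}%"
elif [ "$usage" -ge 75 ]; then echo "WARNING: disk ${usage}%"
else echo "OK: disk ${usage}%"; fi
```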
Calculate growth rates:
# Database size growth
# Current size: 5GB
# Growth: 100MB/month
# Projected size in 12 months: 5GB + (100MB * 12) = 6.2GB
# User growth
# Current: 100 users
# Growth: 20% month-over-month
# Projected in 12 months: 100 * (1.2^12) = ~892 users
- Monitor resource utilization trends
- Project growth rates (users, data, traffic)
- Calculate resource needs for 6-12 months
- Plan scaling activities before reaching thresholds
- Budget for infrastructure growth
- Test scaling procedures in staging
- Document capacity baselines
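The compound-growth arithmetic above can be scripted; a sketch using integer math, which truncates each month and so lands slightly below the exact 1.2^12 ≈ 8.92 multiplier:

```shell
# Project 20% month-over-month user growth from a base of 100 users
users=100
for month in $(seq 1 12); do
  users=$(( users * 120 / 100 ))   # +20%, truncated to an integer
done
echo "Projected users after 12 months: $users"
# → Projected users after 12 months: 882 (~892 with exact arithmetic)
```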
# HTTP endpoint benchmarking with Apache Bench
ab -n 10000 -c 100 -H "Authorization: Bearer $TOKEN" \
https://api.tmi.example.com/api/threat-models
# WebSocket benchmarking
# Install: npm install -g websocket-bench
websocket-bench -a 1000 -c 100 "wss://api.tmi.example.com/ws/diagrams/{id}"
# (see websocket-bench --help for passing the Authorization header)
# Full load testing with k6
k6 run load-test.js
Example k6 script (load-test.js):
import http from 'k6/http';
import { check, sleep } from 'k6';
export let options = {
stages: [
{ duration: '2m', target: 100 }, // Ramp up
{ duration: '5m', target: 100 }, // Stay at 100 users
{ duration: '2m', target: 0 }, // Ramp down
],
};
export default function() {
let response = http.get('https://api.tmi.example.com/api/threat-models', {
headers: { 'Authorization': `Bearer ${__ENV.TOKEN}` },
});
check(response, {
'status is 200': (r) => r.status === 200,
'response time < 500ms': (r) => r.timings.duration < 500,
});
sleep(1);
}
Run benchmark:
TOKEN=$YOUR_TOKEN k6 run load-test.js
# PostgreSQL benchmarking with pgbench
createdb pgbench_test
pgbench -i -s 10 pgbench_test # Initialize
pgbench -c 10 -j 2 -t 1000 pgbench_test # Run benchmark
# Results show:
# - Transactions per second (TPS)
# - Average latency
# - Connection overhead
Application KPIs:
- Request throughput (requests/second)
- Response time percentiles (P50, P95, P99)
- Error rate (percentage of 5xx responses)
- WebSocket connection count
- Active user sessions
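For example, the error-rate KPI is simply 5xx responses over total requests; a sketch with stand-in counter values (the numbers are sample data, not TMI defaults):

```shell
# Error-rate KPI from raw counters
total_requests=12000
errors_5xx=36
awk -v t="$total_requests" -v e="$errors_5xx" \
  'BEGIN { printf "Error rate: %.2f%%\n", (e / t) * 100 }'
# → Error rate: 0.30%
```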
Database KPIs:
- Query response time
- Connection count
- Cache hit ratio
- Replication lag
- Table sizes
Infrastructure KPIs:
- CPU utilization
- Memory utilization
- Disk I/O
- Network throughput
- Container restarts
Create dashboards tracking:
System Overview:
- Service uptime (%)
- Request rate (req/s)
- Error rate (%)
- Active users
- Response time (P95)
Database Performance:
- Query duration (ms)
- Connection count
- Slow queries
- Cache hit rate
- Database size
Resource Utilization:
- CPU usage (%)
- Memory usage (%)
- Disk usage (%)
- Network I/O (MB/s)
Check:
- Database query performance
- Cache hit rates
- Network latency
- Application logs for errors
- Resource utilization (CPU, memory)
Solutions:
- Optimize slow queries
- Add missing indexes
- Increase cache TTL
- Scale horizontally
- Optimize code
Check:
# Process CPU usage
top -p $(pgrep tmi-server)
# System CPU by process
ps aux --sort=-%cpu | head
Solutions:
- Profile application (Go pprof)
- Optimize hot code paths
- Reduce logging
- Scale horizontally
Check:
# Memory usage over time
docker stats tmi-server --no-stream
# Go heap profile
curl http://localhost:8080/debug/pprof/heap > heap.prof
go tool pprof heap.prof
Solutions:
- Analyze heap dump
- Fix memory leaks in code
- Increase garbage collection frequency
- Restart services periodically
Check:
SELECT count(*) FROM pg_stat_activity;
Solutions:
- Increase connection pool size
- Use connection pooler (PgBouncer)
- Fix connection leaks in application
- Optimize query execution time
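To spot connection leaks behind a high connection count, a query sketch against pg_stat_activity (host and credentials follow the examples used earlier in this guide):

```shell
# Long-idle sessions often indicate an application connection leak
psql -h postgres-host -U tmi_user -d tmi -c "
SELECT pid, usename, state, now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle'
ORDER BY idle_for DESC
LIMIT 10;"
```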
Related documentation:
- Monitoring and Health - Performance monitoring
- Database Operations - Database optimization
- Post-Deployment - Initial performance testing
- Maintenance Tasks - Regular optimization tasks