Maintenance Tasks
Regular maintenance keeps TMI operating reliably and securely. This guide covers routine and scheduled maintenance for TMI operations:
- Daily, weekly, and monthly maintenance tasks
- Automated maintenance procedures
- Certificate renewal
- Log rotation and cleanup
- Software updates
- Backup verification
- Database maintenance
Daily tasks (automated):
- Health check monitoring (continuous)
- Log collection and aggregation
- Backup execution (2 AM daily)
- Metric collection and alerting
Daily tasks (manual, if needed):
- Review critical alerts
- Check service status
- Monitor error rates
- Review backup integrity
Weekly tasks:
- Check certificate expiration dates
- Review security alerts and logs
- Check disk space usage
- Review application logs for errors
- Monitor performance trends
Monthly tasks:
- Apply security updates
- Review and optimize database performance
- Clean up old logs and backups
- Review user access and permissions
- Test disaster recovery procedures
- Review capacity and scaling needs
- Update documentation
Quarterly tasks:
- Full security audit
- Disaster recovery test
- Performance benchmarking
- Dependency updates
- Review and update runbooks
- Team training updates
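The daily manual checks above can be partly scripted. The following is a minimal sketch, not something shipped with TMI: it assumes a systemd unit named tmi, the backup location and naming used by the backup scripts in this guide (/var/backups/postgresql/tmi/tmi_*.dump), the application log at /var/log/tmi/tmi.log, and a hypothetical 26-hour limit for the age of the newest daily backup. All function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for the daily manual checks (not shipped with TMI).
# Assumed defaults: backups in /var/backups/postgresql/tmi/tmi_*.dump,
# application log at /var/log/tmi/tmi.log, systemd unit "tmi".

# Print the age in whole hours of the newest file in DIR matching PATTERN.
# Returns 1 (and prints nothing) if no file matches.
newest_file_age_hours() {
    local dir=$1 pattern=$2 newest
    newest=$(find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@\n' 2>/dev/null \
        | sort -n | tail -1)
    [ -n "$newest" ] || return 1
    # %T@ prints seconds with a fractional part; drop the fraction
    echo $(( ($(date +%s) - ${newest%.*}) / 3600 ))
}

# Print how many lines in LOG contain "error" (case-insensitive); 0 if LOG is missing.
count_errors() {
    local log=$1
    if [ -f "$log" ]; then
        grep -ci 'error' "$log" || true   # grep -c prints 0 but exits 1 when nothing matches
    else
        echo 0
    fi
}

# Print a short daily summary. MAX_BACKUP_AGE_HOURS (default 26) is a
# hypothetical threshold: a daily 2 AM backup plus some slack.
daily_check() {
    local backup_dir=${1:-/var/backups/postgresql/tmi}
    local log=${2:-/var/log/tmi/tmi.log}
    local max_age=${MAX_BACKUP_AGE_HOURS:-26} age

    if systemctl is-active --quiet tmi; then
        echo "service: active"
    else
        echo "service: NOT active"
    fi

    if age=$(newest_file_age_hours "$backup_dir" 'tmi_*.dump'); then
        echo "newest backup: ${age}h old"
        [ "$age" -le "$max_age" ] || echo "WARNING: newest backup is older than ${max_age}h"
    else
        echo "WARNING: no tmi_*.dump backups in $backup_dir"
    fi

    echo "log lines matching 'error': $(count_errors "$log")"
}
```

Source the file and run daily_check, or call it from a cron job that mails its output to the on-call engineer.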
Create automated maintenance tasks with systemd timers.
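The timers and cron entries in this guide run /usr/local/bin/backup-tmi.sh, which the guide does not define. A minimal sketch of its core follows, assuming pg_dump's custom format, credentials supplied through ~/.pgpass or PG* environment variables, the postgres-host/tmi_user/tmi connection values used by the database scripts in this guide, and the tmi_<timestamp>.dump naming that the backup verification and cleanup scripts look for. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch of /usr/local/bin/backup-tmi.sh (not part of TMI itself).
# Assumes pg_dump is installed and credentials come from ~/.pgpass or PG* env vars.

BACKUP_DIR="${BACKUP_DIR:-/var/backups/postgresql/tmi}"
POSTGRES_HOST="${POSTGRES_HOST:-postgres-host}"
POSTGRES_USER="${POSTGRES_USER:-tmi_user}"
POSTGRES_DB="${POSTGRES_DB:-tmi}"

# Build the path for a backup taken at STAMP (e.g. 20250101_020000).
# The tmi_*.dump name matches what the verification and cleanup scripts expect.
backup_path() {
    local dir=$1 stamp=$2
    printf '%s/tmi_%s.dump\n' "$dir" "$stamp"
}

# Dump the database in pg_dump's custom format (-Fc), which pg_restore reads.
# The dump is written under a temporary name and renamed only on success,
# so a failed run never leaves a file that looks like a complete backup.
run_backup() {
    local target tmp
    target=$(backup_path "$BACKUP_DIR" "$(date +%Y%m%d_%H%M%S)")
    tmp="$target.partial"
    mkdir -p "$BACKUP_DIR" || return 1
    if pg_dump -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -d "$POSTGRES_DB" -Fc -f "$tmp"; then
        mv "$tmp" "$target" && echo "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

The installed script would end with a call to run_backup, so that the systemd service and cron entries can execute it directly.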
Create /etc/systemd/system/tmi-backup.service:
```ini
[Unit]
Description=TMI Database Backup
After=network.target postgresql.service

[Service]
Type=oneshot
User=tmi
ExecStart=/usr/local/bin/backup-tmi.sh
```
Create /etc/systemd/system/tmi-backup.timer:
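```ini
[Unit]
Description=TMI Daily Backup Timer
# The timer activates tmi-backup.service because the names match. Requires=
# is omitted: it would start the service (and run a backup) whenever the
# timer itself is started.

[Timer]
# Use a single OnCalendar entry. Multiple entries are cumulative, so
# "daily" plus "02:00" would trigger at both midnight and 2 AM.
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
```
Enable and start: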
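```shell
# Reload unit files after creating or editing them
sudo systemctl daemon-reload
sudo systemctl enable tmi-backup.timer
sudo systemctl start tmi-backup.timer
# Check timer status and next run time
systemctl list-timers tmi-backup.timer
```
Create /etc/systemd/system/tmi-maintenance.service: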
```ini
[Unit]
Description=TMI Weekly Maintenance
After=network.target

[Service]
Type=oneshot
User=tmi
ExecStart=/usr/local/bin/tmi-weekly-maintenance.sh
```
Create /etc/systemd/system/tmi-maintenance.timer:
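```ini
[Unit]
Description=TMI Weekly Maintenance Timer

[Timer]
# A single entry: "weekly" (Monday 00:00) plus "Sun 03:00" would run twice a week
OnCalendar=Sun *-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```
Enable and start it the same way as the backup timer.

Alternatively, schedule the tasks with cron instead of systemd timers: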
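```shell
# Edit the crontab (use root's crontab, via sudo crontab -e, for scripts
# that restart services or write to root-owned paths)
crontab -e
```
Add these entries. Do not combine them with the systemd timers above, or the backup and weekly maintenance jobs will run twice.
```
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/backup-tmi.sh
# Weekly maintenance on Sunday at 3 AM
0 3 * * 0 /usr/local/bin/tmi-weekly-maintenance.sh
# Daily log rotation at midnight
0 0 * * * /usr/local/bin/rotate-tmi-logs.sh
# Certificate check daily at 6 AM
0 6 * * * /usr/local/bin/check-tmi-certs.sh
# Monthly cleanup on the first day at 4 AM
0 4 1 * * /usr/local/bin/tmi-monthly-cleanup.sh
```
Let's Encrypt certificates renew automatically: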
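```shell
# Check certbot timer status
systemctl status certbot.timer
# Test renewal
sudo certbot renew --dry-run
# Force renewal (if needed)
sudo certbot renew --force-renewal
# Check renewal logs
sudo tail -f /var/log/letsencrypt/letsencrypt.log
```
If TMI serves TLS itself from /etc/tmi/certs, certbot's renewals do not reach it on their own. A certbot deploy hook (the --deploy-hook option, or a script in /etc/letsencrypt/renewal-hooks/deploy/) can copy the renewed files into place and restart TMI.

For certificates that certbot does not manage, create a renewal script at /usr/local/bin/renew-tmi-cert.sh: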
```shell
#!/bin/bash
# TMI Certificate Renewal Script
set -euo pipefail

CERT_DIR="/etc/tmi/certs"
BACKUP_DIR="/var/backups/tmi/certs"
LOG_FILE="/var/log/tmi/cert-renewal.log"
DAYS_BEFORE_EXPIRY=30
ALERT_EMAIL="ops@example.com"   # placeholder: set to your operations address

# Print the number of whole days until a certificate expires
check_expiry() {
    local cert_file=$1 expiry_date expiry_epoch now_epoch
    expiry_date=$(openssl x509 -enddate -noout -in "$cert_file" | cut -d= -f2)
    expiry_epoch=$(date -d "$expiry_date" +%s)
    now_epoch=$(date +%s)
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Check if renewal is needed
days_left=$(check_expiry "$CERT_DIR/server.crt")
if [ "$days_left" -le "$DAYS_BEFORE_EXPIRY" ]; then
    echo "$(date): Certificate expires in $days_left days, renewing..." >> "$LOG_FILE"

    # Back up the current certificate and key
    mkdir -p "$BACKUP_DIR"
    cp "$CERT_DIR/server.crt" "$BACKUP_DIR/server.crt.$(date +%Y%m%d)"
    cp "$CERT_DIR/server.key" "$BACKUP_DIR/server.key.$(date +%Y%m%d)"

    # Generate a new SELF-SIGNED certificate. Replace this step with your
    # CA/provider's issuance process for production use.
    openssl req -x509 -newkey rsa:4096 -nodes \
        -keyout "$CERT_DIR/server.key.new" \
        -out "$CERT_DIR/server.crt.new" \
        -days 365 \
        -subj "/CN=tmi.example.com"

    # Set permissions before the files are moved into place
    chmod 600 "$CERT_DIR/server.key.new"
    chmod 644 "$CERT_DIR/server.crt.new"
    chown tmi:tmi "$CERT_DIR/server.key.new" "$CERT_DIR/server.crt.new"

    # Install new certificates
    mv "$CERT_DIR/server.key.new" "$CERT_DIR/server.key"
    mv "$CERT_DIR/server.crt.new" "$CERT_DIR/server.crt"

    # Restart TMI server so it loads the new certificate
    systemctl restart tmi
    echo "$(date): Certificate renewal completed" >> "$LOG_FILE"

    # Send notification
    echo "TMI certificate renewed successfully" | \
        mail -s "Certificate Renewal Completed" "$ALERT_EMAIL"
else
    echo "$(date): Certificate valid for $days_left days, no renewal needed" >> "$LOG_FILE"
fi
```
Make executable and schedule:
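```shell
sudo chmod +x /usr/local/bin/renew-tmi-cert.sh
# Open root's crontab (the script restarts the tmi service)
sudo crontab -e
```
Add this entry to run it daily at 6 AM:
```
0 6 * * * /usr/local/bin/renew-tmi-cert.sh
```
Create monitoring script /usr/local/bin/check-tmi-certs.sh: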
```shell
#!/bin/bash
# TMI Certificate Monitoring Script
CERT_FILE="/etc/tmi/certs/server.crt"
ALERT_DAYS=30
LOG_FILE="/var/log/tmi/cert-check.log"
ALERT_EMAIL="ops@example.com"   # placeholder: set to your operations address

# Get expiration date; stop if the certificate cannot be read
expiry_date=$(openssl x509 -enddate -noout -in "$CERT_FILE" | cut -d= -f2)
if [ -z "$expiry_date" ]; then
    echo "$(date): Cannot read $CERT_FILE" >> "$LOG_FILE"
    exit 1
fi
expiry_epoch=$(date -d "$expiry_date" +%s)
now_epoch=$(date +%s)
days_remaining=$(( (expiry_epoch - now_epoch) / 86400 ))

# Alert if expiring soon
if [ "$days_remaining" -le "$ALERT_DAYS" ]; then
    echo "WARNING: TMI certificate expires in $days_remaining days" | \
        mail -s "Certificate Expiration Warning" "$ALERT_EMAIL"
fi

# Log status
echo "$(date): Certificate expires in $days_remaining days" >> "$LOG_FILE"
```
TMI includes automatic log rotation, but you can configure additional rotation with logrotate.
Create /etc/logrotate.d/tmi:
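```
/var/log/tmi/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0640 tmi tmi
    sharedscripts
    postrotate
        # Reload TMI if its unit supports reload (errors are ignored).
        # If TMI does not reopen its log files on reload, use copytruncate
        # instead of create.
        systemctl reload tmi > /dev/null 2>&1 || true
    endscript
}
```
```shell
# Test the configuration first (debug mode: reports actions, changes nothing)
sudo logrotate -d /etc/logrotate.d/tmi
# Force log rotation
sudo logrotate -f /etc/logrotate.d/tmi
```
Create /usr/local/bin/cleanup-tmi-logs.sh: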
```shell
#!/bin/bash
# TMI Log Cleanup Script
LOG_DIR="/var/log/tmi"
RETENTION_DAYS=90
ALERT_EMAIL="ops@example.com"   # placeholder: set to your operations address

# Delete rotated logs, plain or compressed, older than the retention period
find "$LOG_DIR" -type f \( -name "*.log.*" -o -name "*.gz" \) -mtime +"$RETENTION_DAYS" -delete

# Log cleanup action
echo "$(date): Cleaned up logs older than $RETENTION_DAYS days" >> "$LOG_DIR/cleanup.log"

# Check disk usage of the filesystem holding the log directory
# (-P gives one line per filesystem, so field 5 is always "Use%")
disk_usage=$(df -P "$LOG_DIR" | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$disk_usage" -gt 80 ]; then
    echo "WARNING: Log filesystem is ${disk_usage}% full" | \
        mail -s "TMI Log Disk Space Warning" "$ALERT_EMAIL"
fi
```
Schedule:
```
# Monthly on the first day at 4 AM
0 4 1 * * /usr/local/bin/cleanup-tmi-logs.sh
```
Archive old logs to long-term storage:
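```shell
#!/bin/bash
# Archive rotated TMI logs to S3/cloud storage
set -euo pipefail
LOG_DIR="/var/log/tmi"
ARCHIVE_DIR="/var/log/tmi/archive"
S3_BUCKET="s3://my-tmi-logs"
# Previous month as YYYY-MM (counted from the 1st to avoid end-of-month skew)
DATE=$(date -d "$(date +%Y-%m-01) -1 month" +%Y-%m)
ARCHIVE="$ARCHIVE_DIR/tmi-logs-$DATE.tar.gz"
# Collect rotated logs; stop if there is nothing to archive
shopt -s nullglob
files=("$LOG_DIR"/*.log.*)
if [ ${#files[@]} -eq 0 ]; then
    echo "$(date): No rotated logs to archive" >> "$LOG_DIR/archive.log"
    exit 0
fi
# Create the archive; --remove-files deletes each log after adding it
mkdir -p "$ARCHIVE_DIR"
tar --remove-files -czf "$ARCHIVE" "${files[@]}"
# Upload to S3; remove the local archive only if the upload succeeded
if aws s3 cp "$ARCHIVE" "$S3_BUCKET/"; then
    rm "$ARCHIVE"
    echo "$(date): Archived and uploaded logs for $DATE" >> "$LOG_DIR/archive.log"
else
    echo "$(date): Upload failed, keeping $ARCHIVE" >> "$LOG_DIR/archive.log"
    exit 1
fi
```
Create /usr/local/bin/tmi-db-maintenance.sh: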
```shell
#!/bin/bash
# TMI Database Maintenance Script
# Non-interactive psql needs credentials from ~/.pgpass or PGPASSWORD
POSTGRES_HOST="postgres-host"
POSTGRES_USER="tmi_user"
POSTGRES_DB="tmi"
LOG_FILE="/var/log/tmi/db-maintenance.log"
ALERT_EMAIL="ops@example.com"   # placeholder: set to your operations address
# ON_ERROR_STOP makes psql exit non-zero when an SQL statement fails
PSQL=(psql -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -d "$POSTGRES_DB" -v ON_ERROR_STOP=1)

echo "$(date): Starting database maintenance" >> "$LOG_FILE"

# Vacuum and analyze all tables
if "${PSQL[@]}" -c "VACUUM ANALYZE;" >> "$LOG_FILE" 2>&1; then
    echo "$(date): Database maintenance completed successfully" >> "$LOG_FILE"
else
    echo "$(date): Database maintenance failed" >> "$LOG_FILE"
    echo "Database maintenance failed" | \
        mail -s "TMI Database Maintenance Failed" "$ALERT_EMAIL"
fi

# Check database size (-tA: tuples only, unaligned, so no padding)
db_size=$("${PSQL[@]}" -tA -c "SELECT pg_size_pretty(pg_database_size(current_database()))")
echo "$(date): Database size: $db_size" >> "$LOG_FILE"

# Tables with the most dead tuples (pg_stat_user_tables names the table relname)
"${PSQL[@]}" -c "
SELECT
    schemaname,
    relname,
    n_dead_tup,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 10" >> "$LOG_FILE"
```
Schedule weekly:
```
# Sunday at 3 AM
0 3 * * 0 /usr/local/bin/tmi-db-maintenance.sh
```
Create /usr/local/bin/tmi-reindex.sh:
```shell
#!/bin/bash
# TMI Index Maintenance (reindex fragmented indexes)
POSTGRES_HOST="postgres-host"
POSTGRES_USER="tmi_user"
POSTGRES_DB="tmi"
LOG_FILE="/var/log/tmi/reindex.log"

echo "$(date): Starting index maintenance" >> "$LOG_FILE"

# CONCURRENTLY (PostgreSQL 12+) rebuilds indexes without blocking writes,
# which is less intrusive than a plain REINDEX DATABASE but takes longer.
# It cannot run inside a transaction block, so send it as a single -c command.
if psql -h "$POSTGRES_HOST" -U "$POSTGRES_USER" -d "$POSTGRES_DB" -v ON_ERROR_STOP=1 \
    -c "REINDEX DATABASE CONCURRENTLY $POSTGRES_DB;" >> "$LOG_FILE" 2>&1; then
    echo "$(date): Reindex completed successfully" >> "$LOG_FILE"
else
    echo "$(date): Reindex failed" >> "$LOG_FILE"
fi
```
Schedule monthly:
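```
# First Sunday of the month at 4 AM. Cron runs a job when EITHER the
# day-of-month OR the day-of-week field matches, so "1-7 * 0" would also run
# on days 1-7 and on every Sunday. Test the weekday in the command instead
# (% must be escaped in crontab).
0 4 1-7 * * [ "$(date +\%u)" -eq 7 ] && /usr/local/bin/tmi-reindex.sh
```
Create /usr/local/bin/verify-tmi-backups.sh: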
```shell
#!/bin/bash
# Verify backup integrity by restoring the newest backup into a scratch database
BACKUP_DIR="/var/backups/postgresql/tmi"
RESTORE_TEST_DB="tmi_restore_test"
LOG_FILE="/var/log/tmi/backup-verification.log"
ALERT_EMAIL="ops@example.com"   # placeholder: set to your operations address

# Find most recent backup
LATEST_BACKUP=$(ls -t "$BACKUP_DIR"/tmi_*.dump 2>/dev/null | head -1)
if [ -z "$LATEST_BACKUP" ]; then
    echo "$(date): No backup found" >> "$LOG_FILE"
    exit 1
fi
echo "$(date): Verifying backup: $LATEST_BACKUP" >> "$LOG_FILE"

# Recreate the test database
psql -U postgres -c "DROP DATABASE IF EXISTS $RESTORE_TEST_DB"
createdb -U postgres "$RESTORE_TEST_DB"

# Restore the backup and test pg_restore's own exit status. (If its output
# were piped through a filter such as grep, $? would reflect the filter.)
# --no-owner skips ownership commands so the restore does not depend on roles.
if pg_restore -U postgres --no-owner -d "$RESTORE_TEST_DB" "$LATEST_BACKUP" >> "$LOG_FILE" 2>&1; then
    echo "$(date): Backup verification successful" >> "$LOG_FILE"
    # Basic validation query (-tA: bare value, no padding)
    table_count=$(psql -U postgres -d "$RESTORE_TEST_DB" -tA -c "
        SELECT count(*) FROM information_schema.tables
        WHERE table_schema = 'public'")
    echo "$(date): Restored $table_count tables" >> "$LOG_FILE"
else
    echo "$(date): Backup verification FAILED" >> "$LOG_FILE"
    echo "Backup verification failed for $LATEST_BACKUP" | \
        mail -s "TMI Backup Verification Failed" "$ALERT_EMAIL"
fi

# Clean up the test database
psql -U postgres -c "DROP DATABASE IF EXISTS $RESTORE_TEST_DB"
```
Schedule weekly:
```
# Monday at 5 AM
0 5 * * 1 /usr/local/bin/verify-tmi-backups.sh
```
Create /usr/local/bin/cleanup-tmi-backups.sh:
```shell
#!/bin/bash
# Clean up old backups
BACKUP_DIR="/var/backups/postgresql/tmi"
RETENTION_DAYS=30
LOG_FILE="/var/log/tmi/backup-cleanup.log"

echo "$(date): Starting backup cleanup" >> "$LOG_FILE"

# Count backups before cleanup
before_count=$(find "$BACKUP_DIR" -maxdepth 1 -name "tmi_*.dump" | wc -l)
# Delete backups older than the retention period
find "$BACKUP_DIR" -maxdepth 1 -name "tmi_*.dump" -mtime +"$RETENTION_DAYS" -delete
# Count backups after cleanup
after_count=$(find "$BACKUP_DIR" -maxdepth 1 -name "tmi_*.dump" | wc -l)
deleted_count=$((before_count - after_count))

echo "$(date): Deleted $deleted_count backups older than $RETENTION_DAYS days" >> "$LOG_FILE"
echo "$(date): $after_count backups remaining" >> "$LOG_FILE"

# Check backup directory size
backup_size=$(du -sh "$BACKUP_DIR" | awk '{print $1}')
echo "$(date): Backup directory size: $backup_size" >> "$LOG_FILE"
```
Follow this process for software updates:
- Review release notes for breaking changes
- Test in staging environment
- Schedule maintenance window
- Create backup before update
- Apply updates
- Run smoke tests
- Monitor for issues
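For the "Run smoke tests" step, polling the service until it answers is more dependable than a fixed sleep after a restart. A minimal sketch, assuming TMI answers on http://localhost:8080/version as in the update script in this guide; the function names, attempt counts, and delays are hypothetical.

```shell
#!/bin/bash
# Hypothetical smoke-test helpers (not part of TMI). Requires curl.

# Poll URL until curl succeeds (with -f, HTTP 400 and above count as failure)
# or ATTEMPTS run out. Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
    local url=$1 attempts=${2:-30} delay=${3:-2} i
    for ((i = 1; i <= attempts; i++)); do
        if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
            return 0
        fi
        # Sleep between attempts, but not after the last one
        [ "$i" -lt "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Wait for the version endpoint, then print what it reports.
smoke_test() {
    local base=${1:-http://localhost:8080}
    if ! wait_for_http "$base/version" 30 2; then
        echo "smoke test failed: $base/version did not respond" >&2
        return 1
    fi
    curl -fsS "$base/version"
    echo
}
```

After starting the service, run smoke_test and treat a non-zero exit status as a signal to roll back.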
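For a binary installation managed by systemd, run the update as root:
```shell
#!/bin/bash
# Update TMI server (binary installation)
set -euo pipefail

# Set these for the release you are installing
VERSION="v1.x.x"
# SHA-256 checksum of the release binary, from a trusted source;
# if left empty, the checksum check fails and the script stops
EXPECTED_SHA256=""

# 1. Download the new version first, so the service is down only briefly
curl -fL "https://github.com/ericfitz/tmi/releases/download/$VERSION/tmi-server-linux-amd64" \
    -o /opt/tmi/tmi-server.new
# 2. Verify checksum (sha256sum -c exits non-zero on a mismatch)
echo "$EXPECTED_SHA256  /opt/tmi/tmi-server.new" | sha256sum -c -
# 3. Stop service
systemctl stop tmi
# 4. Back up current version
cp /opt/tmi/tmi-server /opt/tmi/tmi-server.backup
# 5. Install new version
mv /opt/tmi/tmi-server.new /opt/tmi/tmi-server
chmod +x /opt/tmi/tmi-server
# 6. Run migrations (on failure the script stops with the service down;
#    restore tmi-server.backup and the database backup before restarting)
cd /opt/tmi && ./bin/migrate up
# 7. Start service
systemctl start tmi
# 8. Verify
sleep 5
curl -f http://localhost:8080/version
# 9. Check logs
journalctl -u tmi -n 50
```
For Docker Compose deployments:
```shell
# Pull latest images (with Compose v2, use "docker compose" instead)
docker-compose pull
# Recreate containers with new images
docker-compose up -d
# Verify
docker-compose ps
docker-compose logs tmi-server
```
To update dependencies on Ubuntu (back up the database first):
```shell
# Update PostgreSQL and Redis
sudo apt-get update
sudo apt-get upgrade postgresql
sudo apt-get upgrade redis-server
# Restart services
sudo systemctl restart postgresql
sudo systemctl restart redis-server
```
Package upgrades do not migrate data between major PostgreSQL versions; a major-version upgrade also needs a cluster migration such as pg_upgrade.

Create /usr/local/bin/tmi-weekly-health-check.sh: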
```shell
#!/bin/bash
# Weekly comprehensive health check
REPORT_FILE="/tmp/tmi-health-$(date +%Y%m%d).txt"
ALERT_EMAIL="ops@example.com"   # placeholder: set to your operations address

echo "TMI Health Check Report - $(date)" > "$REPORT_FILE"
echo "======================================" >> "$REPORT_FILE"

# Service status
echo -e "\n## Service Status" >> "$REPORT_FILE"
systemctl status tmi | grep -E "Active|Memory|CPU" >> "$REPORT_FILE"

# Database health
echo -e "\n## Database Health" >> "$REPORT_FILE"
psql -h postgres-host -U tmi_user -d tmi -tA -c "
SELECT 'Connections: ' || count(*) FROM pg_stat_activity
UNION ALL
SELECT 'Database Size: ' || pg_size_pretty(pg_database_size('tmi'))
" >> "$REPORT_FILE"

# Redis health
echo -e "\n## Redis Health" >> "$REPORT_FILE"
redis-cli -h redis-host info memory | grep used_memory_human >> "$REPORT_FILE"
redis-cli -h redis-host info stats | grep keyspace_hits >> "$REPORT_FILE"

# Disk space
echo -e "\n## Disk Space" >> "$REPORT_FILE"
df -h | grep -E "Filesystem|/var|/opt" >> "$REPORT_FILE"

# Certificate expiry
echo -e "\n## Certificate Status" >> "$REPORT_FILE"
expiry_date=$(openssl x509 -enddate -noout -in /etc/tmi/certs/server.crt | cut -d= -f2)
cert_days=$(( ($(date -d "$expiry_date" +%s) - $(date +%s)) / 86400 ))
echo "Certificate expires in $cert_days days" >> "$REPORT_FILE"

# Recent errors: the last 20 log lines that mention "error"
echo -e "\n## Recent Errors (last 20 matching log lines)" >> "$REPORT_FILE"
grep -i error /var/log/tmi/tmi.log | tail -20 >> "$REPORT_FILE"

# Email report
mail -s "TMI Weekly Health Report" "$ALERT_EMAIL" < "$REPORT_FILE"

# Cleanup
rm "$REPORT_FILE"
```
If using Prometheus, configure retention:
```shell
# Retention is set with a Prometheus startup flag; the global section of
# prometheus.yml has no retention setting. Keep metrics for 30 days:
prometheus --storage.tsdb.retention.time=30d
```
Dashboard maintenance:
- Review and update dashboards monthly
- Archive unused dashboards
- Document dashboard purpose and usage
- Share dashboards with team
Keep operational documentation current:
- Update runbooks after incidents
- Document new procedures
- Archive outdated documentation
- Review and update quarterly
When emergency maintenance is required:
- Assess urgency: Critical vs non-critical
- Notify stakeholders: Users, team, management
- Create incident ticket: Track the issue
- Perform maintenance: Follow emergency runbook
- Verify resolution: Test functionality
- Post-incident review: Learn and improve
Maintain a current contact list:
```yaml
# /etc/tmi/emergency-contacts.yml
# Placeholder names, numbers, and addresses: replace with your own contacts
contacts:
  primary_oncall:
    name: "John Doe"
    phone: "+1-555-0100"
    email: "john.doe@example.com"
  backup_oncall:
    name: "Jane Smith"
    phone: "+1-555-0101"
    email: "jane.smith@example.com"
  database_admin:
    name: "DB Team"
    email: "db-team@example.com"
    slack: "#db-team"
  security_team:
    email: "security@example.com"
    phone: "+1-555-0911"
```
Plan maintenance windows:
- Weekly: Sunday 2-4 AM (low traffic)
- Monthly: First Sunday 2-6 AM
- Emergency: As needed with notification
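Scripts that take TMI offline can confirm they are running inside one of these windows before doing anything disruptive. A minimal sketch using GNU date, assuming the windows are expressed in the server's local time zone; the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical maintenance-window checks (GNU date required).
# Each takes an optional time string that date -d understands (default: now)
# and returns 0 inside the window, 1 outside it or on an unparseable time.

# Weekly window: Sunday 02:00-03:59
in_weekly_window() {
    local when=${1:-now} dow hour
    dow=$(date -d "$when" +%u) || return 1     # 1 = Monday ... 7 = Sunday
    hour=$(date -d "$when" +%H) || return 1    # 00-23
    hour=$((10#$hour))                         # force base 10 so 08/09 are not octal
    [ "$dow" -eq 7 ] && [ "$hour" -ge 2 ] && [ "$hour" -lt 4 ]
}

# Monthly window: first Sunday of the month (day 1-7), 02:00-05:59
in_monthly_window() {
    local when=${1:-now} dow dom hour
    dow=$(date -d "$when" +%u) || return 1
    dom=$(date -d "$when" +%d) || return 1
    hour=$(date -d "$when" +%H) || return 1
    dom=$((10#$dom))
    hour=$((10#$hour))
    [ "$dow" -eq 7 ] && [ "$dom" -le 7 ] && [ "$hour" -ge 2 ] && [ "$hour" -lt 6 ]
}
```

A maintenance script might start with: in_weekly_window || { echo "Outside the maintenance window; aborting" >&2; exit 1; }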
Template for user notification:
Subject: Scheduled Maintenance - TMI Service
Dear TMI Users,
We will be performing scheduled maintenance on the TMI service:
Date: Sunday, November 16, 2025
Time: 2:00 AM - 4:00 AM EST
Duration: Approximately 2 hours
During this time, TMI will be unavailable.
Maintenance activities:
- Security updates
- Database optimization
- Performance improvements
We apologize for any inconvenience.
Best regards,
TMI Operations Team
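Because the notice names both the weekday and the date, the two can easily disagree. A minimal sketch that fills in the header of the template from a single start time, letting GNU date compute the weekday and end time; the function name and arguments are hypothetical, and the time zone label is printed as given rather than converted.

```shell
#!/bin/bash
# Hypothetical generator for the header of the maintenance notice (GNU date).
# Usage: maintenance_notice "YYYY-MM-DD HH:MM" DURATION_HOURS TZ_LABEL
maintenance_notice() {
    local start=$1 hours=$2 tz_label=$3 start_epoch end_epoch day start_t end_t
    start_epoch=$(date -d "$start" +%s) || return 1
    # Compute the end time from epoch seconds rather than a relative date string
    end_epoch=$((start_epoch + hours * 3600))
    # LC_ALL=C keeps day and month names in English; %-d and %-I drop leading zeros
    day=$(LC_ALL=C date -d "@$start_epoch" '+%A, %B %-d, %Y')
    start_t=$(LC_ALL=C date -d "@$start_epoch" '+%-I:%M %p')
    end_t=$(LC_ALL=C date -d "@$end_epoch" '+%-I:%M %p')
    cat <<EOF
Subject: Scheduled Maintenance - TMI Service

Dear TMI Users,

We will be performing scheduled maintenance on the TMI service:
Date: $day
Time: $start_t - $end_t $tz_label
Duration: Approximately $hours hours
During this time, TMI will be unavailable.
EOF
}
```

For example, maintenance_notice "2025-11-16 02:00" 2 EST produces the date and time lines shown in the template; append the list of maintenance activities and the closing by hand.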
Print and use for regular maintenance:
TMI Monthly Maintenance Checklist
Date: _______________ Performed by: _______________
[ ] Review service health and uptime
[ ] Check disk space (goal: <75% usage)
[ ] Review and address security alerts
[ ] Apply security patches and updates
[ ] Verify backup integrity
[ ] Check certificate expiration (>30 days remaining)
[ ] Review application logs for errors
[ ] Optimize database (vacuum, analyze, reindex if needed)
[ ] Review performance metrics and trends
[ ] Clean up old logs and backups
[ ] Test disaster recovery procedures
[ ] Update documentation
[ ] Review capacity planning needs
Notes:
_________________________________________________
_________________________________________________
_________________________________________________
Completion Date: _______________
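Two checklist items have numeric targets that are easy to check from a shell: disk usage below 75% and more than 30 days of certificate validity. A minimal sketch, assuming GNU df, openssl, and the /etc/tmi/certs/server.crt path used by the certificate scripts in this guide; the function names and the GOAL variable are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for the monthly checklist (GNU coreutils, openssl).

# Print the used-space percentage (integer) of the filesystem holding PATH.
disk_used_pct() {
    df --output=pcent "$1" | tail -1 | tr -dc '0-9'
}

# Report each PATH against the goal (default 75%, override with GOAL).
# Returns 1 if any filesystem is at or above the goal or cannot be read.
check_disk_goal() {
    local goal=${GOAL:-75} path pct status=0
    for path in "$@"; do
        pct=$(disk_used_pct "$path" 2>/dev/null)
        if [ -z "$pct" ]; then
            echo "FAIL $path: cannot read disk usage"
            status=1
        elif [ "$pct" -ge "$goal" ]; then
            echo "FAIL $path: ${pct}% used (goal: below ${goal}%)"
            status=1
        else
            echo "ok   $path: ${pct}% used"
        fi
    done
    return "$status"
}

# Return 0 if CERT stays valid for more than DAYS days (default 30).
# openssl x509 -checkend N exits 0 only if the certificate does not
# expire within the next N seconds.
cert_valid_for_days() {
    local cert=${1:-/etc/tmi/certs/server.crt} days=${2:-30}
    openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null
}
```

For example, check_disk_goal / /var /opt covers the disk space item, and cert_valid_for_days || echo "Renew the TMI certificate" covers the certificate item.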
Related documentation:
- Monitoring and Health - Ongoing monitoring procedures
- Database Operations - Database maintenance details
- Security Operations - Security maintenance tasks
- Performance and Scaling - Performance optimization