
Database Backup & Restore Strategy

Overview

pNode Pulse uses PostgreSQL with the TimescaleDB extension for time-series data. This document outlines backup and restore procedures to ensure data integrity and business continuity.

Backup Strategy

Schedule & Retention

| Frequency | Retention | Storage Location |
|-----------|-----------|------------------|
| Daily | 30 days | /backups/pnode-pulse on VPS |
| Weekly | 90 days | S3 (optional, off-site) |
| Monthly | 1 year | S3 Glacier (optional, archival) |

Backup Time: 2:00 AM UTC daily (scheduled via cron)

Backup Components

  • Full Database: All tables, indexes, sequences
  • TimescaleDB Hypertables: node_metrics, network_stats, etc.
  • Schema: Complete schema with all constraints
  • Data: All historical and current data
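
To confirm which hypertables a backup will cover, TimescaleDB's information view can be queried (view and column names per TimescaleDB 2.x):

# List hypertables in the live database
PGPASSWORD=$POSTGRES_PASSWORD psql -h localhost -p 5434 -U pnodepulse -d pnodepulse \
  -c "SELECT hypertable_name FROM timescaledb_information.hypertables;"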

Backup Format

  • Format: PostgreSQL custom format (.dump)
  • Compression: Level 9 (maximum)
  • Typical Size: ~100-500 MB (varies with data volume)
  • Growth Rate: ~20-50 MB/day (estimated)
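
For reference, a custom-format dump at this compression level comes down to a pg_dump call like the one below (a sketch of what backup-db.sh presumably runs; the filename pattern matches the examples in this document):

# Sketch: custom-format dump with maximum compression
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
PGPASSWORD=$POSTGRES_PASSWORD pg_dump \
  -h "$POSTGRES_HOST" -p "$POSTGRES_PORT" -U pnodepulse -d pnodepulse \
  -Fc -Z 9 \
  -f "/backups/pnode-pulse/pnode-pulse_${TIMESTAMP}.dump"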

Automated Backups

Setup Cron Job

SSH into the VPS as the pnodepulse user and configure cron:

# Edit crontab
crontab -e

# Add this line (runs daily at 2 AM; cron starts with an empty environment, so set the variables explicitly)
0 2 * * * POSTGRES_PASSWORD=<your_password> POSTGRES_HOST=localhost POSTGRES_PORT=5434 /home/pnodepulse/pnode-pulse/scripts/backup-db.sh >> /var/log/pnode-pulse-backup.log 2>&1

Manual Backup

# SSH to VPS
ssh pnodepulse

# Set environment variables
export POSTGRES_PASSWORD=<your_password>
export POSTGRES_HOST=localhost
export POSTGRES_PORT=5434

# Run backup script
cd ~/pnode-pulse
./scripts/backup-db.sh

Verify Backup

# List recent backups
ls -lh /backups/pnode-pulse/ | tail -5

# Check backup integrity
pg_restore --list /backups/pnode-pulse/pnode-pulse_20251209_020000.dump | head -20

# Get backup size
du -h /backups/pnode-pulse/pnode-pulse_20251209_020000.dump

Restore Procedures

Pre-Restore Checklist

  • Back up the current state (if the database is still accessible)
  • Notify team about planned downtime
  • Stop application services to prevent data inconsistency
  • Verify backup file exists and is not corrupted
  • Document reason for restore (incident report)

Full Database Restore

# SSH to VPS
ssh pnodepulse

# Set environment variables
export POSTGRES_PASSWORD=<your_password>
export POSTGRES_HOST=localhost
export POSTGRES_PORT=5434

# Run restore script (DESTRUCTIVE - requires confirmation)
cd ~/pnode-pulse
./scripts/restore-db.sh /backups/pnode-pulse/pnode-pulse_20251209_020000.dump

# Follow prompts and type 'YES' to confirm

The script will:

  1. Prompt for confirmation (destructive operation)
  2. 🛑 Stop application (blue/green/staging containers)
  3. 🗑️ Drop existing objects (--clean --if-exists)
  4. 📥 Restore from backup
  5. Verify restoration (count nodes and metrics)
  6. 🚀 Restart application
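
The core restore step presumably boils down to a pg_restore call along these lines (a sketch, not the script's exact contents):

# Sketch: what the destructive restore step amounts to
PGPASSWORD=$POSTGRES_PASSWORD pg_restore \
  -h localhost -p 5434 -U pnodepulse -d pnodepulse \
  --clean --if-exists \
  /backups/pnode-pulse/pnode-pulse_20251209_020000.dump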

Partial Restore (Advanced)

If you need to restore specific tables only:

# List tables in backup
pg_restore --list /backups/pnode-pulse/pnode-pulse_20251209_020000.dump | grep TABLE

# Restore specific table
PGPASSWORD=$POSTGRES_PASSWORD pg_restore \
  -h localhost \
  -p 5434 \
  -U pnodepulse \
  -d pnodepulse \
  --table=nodes \
  /backups/pnode-pulse/pnode-pulse_20251209_020000.dump

Backup Monitoring

Health Checks

Create a simple monitoring script to verify backups are running:

#!/bin/bash
# Check if backup ran in last 26 hours

LATEST_BACKUP=$(ls -t /backups/pnode-pulse/pnode-pulse_*.dump | head -1)
BACKUP_AGE_HOURS=$(( ($(date +%s) - $(stat -c %Y "$LATEST_BACKUP")) / 3600 ))

if [ "$BACKUP_AGE_HOURS" -gt 26 ]; then
  echo "⚠️  WARNING: Last backup is ${BACKUP_AGE_HOURS} hours old"
  # Send alert to monitoring system
  exit 1
else
  echo "✓ Backup is current (${BACKUP_AGE_HOURS} hours old)"
  exit 0
fi
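
Saved as, say, /home/pnodepulse/check-backup-age.sh (the path is illustrative), the check can be scheduled alongside the backup job:

# Run the age check hourly; failures land in the same log
0 * * * * /home/pnodepulse/check-backup-age.sh >> /var/log/pnode-pulse-backup.log 2>&1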

Alerting

Configure alerts for:

  • Backup fails to complete (cron sends email on error)
  • Backup file size anomaly (too small = incomplete backup)
  • No backup in 26 hours (missed schedule)
  • Backup directory disk space < 20% free
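
The disk-space check can be scripted in the same style as the age check above (a sketch; 80% used is the threshold for 20% free):

#!/bin/bash
# Alert when the backup partition drops below 20% free space

USED_PCT=$(df --output=pcent /backups | tail -1 | tr -dc '0-9')

if [ "$USED_PCT" -gt 80 ]; then
  echo "⚠️  WARNING: /backups is ${USED_PCT}% full"
  exit 1
fi
echo "✓ /backups has sufficient free space (${USED_PCT}% used)"
exit 0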

Disaster Recovery Scenarios

Scenario 1: Database Corruption

Symptoms: Application errors, query failures, data inconsistencies

Recovery:

  1. Stop application immediately
  2. Back up the corrupted database (if possible): PGPASSWORD=$POSTGRES_PASSWORD pg_dump -Fc -h localhost -p 5434 -U pnodepulse pnodepulse > corrupted_backup.dump
  3. Restore from last known good backup
  4. Assess data loss (time between last backup and corruption)
  5. If recent backup, investigate logs to prevent recurrence

RTO: ~15-30 minutes
RPO: Up to 24 hours (daily backups)

Scenario 2: Accidental Data Deletion

Symptoms: Missing nodes, metrics, or other data

Recovery:

  1. DO NOT run any DELETE or UPDATE queries
  2. Immediately create a backup of the current state
  3. Restore to staging environment from most recent backup
  4. Extract deleted data from staging backup
  5. Manually re-insert into production (or full restore if extensive)

RTO: ~1-4 hours
RPO: Up to 24 hours
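
As a sketch of steps 4 and 5, deleted rows can be copied out of the staging restore and back into production with psql's \copy (the nodes table comes from this document; the created_at filter column is illustrative, so adjust the WHERE clause to match what was lost):

# Export the missing rows from the staging restore (staging on port 5435, as in the test script below)
PGPASSWORD=$POSTGRES_PASSWORD psql -h localhost -p 5435 -U pnodepulse -d pnodepulse_staging \
  -c "\copy (SELECT * FROM nodes WHERE created_at >= '2025-12-01') TO '/tmp/recovered_nodes.csv' CSV HEADER"

# Re-insert into production (port 5434)
PGPASSWORD=$POSTGRES_PASSWORD psql -h localhost -p 5434 -U pnodepulse -d pnodepulse \
  -c "\copy nodes FROM '/tmp/recovered_nodes.csv' CSV HEADER"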

Scenario 3: Complete Data Center Failure

Symptoms: VPS unreachable, hardware failure

Recovery:

  1. Provision new VPS
  2. Set up Docker, PostgreSQL, and Redis
  3. Download latest backup from S3 (if configured) or copy from local storage
  4. Restore database
  5. Deploy application containers
  6. Update DNS if needed

RTO: ~4-8 hours
RPO: Up to 24 hours (or minutes if using S3 + frequent backups)
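
For step 3, if the restore scripts are not yet on the new host, the latest off-site dump can be pulled manually (assumes AWS credentials configured as described in the next section):

# Fetch the rolling latest backup and restore it
aws s3 cp s3://pnode-pulse-backups/backups/latest.dump /tmp/latest.dump
cd ~/pnode-pulse
./scripts/restore-db.sh /tmp/latest.dump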


Off-Site Backups (S3)

Off-site backups provide geographic redundancy and protection against VPS failure.

Supported Providers

| Provider | Endpoint | Cost (approx.) |
|----------|----------|----------------|
| AWS S3 | (default) | $0.023/GB/month |
| Backblaze B2 | s3.REGION.backblazeb2.com | $0.005/GB/month |
| Wasabi | s3.REGION.wasabisys.com | $0.007/GB/month |
| MinIO | Self-hosted | Free (self-hosted) |

Quick Setup (Interactive)

Run the interactive setup script:

ssh pnodepulse
cd ~/pnode-pulse
./scripts/setup-s3-backup.sh

This will:

  1. Install AWS CLI if needed
  2. Guide you through provider selection
  3. Configure credentials
  4. Test the connection
  5. Save configuration to ~/.pnode-pulse-backup.env

Manual Setup

1. Install AWS CLI

pip3 install awscli
# or
sudo apt install awscli

2. Configure Credentials

Option A: AWS Configure (interactive)

aws configure
# Enter: Access Key ID, Secret Access Key, Region, Output format

Option B: Environment Variables

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
export AWS_S3_BUCKET="pnode-pulse-backups"

# For non-AWS providers (Backblaze, Wasabi, MinIO):
export AWS_ENDPOINT_URL="https://s3.us-west-002.backblazeb2.com"

3. Create Bucket

# AWS S3
aws s3 mb s3://pnode-pulse-backups --region us-east-1

# Backblaze B2
aws s3 mb s3://pnode-pulse-backups --endpoint-url https://s3.us-west-002.backblazeb2.com

# Wasabi
aws s3 mb s3://pnode-pulse-backups --endpoint-url https://s3.us-east-1.wasabisys.com

4. Enable S3 in Backup Script

The backup script automatically uploads to S3 when AWS_S3_BUCKET is set:

# Create config file
cat > ~/.pnode-pulse-backup.env << 'EOF'
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
export AWS_S3_BUCKET="pnode-pulse-backups"
export AWS_S3_STORAGE_CLASS="STANDARD_IA"
export S3_RETENTION_DAYS="90"
EOF

chmod 600 ~/.pnode-pulse-backup.env
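
With these variables set, the upload step inside backup-db.sh presumably amounts to something like the following (a sketch; $BACKUP_FILE stands for the freshly created dump, and the year/month key layout matches the S3 structure described below):

# Upload the fresh dump and refresh the rolling latest copy
aws s3 cp "$BACKUP_FILE" \
  "s3://${AWS_S3_BUCKET}/backups/$(date +%Y)/$(date +%m)/$(basename "$BACKUP_FILE")" \
  --storage-class "${AWS_S3_STORAGE_CLASS:-STANDARD_IA}"
aws s3 cp "$BACKUP_FILE" "s3://${AWS_S3_BUCKET}/backups/latest.dump"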

5. Update Crontab

crontab -e

# Add this line (use . instead of source, since cron runs /bin/sh; set the password explicitly because cron's environment is empty):
0 2 * * * . /home/pnodepulse/.pnode-pulse-backup.env && POSTGRES_PASSWORD=<your_password> /home/pnodepulse/pnode-pulse/scripts/backup-db.sh >> /var/log/pnode-pulse-backup.log 2>&1

Restore from S3

Use the restore script to download and restore from S3:

# Source S3 credentials
source ~/.pnode-pulse-backup.env
export POSTGRES_PASSWORD=your_password

# List available backups
./scripts/restore-from-s3.sh

# Restore latest backup
./scripts/restore-from-s3.sh latest

# Restore specific backup
./scripts/restore-from-s3.sh 2025/12/pnode-pulse_20251215_020000.dump

S3 Backup Structure

Backups are organized by year/month:

s3://pnode-pulse-backups/
  backups/
    latest.dump              # Most recent backup (always updated)
    2025/
      12/
        pnode-pulse_20251215_020000.dump
        pnode-pulse_20251214_020000.dump
      11/
        pnode-pulse_20251130_020000.dump
        ...

Retention Policy

| Storage | Retention | Managed By |
|---------|-----------|------------|
| Local (/backups/) | 30 days | backup-db.sh (RETENTION_DAYS) |
| S3 | 90 days | backup-db.sh (S3_RETENTION_DAYS) |
| S3 Glacier | 1 year | AWS Lifecycle Rules (manual setup) |
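
The Glacier tier is configured on the bucket itself rather than by the backup script; a minimal lifecycle rule for AWS S3 (the rule ID and 90-day threshold are illustrative) looks like this:

# Transition objects under backups/ to Glacier after 90 days
aws s3api put-bucket-lifecycle-configuration \
  --bucket pnode-pulse-backups \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-backups",
      "Status": "Enabled",
      "Filter": {"Prefix": "backups/"},
      "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}]
    }]
  }'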

Storage Classes

| Class | Use Case | Retrieval Time | Cost |
|-------|----------|----------------|------|
| STANDARD | Latest backup only | Instant | Higher |
| STANDARD_IA | Daily backups (default) | Instant | Lower |
| GLACIER | Monthly archives | Hours | Lowest |

Cost Estimate

| Data Volume | S3 Standard-IA | Backblaze B2 | Wasabi |
|-------------|----------------|--------------|--------|
| 1 GB | $0.02/month | $0.005/month | $0.007/month |
| 10 GB | $0.23/month | $0.05/month | $0.07/month |
| 100 GB | $2.30/month | $0.50/month | $0.70/month |

Verify S3 Backups

# List recent S3 backups
source ~/.pnode-pulse-backup.env
aws s3 ls s3://${AWS_S3_BUCKET}/backups/ --recursive | tail -10

# Check latest backup size
aws s3 ls s3://${AWS_S3_BUCKET}/backups/latest.dump

# Download and verify backup integrity
aws s3 cp s3://${AWS_S3_BUCKET}/backups/latest.dump /tmp/verify.dump
pg_restore --list /tmp/verify.dump | head -20
rm /tmp/verify.dump

Testing Restore Procedures

CRITICAL: Test restore procedures regularly!

Quarterly Restore Test

  1. Set up a test environment (staging database)
  2. Select random backup from last 30 days
  3. Restore to staging
  4. Verify data integrity: Check row counts, recent data
  5. Test application: Ensure queries work, UI loads
  6. Document results: Note any issues, update procedures

Example Test Script

#!/bin/bash
# Quarterly restore test

BACKUP_FILE=$(ls -t /backups/pnode-pulse/*.dump | shuf -n 1)
echo "Testing restore of: $BACKUP_FILE"

# Restore to staging database
PGPASSWORD=$POSTGRES_PASSWORD pg_restore \
  -h localhost -p 5435 -U pnodepulse -d pnodepulse_staging \
  --clean --if-exists \
  "$BACKUP_FILE"

# Verify
PGPASSWORD=$POSTGRES_PASSWORD psql -h localhost -p 5435 -U pnodepulse -d pnodepulse_staging -c "
  SELECT 'Nodes: ' || COUNT(*) FROM nodes;
  SELECT 'Metrics: ' || COUNT(*) FROM node_metrics;
"

echo "✓ Restore test completed"

Troubleshooting

Backup Script Fails

Error: POSTGRES_PASSWORD environment variable is required
Fix: Export password before running: export POSTGRES_PASSWORD=<password>

Error: Permission denied: /backups/pnode-pulse
Fix: Create directory with correct permissions: sudo mkdir -p /backups/pnode-pulse && sudo chown pnodepulse: /backups/pnode-pulse

Error: Backup file is empty (0 bytes)
Fix: Check PostgreSQL logs, ensure database is accessible, verify credentials

Restore Script Fails

Error: database "pnodepulse" does not exist
Fix: Create database first: createdb -h localhost -p 5434 -U pnodepulse pnodepulse

Error: role "pnodepulse" does not exist
Fix: On a fresh PostgreSQL instance, create the role before restoring, e.g. as a superuser: CREATE ROLE pnodepulse WITH LOGIN PASSWORD '...';

Error: Application won't start after restore
Fix: Check schema version, may need to run migrations: npm run db:migrate


Security Considerations

  • Encrypt Backups: S3 server-side encryption (AES-256) recommended; see the sketch after this list
  • Access Control: Limit backup file permissions: chmod 600 /backups/pnode-pulse/*.dump
  • Password Storage: Use environment files, never hardcode in scripts
  • Audit Logs: Log all backup/restore operations with timestamps
  • Offsite Storage: Keep copies outside VPS in case of complete failure
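
For the encryption item, server-side encryption can be requested per upload with the AWS CLI's --sse flag (a sketch; backup-db.sh may or may not already pass it):

# Request AES-256 server-side encryption on upload
aws s3 cp "$BACKUP_FILE" "s3://${AWS_S3_BUCKET}/backups/latest.dump" --sse AES256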

Last Updated: 2025-12-15
Owner: DevOps Team
Review Schedule: Quarterly