
🚀 OpenParliament Complete Docker Infrastructure Deployment

This deployment provides a complete, production-ready infrastructure that integrates:

  • Parliament Django Application - The main OpenParliament website
  • OpenMetadata - Metadata management and data discovery platform
  • Apache Airflow - Workflow orchestration and data pipeline management
  • Complete Database Stack - PostgreSQL for all services
  • Search Infrastructure - Solr + OpenSearch for full-text search
  • Caching Layer - Redis for high-performance caching

🎯 Key Success Criteria

This deployment achieves the CRITICAL production requirements:

✅ Parliament Data → OpenMetadata Discovery - Automated metadata discovery
✅ OpenMetadata → Airflow Integration - Workflow orchestration
✅ End-to-End Data Pipelines - Complete data processing workflows
✅ Health Monitoring - Comprehensive service health checks
✅ Data Quality Validation - Automated data quality monitoring

πŸ—οΈ Architecture Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Parliament    │    │   OpenMetadata  │    │  Apache Airflow │
│   Django App    │◄──►│     Server      │◄──►│   Scheduler     │
│   Port: 8000    │    │   Port: 8585    │    │   Port: 8081    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   PostgreSQL    │    │   PostgreSQL    │    │   PostgreSQL    │
│   (Parliament)  │    │  (OpenMetadata) │    │    (Airflow)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                                              │
         └──────────────┬───────────────────────────────┘
                        ▼
         ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
         │      Redis      │    │   OpenSearch    │    │      Solr       │
         │   (Caching)     │    │   (Metadata)    │    │   (Parliament)  │
         │   Port: 6379    │    │   Port: 9200    │    │   Port: 8983    │
         └─────────────────┘    └─────────────────┘    └─────────────────┘

📋 Prerequisites

Required Software

  • Docker >= 20.10.0
  • Docker Compose >= 2.0.0 (or docker-compose >= 1.28.0)
  • Python >= 3.8 (for verification scripts)
  • Git (for cloning the repository)

System Requirements

  • RAM: 8GB minimum, 16GB recommended
  • Storage: 20GB available disk space
  • CPU: 4 cores recommended
  • Network: Internet access for downloading Docker images

Check Prerequisites

# Check Docker
docker --version
docker compose version      # Compose v2 plugin
docker-compose --version    # standalone docker-compose (v1)

# Check Python
python3 --version

# Check available resources
docker system df
free -h
df -h
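If you'd rather script these checks, the minimum versions listed above can be compared programmatically. This is a minimal sketch, not part of the repository: it assumes the first dotted number in a command's output is the version (true for the usual `docker --version` and `docker compose version` output) and treats a missing patch number as 0.

```python
import re
from typing import Tuple


def parse_version(output: str) -> Tuple[int, int, int]:
    """Pull the first dotted version (e.g. '24.0.7' or 'v2.21.0') out of CLI output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"no version number in {output!r}")
    # A missing patch component (e.g. '1.29') is treated as 0.
    major, minor, patch = (int(part) if part else 0 for part in match.groups())
    return major, minor, patch


def meets_minimum(output: str, minimum: str) -> bool:
    """True if the version reported in `output` is at least `minimum`."""
    return parse_version(output) >= parse_version(minimum)
```

For example, pass `subprocess.run(["docker", "--version"], capture_output=True, text=True).stdout` and `"20.10.0"` to `meets_minimum`.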

🚀 Quick Start Deployment

1. Clone and Prepare

# Clone the repository
git clone <repository-url>
cd openparliament

# Make deployment script executable
chmod +x scripts/deploy-infrastructure.sh

2. Configure Environment

# Copy environment template
cp .env.example .env

# Edit configuration (optional - defaults work for development)
nano .env

3. Deploy Complete Infrastructure

# Full deployment with all services
./scripts/deploy-infrastructure.sh

# Or with options:
./scripts/deploy-infrastructure.sh --skip-build --verbose

4. Verify Deployment

The deployment script automatically runs health checks, but you can also run them manually:

# Check all services
python3 scripts/verify-services.py

# Verify data flows
python3 scripts/verify-data-flow.py

🌐 Access Services

After successful deployment:

Service          URL                      Purpose
Parliament App   http://localhost:8000    Main OpenParliament website
OpenMetadata     http://localhost:8585    Metadata management dashboard
Airflow          http://localhost:8081    Pipeline orchestration (admin/admin)
Airflow Flower   http://localhost:5555    Task monitoring
Solr Admin       http://localhost:8983    Search administration
OpenSearch       http://localhost:9200    Search cluster status

🔍 Data Flow Verification

The deployment includes comprehensive data flow testing:

Automated Testing

# Run complete data flow verification
python3 scripts/verify-data-flow.py --wait 60

# Check individual components
python3 scripts/verify-services.py --wait 30
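The `--wait` option suggests the scripts retry until services come up. How the repository's scripts actually implement it isn't documented here; the following is a hypothetical sketch of a deadline-based polling loop of that kind. The `clock` and `sleep` parameters are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 5.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` until it returns True or `timeout` seconds have elapsed.

    Returns True on success, False if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

Something like `wait_until(my_check, timeout=60)` mirrors the intent of `--wait 60`, where `my_check` is any function returning True once a service responds.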

Manual Verification Steps

  1. Parliament Database → OpenMetadata

    # Check if OpenMetadata can discover Parliament tables
    curl http://localhost:8585/api/v1/databases
    curl http://localhost:8585/api/v1/tables
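  2. OpenMetadata → Airflow Integration

    # List Airflow DAGs (the REST API needs credentials; admin/admin is the default login)
    curl -u admin:admin http://localhost:8081/api/v1/dags
    # Check webserver, metadata database, and scheduler health
    curl http://localhost:8081/health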
  3. End-to-End Data Pipeline

    • Visit Airflow UI: http://localhost:8081
    • Enable the parliament_data_pipeline DAG
    • Trigger a manual run
    • Monitor execution in real-time
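Enabling and triggering the DAG can also be done through Airflow 2's stable REST API (`PATCH /api/v1/dags/{dag_id}` and `POST /api/v1/dags/{dag_id}/dagRuns`). This sketch only builds the requests; it assumes the basic-auth API backend is enabled and that the admin/admin login listed under Access Services applies. The helper names are hypothetical.

```python
import base64
import json
from typing import Optional
from urllib.request import Request


def _basic_auth(user: str, password: str) -> str:
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def _json_request(url: str, method: str, body: dict, user: str, password: str) -> Request:
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": _basic_auth(user, password)},
        method=method,
    )


def unpause_request(base_url: str, dag_id: str, user: str, password: str) -> Request:
    """PATCH /api/v1/dags/{dag_id}: the API equivalent of switching the DAG on in the UI."""
    return _json_request(f"{base_url}/api/v1/dags/{dag_id}", "PATCH",
                         {"is_paused": False}, user, password)


def trigger_request(base_url: str, dag_id: str, user: str, password: str,
                    conf: Optional[dict] = None) -> Request:
    """POST /api/v1/dags/{dag_id}/dagRuns: start a manual run with optional conf."""
    return _json_request(f"{base_url}/api/v1/dags/{dag_id}/dagRuns", "POST",
                         {"conf": conf or {}}, user, password)
```

Send them with `urllib.request.urlopen(unpause_request("http://localhost:8081", "parliament_data_pipeline", "admin", "admin"))`, then the same with `trigger_request`.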

🔧 Configuration Details

Environment Variables

Key configuration options in .env:

# Database Configuration
PARLIAMENT_POSTGRES_PASSWORD=parliament_pass
OPENMETADATA_POSTGRES_PASSWORD=openmetadata_pass
AIRFLOW_POSTGRES_PASSWORD=airflow_pass

# Service URLs
OPENMETADATA_HOST=http://openmetadata-server:8585
AIRFLOW_HOST=http://airflow-webserver:8080

# Resource Limits
JAVA_OPTS_OPENMETADATA=-Xms1024m -Xmx2048m
OPENSEARCH_JAVA_OPTS=-Xms1024m -Xmx1024m
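A quick way to catch the development defaults before deploying anywhere shared is to parse `.env` and compare. The sketch below handles only the simple `KEY=VALUE` format shown above (optionally quoted; no multi-line values or escapes), and `DEV_DEFAULTS` just copies the example values from this section.

```python
from typing import Dict, List

# Example development passwords from the .env snippet above.
DEV_DEFAULTS = {
    "PARLIAMENT_POSTGRES_PASSWORD": "parliament_pass",
    "OPENMETADATA_POSTGRES_PASSWORD": "openmetadata_pass",
    "AIRFLOW_POSTGRES_PASSWORD": "airflow_pass",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Strip one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def default_passwords_in_use(env: Dict[str, str]) -> List[str]:
    """Names of password variables still set to their development default."""
    return sorted(name for name, default in DEV_DEFAULTS.items() if env.get(name) == default)
```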

Service Configuration

Parliament Django Application

  • Database: PostgreSQL with custom schemas
  • Cache: Redis for session and page caching
  • Search: Solr for full-text search
  • Settings: parliament/docker_settings.py

OpenMetadata

  • Database: Dedicated PostgreSQL instance
  • Search: OpenSearch for metadata indexing
  • Configuration: Environment variables + automated setup
  • Features: Metadata discovery, lineage tracking, data quality

Apache Airflow

  • Executor: CeleryExecutor with Redis broker
  • Database: Dedicated PostgreSQL instance
  • DAGs: Located in ./dags/ directory
  • Connections: Pre-configured for Parliament and OpenMetadata
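The README doesn't say how these connections are pre-configured. One mechanism Airflow supports is defining a connection through an environment variable named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI. This sketch builds such a variable; the connection id `parliament_db`, the host `parliament-postgres`, and port `5432` are assumptions for illustration.

```python
from typing import Tuple
from urllib.parse import quote


def airflow_conn_env(conn_id: str, conn_type: str, user: str, password: str,
                     host: str, port: int, schema: str) -> Tuple[str, str]:
    """Return (variable name, connection URI) for an env-defined Airflow connection.

    Credentials are percent-encoded so characters such as '@' or '/' in a
    password can't break the URI.
    """
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    uri = (f"{conn_type}://{quote(user, safe='')}:{quote(password, safe='')}"
           f"@{host}:{port}/{quote(schema, safe='')}")
    return name, uri
```

Hooks can then look the connection up by its id. Connections defined this way don't appear in the Airflow UI's connection list.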

📊 Monitoring and Health Checks

Built-in Health Monitoring

# Service health status
docker-compose ps

# Service logs
docker-compose logs -f parliament-app
docker-compose logs -f openmetadata-server
docker-compose logs -f airflow-scheduler

# Resource usage
docker stats

Custom Health Check Scripts

# Comprehensive health check
python3 scripts/verify-services.py

# Data flow verification
python3 scripts/verify-data-flow.py

# Generate health report
python3 scripts/verify-services.py > health_report.json

Health Check Endpoints
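The endpoints below are the ones this README already uses (Airflow `/health`, OpenMetadata `/api/v1/system/version`) plus the standard OpenSearch `/_cluster/health` and Solr `/solr/admin/info/system` APIs; the Parliament root URL is only a placeholder, since no dedicated health URL is documented. This minimal sketch assumes the development setup (plain HTTP, OpenSearch security disabled). A 2xx status only means the service answered: the JSON bodies from Airflow and OpenSearch carry the detailed status (scheduler health, cluster colour).

```python
from typing import Callable, Dict
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Host-side URLs; the Parliament entry is a placeholder, not a documented health endpoint.
ENDPOINTS: Dict[str, str] = {
    "parliament-app": "http://localhost:8000/",
    "openmetadata": "http://localhost:8585/api/v1/system/version",
    "airflow": "http://localhost:8081/health",
    "opensearch": "http://localhost:9200/_cluster/health",
    "solr": "http://localhost:8983/solr/admin/info/system",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for `url`, or 0 if nothing answered."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as error:  # the server answered with 4xx/5xx
        return error.code
    except (URLError, OSError):  # refused, DNS failure, or timeout
        return 0


def check_endpoints(endpoints: Dict[str, str],
                    fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each service name to True if its endpoint returned a 2xx status."""
    return {name: 200 <= fetch(url) < 300 for name, url in endpoints.items()}


def success_rate(results: Dict[str, bool]) -> float:
    """Percentage of healthy services (0.0 for an empty result set)."""
    return 100.0 * sum(results.values()) / len(results) if results else 0.0
```

For a one-off report, `print(success_rate(check_endpoints(ENDPOINTS)))` prints the percentage of endpoints that answered with 2xx.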

🛠️ Management Commands

Service Management

# Start all services
docker-compose up -d

# Stop all services
docker-compose down

# Restart specific service
docker-compose restart parliament-app

# View service logs
docker-compose logs -f airflow-scheduler

# Scale services (if supported)
docker-compose up -d --scale airflow-worker=2

Database Management

# Access Parliament database
docker-compose exec parliament-postgres psql -U parliament -d parliament

# Access OpenMetadata database
docker-compose exec openmetadata-postgres psql -U openmetadata -d openmetadata_db

# Run Django migrations
docker-compose exec parliament-app python manage.py migrate

# Create Django superuser
docker-compose exec parliament-app python manage.py createsuperuser

Data Management

# Backup Parliament database
docker-compose exec -T parliament-postgres pg_dump -U parliament parliament > parliament_backup.sql

# Import data
docker-compose exec -T parliament-postgres psql -U parliament -d parliament < parliament_backup.sql

# Reset all data (DESTRUCTIVE)
docker-compose down -v
docker-compose up -d
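Dumps are easier to manage with a timestamp in the name. The sketch below wraps the backup command from this section in Python; the service, user, and database names come from the commands above, while the file-naming scheme is only a suggestion. It keeps the `-T` flag: without it `docker-compose exec` allocates a TTY, which can convert line endings in the dump to CRLF.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def backup_command(service: str = "parliament-postgres", user: str = "parliament",
                   database: str = "parliament") -> List[str]:
    """The pg_dump call from above; -T stops docker-compose from allocating a TTY."""
    return ["docker-compose", "exec", "-T", service, "pg_dump", "-U", user, database]


def backup_filename(now: Optional[datetime] = None, prefix: str = "parliament_backup") -> str:
    """Build a name such as parliament_backup_20240501_130405.sql."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y%m%d_%H%M%S}.sql"


def run_backup(directory: str = ".") -> Path:
    """Stream pg_dump output straight into a timestamped file and return its path."""
    target = Path(directory) / backup_filename()
    with open(target, "wb") as out:
        subprocess.run(backup_command(), stdout=out, check=True)
    return target
```

Call `run_backup()` from the directory containing `docker-compose.yml`.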

🔍 Troubleshooting

Common Issues

Services Won't Start

# Check Docker daemon
sudo systemctl status docker

# Check available resources
docker system df
free -h

# Remove stopped containers, unused networks, and ALL unused images (images must be re-pulled later)
docker system prune -a

Database Connection Issues

# Check database status
docker-compose ps | grep postgres

# Check database logs
docker-compose logs openmetadata-postgres

# Test connection manually
docker-compose exec parliament-postgres pg_isready -U parliament

Performance Issues

# Check resource usage
docker stats

# Increase memory limits in docker-compose.yml
# Adjust JVM settings in .env file

# Check disk space
df -h
docker system df

Network Connectivity

# Check Docker networks
docker network ls
docker network inspect openparliament_parliament_network

# Test service connectivity
docker-compose exec parliament-app curl http://openmetadata-server:8585/api/v1/system/version

Debug Mode

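# Enable debug logging (only takes effect if docker-compose.yml passes DJANGO_DEBUG into the container)
export DJANGO_DEBUG=true

# Recreate the containers so the new setting is picked up
docker-compose down
docker-compose up -d --force-recreate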

# Check detailed logs
docker-compose logs --tail=100 -f

📈 Performance Optimization

Resource Allocation

Adjust in docker-compose.yml:

services:
  openmetadata-server:
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 2G
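The JVM heap set in `.env` has to fit inside the container's memory limit with room left for non-heap memory (metaspace, thread stacks, direct buffers). The helper below computes the heap-to-limit ratio; keeping it at or below roughly 0.75 is a common rule of thumb, not a requirement stated by OpenMetadata. With the values in this README (`-Xmx2048m`, a `4G` limit) the ratio is 0.5.

```python
import re

# Megabytes per unit suffix used by JVM flags and Compose memory values.
_UNITS = {"k": 1 / 1024, "m": 1, "g": 1024}


def parse_mb(value: str) -> float:
    """Convert sizes like '2048m', '4G' or '512k' to megabytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG])", value.strip())
    if not match:
        raise ValueError(f"unsupported size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def max_heap_mb(java_opts: str) -> float:
    """Return the -Xmx value from a JAVA_OPTS string, in MB."""
    match = re.search(r"-Xmx(\S+)", java_opts)
    if not match:
        raise ValueError("no -Xmx flag found")
    return parse_mb(match.group(1))


def heap_fraction(java_opts: str, container_limit: str) -> float:
    """Fraction of the container memory limit taken by the maximum heap."""
    return max_heap_mb(java_opts) / parse_mb(container_limit)
```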

Database Optimization
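The tuning values below are a fixed example. For a starting point scaled to your machine, the PostgreSQL documentation suggests about 25% of memory for `shared_buffers` on a dedicated server with at least 1GB of RAM; for `effective_cache_size`, 50% is a conservative rule of thumb (it is only a planner hint and allocates nothing). This sketch applies those ratios to the memory you set aside for the Parliament database; with 1024 MB it reproduces `shared_buffers = '256MB'`.

```python
from typing import Dict, List


def suggest_pg_memory(available_mb: int) -> Dict[str, str]:
    """Starting-point settings for the memory (MB) set aside for this PostgreSQL instance."""
    return {
        "shared_buffers": f"{available_mb // 4}MB",        # ~25%, per the PostgreSQL docs
        "effective_cache_size": f"{available_mb // 2}MB",  # ~50%, rule of thumb (planner hint)
    }


def alter_system_statements(settings: Dict[str, str]) -> List[str]:
    """One ALTER SYSTEM statement per setting, each suitable for its own psql -c."""
    return [f"ALTER SYSTEM SET {name} = '{value}';" for name, value in settings.items()]
```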

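# Parliament PostgreSQL tuning (ALTER SYSTEM can't run inside a transaction block,
# so pass each statement as its own -c; requires superuser privileges)
docker-compose exec parliament-postgres psql -U parliament -d parliament \
  -c "ALTER SYSTEM SET shared_buffers = '256MB';" \
  -c "ALTER SYSTEM SET effective_cache_size = '1GB';"

# effective_cache_size applies on reload, but shared_buffers needs a server restart
docker-compose restart parliament-postgres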

Caching Configuration

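# In parliament/docker_settings.py
# CLIENT_CLASS is a django-redis option, so use the django-redis backend
# (Django's built-in django.core.cache.backends.redis.RedisCache doesn't accept it)
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://redis:6379/1',
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
        },
        'KEY_PREFIX': 'parliament',
        'TIMEOUT': 3600,  # seconds (1 hour)
    }
}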
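With `KEY_PREFIX` set, Django stores each entry under `<KEY_PREFIX>:<VERSION>:<key>` (its default key function; `VERSION` defaults to 1), and django-redis builds keys the same way. The mirror below shows what to look for in Redis database 1; the example key is hypothetical.

```python
def default_key_func(key: str, key_prefix: str, version: int) -> str:
    """Mirror of django.core.cache.backends.base.default_key_func."""
    return f"{key_prefix}:{version}:{key}"


def redis_key(key: str, key_prefix: str = "parliament", version: int = 1) -> str:
    """Key under which a cache entry lands in Redis with the settings above."""
    return default_key_func(key, key_prefix, version)
```

To inspect the keys, something like `docker-compose exec redis redis-cli -n 1 --scan --pattern 'parliament:*'` should work, assuming the Redis service is named `redis` as in `LOCATION`.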

🔒 Security Considerations

Development vs Production

Development Mode (current setup):

  • No authentication for OpenMetadata
  • Simple passwords
  • Debug mode enabled
  • All services exposed

Production Recommendations:

  • Enable OpenMetadata authentication
  • Use strong, unique passwords
  • Enable HTTPS/TLS
  • Implement network security
  • Regular security updates

Secure Production Setup

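# Generate strong passwords (hex avoids the '/', '+' and '=' that base64 can emit and
# that break connection URLs); repeat for OPENMETADATA_ and AIRFLOW_POSTGRES_PASSWORD
export PARLIAMENT_POSTGRES_PASSWORD=$(openssl rand -hex 32)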

# Enable HTTPS
# Configure nginx proxy with SSL certificates

# Restrict network access
# Configure firewall rules
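To generate all three database passwords at once, Python's `secrets` module gives an equivalent of `openssl rand -hex 32`. A sketch, using the variable names from the Environment Variables section:

```python
import secrets

PASSWORD_VARS = [
    "PARLIAMENT_POSTGRES_PASSWORD",
    "OPENMETADATA_POSTGRES_PASSWORD",
    "AIRFLOW_POSTGRES_PASSWORD",
]


def generate_passwords(names=PASSWORD_VARS, nbytes: int = 32) -> dict:
    """Random hex passwords (2 * nbytes characters); hex is safe inside connection URLs."""
    return {name: secrets.token_hex(nbytes) for name in names}


if __name__ == "__main__":
    for name, value in generate_passwords().items():
        print(f"{name}={value}")
```

Paste the output over the defaults in `.env`. Assuming the database containers use the official postgres image, note that the image applies its password variable only when it initializes an empty data directory, so for existing volumes you'd also change the password in the database itself (or recreate the volumes).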

📚 Development Guide

Adding New DAGs

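# Create a new DAG file in dags/
# Example: dags/my_custom_pipeline.py

from datetime import datetime

from airflow import DAG

with DAG(
    dag_id='my_custom_pipeline',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # Airflow >= 2.4; older 2.x releases use schedule_interval
    catchup=False,      # don't backfill a run for every day since start_date
    tags=['custom', 'parliament'],
) as dag:
    # Add tasks here...
    pass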

Extending OpenMetadata Integration

# Custom metadata ingestion
# Add to parliament/integrations/openmetadata/

from metadata.generated.schema.entity.services.connections.database.postgresConnection import PostgresConnection
from metadata.ingestion.ometa.ometa_api import OpenMetadata
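Before writing a full ingestion module, it can help to confirm which Parliament tables OpenMetadata has already discovered. This sketch uses the `/api/v1/tables` REST endpoint from Manual Verification Steps rather than the `OpenMetadata` client imported above. It assumes the list response has the shape `{"data": [...], "paging": {"after": ...}}` with a `fullyQualifiedName` per table; the service name `parliament_db` and the optional bearer token are also assumptions.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def parse_table_page(body: str) -> Tuple[List[str], Optional[str]]:
    """Return (table FQNs, next-page cursor) from one /api/v1/tables response."""
    payload = json.loads(body)
    names = [item["fullyQualifiedName"] for item in payload.get("data", [])]
    return names, payload.get("paging", {}).get("after")


def list_all_tables(base_url: str, token: Optional[str] = None) -> List[str]:
    """Follow the `after` cursor until every page of tables has been read."""
    names: List[str] = []
    after: Optional[str] = None
    while True:
        params = {"limit": 100}
        if after:
            params["after"] = after
        request = Request(f"{base_url}/api/v1/tables?{urlencode(params)}")
        if token:
            request.add_header("Authorization", f"Bearer {token}")
        with urlopen(request, timeout=10) as response:
            page, after = parse_table_page(response.read().decode("utf-8"))
        names.extend(page)
        if not after:
            return names


def tables_for_service(names: List[str], service: str) -> List[str]:
    """Keep tables whose FQN (service.database.schema.table) starts with `service`."""
    return [name for name in names if name.split(".", 1)[0] == service]
```

For example, `tables_for_service(list_all_tables("http://localhost:8585"), "parliament_db")`.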

Custom Health Checks

# Add to scripts/verify-services.py
def check_custom_service(self):
    # Custom health check logic
    pass
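A filled-in version of that stub could be as simple as a TCP reachability test. The class name and return shape below are hypothetical, since the structure of `scripts/verify-services.py` isn't shown here; the default port 6379 is Redis from the architecture diagram.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False


class CustomServiceCheck:
    """Hypothetical stand-in for the checker class in scripts/verify-services.py."""

    def __init__(self, host: str = "localhost", port: int = 6379):
        self.host = host
        self.port = port

    def check_custom_service(self) -> dict:
        healthy = check_tcp(self.host, self.port)
        return {"service": f"{self.host}:{self.port}", "healthy": healthy}
```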

📞 Support and Documentation

Getting Help

  1. Check Logs: Always start with docker-compose logs [service]
  2. Run Health Checks: Use the provided verification scripts
  3. Check Resources: Ensure adequate CPU/RAM/Disk
  4. Review Configuration: Verify .env and Docker Compose settings

Documentation

Useful Commands Reference

# Quick status check
docker-compose ps && python3 scripts/verify-services.py

# Full deployment
./scripts/deploy-infrastructure.sh

# Emergency stop
docker-compose down --remove-orphans

# Clean restart
docker-compose down && docker-compose up -d

# Check data flows
python3 scripts/verify-data-flow.py

🎉 Success Metrics

Your deployment is successful when:

✅ All services show "healthy" status
✅ Parliament app loads at http://localhost:8000
✅ OpenMetadata discovers Parliament database tables
✅ Airflow DAGs execute without errors
✅ Data flow verification script passes all tests
✅ Health check script reports 100% success rate

Congratulations! You now have a complete, production-ready OpenParliament infrastructure with OpenMetadata and Airflow integration! 🚀