This document describes the architecture of the Vertex AI MLOps Pipeline Demo, which demonstrates enterprise-grade machine learning workflows using Google Cloud Platform services.
┌─────────────────────────────────────────────────────────────────┐
│ User Interface │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Jupyter │ │ Vertex AI │ │ Cloud │ │
│ │ Notebooks │ │ Console │ │ Console │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CI/CD Pipeline │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Azure │ │ GitHub │ │ Cloud │ │
│ │ DevOps │ │ Actions │ │ Build │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Vertex AI Pipeline │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ BigQuery │ │ Dataflow │ │ Dataproc │ │
│ │ Analysis │ │ Processing │ │ Processing │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Infrastructure Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GCS │ │ VPC │ │ IAM │ │
│ │ Storage │ │ Network │ │ Service │ │
│ │ │ │ │ │ Accounts │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
- Project ID: prj-gft-vertexai-demo1
- Region: europe-west2
- Billing Account: 01A2F5-73127B-50AE5B
- VPC: test-vpc-network
- Subnet: my-subnet-123 (10.0.0.0/8)
- Purpose: Isolated network for data processing
- Artifact Bucket: vertex-ai-model-artifacts-bkt
- Dataflow Templates: dataflow-templates-bkt
- Dataflow Temp: dataflow-temp-bkt
- Dataflow Artifacts: dataflow-artifacts-bkt
- Vertex AI Executor: vertex-ai-executor@prj-gft-vertexai-demo1.iam.gserviceaccount.com
- Dataproc: dataproc@prj-gft-vertexai-demo1.iam.gserviceaccount.com
- Dataflow: dataflow@prj-gft-vertexai-demo1.iam.gserviceaccount.com
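The environment settings above can be gathered into one configuration module so that notebooks and pipeline code reference a single source of truth. A minimal sketch using the values from this document (the `gcs_uri` helper is illustrative, not part of the demo):

```python
# Central configuration for the demo, using the values listed above.
PIPELINE_CONFIG = {
    "project_id": "prj-gft-vertexai-demo1",
    "region": "europe-west2",
    "billing_account": "01A2F5-73127B-50AE5B",
    "network": "test-vpc-network",
    "subnet": "my-subnet-123",
    "buckets": {
        "artifacts": "vertex-ai-model-artifacts-bkt",
        "dataflow_templates": "dataflow-templates-bkt",
        "dataflow_temp": "dataflow-temp-bkt",
        "dataflow_artifacts": "dataflow-artifacts-bkt",
    },
    "service_accounts": {
        "vertex_ai": "vertex-ai-executor@prj-gft-vertexai-demo1.iam.gserviceaccount.com",
        "dataproc": "dataproc@prj-gft-vertexai-demo1.iam.gserviceaccount.com",
        "dataflow": "dataflow@prj-gft-vertexai-demo1.iam.gserviceaccount.com",
    },
}

def gcs_uri(bucket_key: str, path: str = "") -> str:
    """Build a gs:// URI for one of the configured buckets."""
    return f"gs://{PIPELINE_CONFIG['buckets'][bucket_key]}/{path}".rstrip("/")
```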
- Purpose: Data analysis and record counting
- Dataset: bigquery-public-data.chicago_taxi_trips.taxi_trips
- Outputs: Total records, 0.1% sample size
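The record count and the 0.1% sample size can be expressed as below. This is only a local sketch: in the demo the count query would run through the BigQuery client, and only the sample-size arithmetic is computed in Python.

```python
# Query sketch for the counting step against the public dataset named above.
COUNT_QUERY = """
SELECT COUNT(*) AS total_records
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
"""

def sample_size(total_records: int, fraction: float = 0.001) -> int:
    """Return the 0.1% sample size, keeping at least one row for non-empty tables."""
    if total_records <= 0:
        return 0
    return max(1, int(total_records * fraction))
```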
- Purpose: Apache Beam data processing
- Template: chicago-taxi-avg-speed-csv.json
- Output: Average taxi speeds by time period
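The per-trip speed calculation at the heart of the Dataflow template looks roughly like the function below. It is a sketch of the transform logic only; in the real pipeline it would run inside a Beam `DoFn` over the dataset's trip-distance and trip-duration columns.

```python
def average_speed_mph(trip_miles: float, trip_seconds: float) -> float:
    """Average speed in miles per hour; zero-duration trips are reported as 0.0."""
    if trip_seconds <= 0:
        return 0.0
    return trip_miles / (trip_seconds / 3600.0)
```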
- Purpose: Spark batch processing
- Output: Processed taxi data with aggregations
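The Dataproc stage's aggregation can be illustrated with a pure-Python equivalent of what the Spark job computes; the actual job would use a Spark DataFrame `groupBy`, so this is only a local sketch of the per-period averaging.

```python
from collections import defaultdict

def aggregate_by_period(trips):
    """trips: iterable of (period, speed_mph) pairs; returns {period: mean speed}."""
    totals = defaultdict(lambda: [0.0, 0])  # period -> [sum of speeds, count]
    for period, speed in trips:
        totals[period][0] += speed
        totals[period][1] += 1
    return {p: s / n for p, (s, n) in totals.items()}
```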
1. Data Source
   └── Chicago Taxi Trips (BigQuery Public Dataset)
2. BigQuery Analysis
   ├── Count total records
   └── Calculate 0.1% sample
3. Dataflow Processing
   ├── Read taxi trip data
   ├── Calculate average speeds
   └── Write results to GCS
4. Dataproc Processing
   ├── Spark job execution
   ├── Data aggregation
   └── Store processed data
5. Results Aggregation
   ├── Combine all outputs
   └── Generate summary report
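The five stages above can be sketched as a chain of functions passing a shared state dictionary. This is a toy driver to show the data flow, not the demo's orchestration (which runs on Vertex AI Pipelines); the output URIs are hypothetical placeholders.

```python
def bigquery_analysis(state: dict) -> dict:
    """Stage 2: derive the 0.1% sample size from the total record count."""
    state["sample_size"] = max(1, int(state["total_records"] * 0.001))
    return state

def dataflow_processing(state: dict) -> dict:
    """Stage 3: record where the average-speed results would land (path is hypothetical)."""
    state["avg_speeds_uri"] = "gs://dataflow-artifacts-bkt/avg_speeds.csv"
    return state

def dataproc_processing(state: dict) -> dict:
    """Stage 4: record where the Spark aggregations would land (path is hypothetical)."""
    state["aggregations_uri"] = "gs://dataflow-artifacts-bkt/aggregations/"
    return state

def results_aggregation(state: dict) -> dict:
    """Stage 5: combine the stage outputs into a one-line summary."""
    state["summary"] = (
        f"sampled {state['sample_size']} of {state['total_records']} trips; "
        f"outputs: {state['avg_speeds_uri']}, {state['aggregations_uri']}"
    )
    return state

STAGES = [bigquery_analysis, dataflow_processing, dataproc_processing, results_aggregation]

def run_pipeline(total_records: int) -> dict:
    """Stage 1 seeds the state; the remaining stages run in order."""
    state = {"total_records": total_records}
    for stage in STAGES:
        state = stage(state)
    return state
```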
- VPC with private subnets
- Firewall rules for service-to-service communication
- Cloud NAT for outbound internet access
- Encryption at rest (GCS, BigQuery)
- Encryption in transit (TLS 1.2+)
- IAM policies with least privilege access
- Dedicated service accounts per service
- Minimal required permissions
- Key rotation policies
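Least-privilege bindings are easiest to audit when the intended role set per service account lives in version control. The mapping below is illustrative (the role choices are assumptions, not the demo's actual IAM policy):

```python
# Illustrative minimal role sets per service account (assumed, not the demo's policy).
MINIMAL_ROLES = {
    "vertex-ai-executor": {"roles/aiplatform.user", "roles/storage.objectAdmin"},
    "dataflow": {"roles/dataflow.worker", "roles/storage.objectAdmin"},
    "dataproc": {"roles/dataproc.worker", "roles/storage.objectAdmin"},
}

def excess_roles(account: str, requested: set) -> set:
    """Return any requested roles beyond the account's minimal set."""
    return requested - MINIMAL_ROLES.get(account, set())
```

A check like `excess_roles` can run in CI so that any grant outside the declared minimum fails the build.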
- Dataflow auto-scaling based on data volume
- Dataproc cluster scaling
- BigQuery slot allocation
- Machine type selection for compute-intensive tasks
- Memory optimization for large datasets
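The scaling and machine-type choices above translate into worker settings passed to the runner. An illustrative set follows; the option names loosely mirror Apache Beam's Dataflow pipeline options, and the values are assumptions rather than tuned defaults:

```python
# Illustrative Dataflow worker settings; values are assumptions, tune per workload.
DATAFLOW_WORKER_OPTIONS = {
    "autoscaling_algorithm": "THROUGHPUT_BASED",  # scale worker count with backlog
    "max_num_workers": 10,                        # cap on auto-scaling
    "machine_type": "n1-highmem-4",               # memory-optimized for large datasets
    "disk_size_gb": 50,
}
```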
- Cloud Logging for all services
- Structured logging with correlation IDs
- Log retention policies
- Cloud Monitoring dashboards
- Custom metrics for pipeline performance
- Alerting on failures and performance degradation
- Distributed tracing across pipeline components
- Performance bottleneck identification
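The correlation IDs mentioned above can be attached with a standard-library logging filter, so every log line from one pipeline run carries the same ID and can be joined across services. A minimal sketch (the logger name and format are illustrative):

```python
import io
import logging

class CorrelationFilter(logging.Filter):
    """Attach a fixed correlation_id attribute to every log record."""
    def __init__(self, correlation_id: str):
        super().__init__()
        self.correlation_id = correlation_id

    def filter(self, record):
        record.correlation_id = self.correlation_id
        return True

def make_logger(correlation_id: str, stream) -> logging.Logger:
    """Create a logger whose lines are prefixed with the run's correlation ID."""
    logger = logging.getLogger(f"pipeline.{correlation_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(correlation_id)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.addFilter(CorrelationFilter(correlation_id))
    return logger
```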
- GCS versioning enabled
- BigQuery table snapshots
- Cross-region replication for critical data
- Terraform/Terragrunt for infrastructure as code
- Automated deployment pipelines
- Environment-specific configurations
- Auto-scaling for compute resources
- Spot instances for non-critical workloads
- Resource scheduling for batch jobs
- Lifecycle policies for GCS buckets
- BigQuery partitioning and clustering
- Data archival strategies
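GCS lifecycle policies like those above are expressed as JSON rules attached to the bucket. An illustrative policy in the standard action/condition shape (the age thresholds are assumptions, not the demo's configuration):

```python
# Illustrative GCS lifecycle policy: archive objects after 90 days, delete after 365.
LIFECYCLE_POLICY = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
            "condition": {"age": 90},
        },
        {
            "action": {"type": "Delete"},
            "condition": {"age": 365},
        },
    ]
}
```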
- Multi-region deployment
- Advanced ML model training
- Real-time streaming with Pub/Sub
- Advanced monitoring with custom dashboards
- Integration with external data sources
- Migration to newer GCP services
- Adoption of new ML frameworks
- Enhanced security features
- Improved CI/CD practices