<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements. See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership. The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied. See the License for the
    specific language governing permissions and limitations
    under the License.
-->

# Envoy Rate Limiter on GKE (Terraform)

This directory contains a production-ready Terraform module that deploys a scalable **Envoy Rate Limit Service** on Google Kubernetes Engine (GKE) Autopilot.

## Overview

Apache Beam pipelines often process data at massive scale, which can easily overwhelm external APIs (e.g., databases, LLM inference endpoints, SaaS APIs).

This Terraform module deploys a **centralized Rate Limit Service (RLS)** using Envoy. Beam workers query this service to coordinate a global quota across thousands of distributed workers, keeping request rates within safe API limits and avoiding `429 Too Many Requests` errors.

Example Beam pipelines that use it:
* [Simple DoFn RateLimiter](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/rate_limiter_simple.py)
* [Vertex AI RateLimiter](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/rate_limiter_vertex_ai.py)

## Architecture
- **GKE Autopilot**: Fully managed, serverless Kubernetes environment.
  - **Private cluster**: Nodes have internal IPs only.
  - **Cloud NAT (prerequisite)**: Allows private nodes to pull Docker images.
- **Envoy Rate Limit Service**: A stateless Go/gRPC service that handles the rate limit logic.
- **Redis**: Stores the rate limit counters.
- **StatsD Exporter**: Sidecar container that converts StatsD metrics to Prometheus format, exposed on port `9102`.
- **Internal Load Balancer**: A Google Cloud TCP load balancer exposing the Rate Limit Service internally within the VPC.

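The core mechanism these pieces implement is a fixed-window counter kept in Redis: each request increments a per-window key, and the request is rejected once the counter exceeds the configured limit. A minimal Python sketch of that logic (illustrative only: a plain dict stands in for Redis, and the key format is an assumption):

```python
import time

# In the real service these counters live in Redis (INCR + EXPIRE);
# a dict stands in here for illustration.
counters = {}

def should_rate_limit(domain, key, value, limit, unit_seconds=1, now=None):
    """Fixed-window check: allow up to `limit` requests per window.

    Returns True when the request is OVER_LIMIT.
    """
    now = time.time() if now is None else now
    window = int(now // unit_seconds)
    counter_key = f"{domain}_{key}_{value}_{window}"
    count = counters.get(counter_key, 0) + 1
    counters[counter_key] = count
    return count > limit

# Example: a limit of 3 requests/second for database=users.
decisions = [should_rate_limit("mongo_cps", "database", "users", 3, now=100.0)
             for _ in range(5)]
# -> [False, False, False, True, True]
```

Because the counter is shared, every worker sees the same window totals, which is what makes the quota global rather than per-worker.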
## Prerequisites
The following items need to be set up before deploying the Envoy Rate Limiter on GCP:
1. A [GCP project](https://cloud.google.com/resource-manager/docs/creating-managing-projects)

2. Tools installed:
   - [Terraform](https://www.terraform.io/downloads.html) >= 1.0
   - [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) (`gcloud`)
   - [kubectl](https://kubernetes.io/docs/tasks/tools/)

3. APIs enabled:
   ```bash
   gcloud services enable container.googleapis.com compute.googleapis.com
   ```

4. **Network Configuration**:
   - **Cloud NAT**: Must exist in the region so private nodes can pull images and reach external APIs. See [the GKE Cloud NAT example](https://docs.cloud.google.com/nat/docs/gke-example#create-nat) for details.

     **Helper command** (if you need to create one):
     ```bash
     gcloud compute routers create nat-router --network <VPC_NAME> --region <REGION>
     gcloud compute routers nats create nat-config \
       --router=nat-router \
       --region=<REGION> \
       --auto-allocate-nat-external-ips \
       --nat-all-subnet-ip-ranges
     ```
   - **Validation via Console**:
     1. Go to **Network Services** > **Cloud NAT** in the Google Cloud Console.
     2. Verify a NAT gateway exists for your **Region** and **VPC network**.
     3. Ensure it is configured to apply to **Primary and Secondary ranges** (or at least the ranges GKE will use).

## Prepare the deployment configuration
1. Update the `terraform.tfvars` file to define variables specific to your environment:

```hcl
project_id                   = "my-project-id"       # GCP Project ID
region                       = "us-central1"         # GCP Region for deployment
cluster_name                 = "ratelimit-cluster"   # Name of the GKE cluster
deletion_protection          = true                  # Prevent accidental cluster deletion (set to true for prod)
control_plane_cidr           = "172.16.0.0/28"       # CIDR for GKE control plane (must not overlap with the subnet)
ratelimit_replicas           = 1                     # Initial number of Rate Limit pods
min_replicas                 = 1                     # Minimum HPA replicas
max_replicas                 = 5                     # Maximum HPA replicas
hpa_cpu_target_percentage    = 75                    # CPU utilization target for HPA (%)
hpa_memory_target_percentage = 75                    # Memory utilization target for HPA (%)
vpc_name                     = "default"             # Existing VPC name to deploy into
subnet_name                  = "default"             # Existing subnet name (required for the Internal LB IP)
ratelimit_image              = "envoyproxy/ratelimit:e9ce92cc"  # Docker image for the Rate Limit service
redis_image                  = "redis:6.2-alpine"    # Docker image for Redis
ratelimit_resources = { requests = { cpu = "100m", memory = "128Mi" }, limits = { cpu = "500m", memory = "512Mi" } }
redis_resources     = { requests = { cpu = "250m", memory = "256Mi" }, limits = { cpu = "500m", memory = "512Mi" } }
```
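As a rough guide to how `min_replicas`, `max_replicas`, and the HPA targets interact: Kubernetes scales toward `ceil(currentReplicas × currentUtilization / target)`, clamped to the min/max bounds. A small sketch of that arithmetic (an approximation of the HPA algorithm that ignores stabilization windows and tolerances):

```python
import math

def desired_replicas(current, utilization_pct, target_pct, lo, hi):
    """Approximate HPA scaling: ceil(current * utilization / target), clamped."""
    want = math.ceil(current * utilization_pct / target_pct)
    return max(lo, min(hi, want))

# With the values above (75% target, 1-5 replicas):
print(desired_replicas(2, 150, 75, 1, 5))  # pods at 150% CPU -> 4 replicas
print(desired_replicas(5, 300, 75, 1, 5))  # capped at max_replicas -> 5
```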

2. Set the rate limit configuration (**required**; there is no default, so it must be provided in `terraform.tfvars`):

```hcl
ratelimit_config_yaml = <<EOF
domain: mongo_cps
descriptors:
  - key: database
    value: users
    rate_limit:
      unit: second
      requests_per_unit: 500
EOF
```
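To connect this configuration to what workers send: each rate limit request carries the domain plus descriptor entries, and the service matches those entries against the configured descriptors (over gRPC, via `envoy.service.ratelimit.v3.RateLimitService/ShouldRateLimit`). A simplified sketch of the matching, with the YAML above expressed as a Python dict (the helper is illustrative, not the service's actual code):

```python
config = {
    "domain": "mongo_cps",
    "descriptors": [
        {"key": "database", "value": "users",
         "rate_limit": {"unit": "second", "requests_per_unit": 500}},
    ],
}

def find_limit(config, domain, entries):
    """Return the rate_limit matching a request's (key, value) entries, or None."""
    if domain != config["domain"]:
        return None
    for desc in config["descriptors"]:
        for key, value in entries:
            # A descriptor with no "value" matches any value for its key.
            if desc["key"] == key and desc.get("value", value) == value:
                return desc["rate_limit"]
    return None

print(find_limit(config, "mongo_cps", [("database", "users")]))
# -> {'unit': 'second', 'requests_per_unit': 500}
```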
| 106 | +
|
| 107 | +# Deploy Envoy Rate Limiter: |
| 108 | +1. Initialize Terraform to download providers and modules: |
| 109 | +```bash |
| 110 | +terraform init |
| 111 | +``` |
| 112 | + |
| 113 | +2. Plan and apply the changes: |
| 114 | +```bash |
| 115 | +terraform plan -out=tfplan |
| 116 | +terraform apply tfplan |
| 117 | +``` |
| 118 | + |
| 119 | +3. Connect to the service: |
| 120 | +After deployment, get the **Internal** IP address: |
| 121 | +```bash |
| 122 | +terraform output load_balancer_ip |
| 123 | +``` |
| 124 | +The service is accessible **only from within the VPC** (e.g., via Dataflow workers or GCE instances in the same network) at `<INTERNAL_IP>:8081`. |
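Since the load balancer exposes a plain TCP port, a quick way to verify reachability from a machine inside the VPC is a socket check (`<INTERNAL_IP>` is the value from `terraform output` above):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from a GCE instance or Dataflow worker in the same VPC:
# can_connect("<INTERNAL_IP>", 8081)
```

A `False` here usually points at a wrong subnet, a firewall rule, or a machine outside the VPC rather than a problem with the service itself.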

4. **Test with a Dataflow pipeline**:
   Verify connectivity and the rate limiting logic by running the example Dataflow pipeline.

   ```bash
   # Get the Internal Load Balancer IP provisioned by Terraform
   export RLS_IP=$(terraform output -raw load_balancer_ip)

   # --rls_address points at the Terraform-provisioned internal IP;
   # --subnetwork and --no_use_public_ips are REQUIRED so workers run
   # in the same private subnet as the Rate Limit Service.
   python sdks/python/apache_beam/examples/rate_limiter_simple.py \
     --runner=DataflowRunner \
     --project=<YOUR_PROJECT_ID> \
     --region=<YOUR_REGION> \
     --temp_location=gs://<YOUR_BUCKET>/temp \
     --staging_location=gs://<YOUR_BUCKET>/staging \
     --job_name=ratelimit-test-$(date +%s) \
     --rls_address=${RLS_IP}:8081 \
     --subnetwork=regions/<YOUR_REGION>/subnetworks/<YOUR_SUBNET_NAME> \
     --no_use_public_ips
   ```

## Clean up resources
To destroy the cluster and all created resources:
```bash
terraform destroy
```
*Note: If `deletion_protection` was enabled, set it to `false` in `terraform.tfvars` and re-apply before destroying.*

## Variables description

|Variable |Description |Default |
|-----------------------------|:----------------------------------------------------|:--------------------------------|
|project_id |**Required.** Google Cloud Project ID |- |
|vpc_name |**Required.** Existing VPC name to deploy into |- |
|subnet_name |**Required.** Existing subnet name |- |
|ratelimit_config_yaml |**Required.** Rate limit configuration content |- |
|region |GCP Region for deployment |us-central1 |
|control_plane_cidr |CIDR block for GKE control plane |172.16.0.0/28 |
|cluster_name |Name of the GKE cluster |ratelimit-cluster |
|deletion_protection |Prevent accidental cluster deletion |false |
|ratelimit_replicas |Initial number of Rate Limit pods |1 |
|min_replicas |Minimum HPA replicas |1 |
|max_replicas |Maximum HPA replicas |5 |
|hpa_cpu_target_percentage |CPU utilization target for HPA (%) |75 |
|hpa_memory_target_percentage |Memory utilization target for HPA (%) |75 |
|ratelimit_image |Docker image for the Rate Limit service |envoyproxy/ratelimit:e9ce92cc |
|redis_image |Docker image for Redis |redis:6.2-alpine |
|ratelimit_resources |Resources for the Rate Limit container (map) |requests/limits (see example) |
|redis_resources |Resources for the Redis container (map) |requests/limits (see example) |