1 | | -# GKE Standard Cluster and Node Pool |
2 | | - |
3 | | -This example creates a GKE private cluster and Node Pool with beta features. |
4 | | -For a full example see [simple_regional_private_beta](../simple_regional_private_beta/README.md) example. |
5 | | - |
6 | | -<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK --> |
7 | | -## Inputs |
8 | | - |
9 | | -| Name | Description | Type | Default | Required | |
10 | | -|------|-------------|------|---------|:--------:| |
11 | | -| cluster\_name\_suffix | A suffix to append to the default cluster name | `string` | `""` | no | |
12 | | -| dns\_cache | Boolean to enable / disable NodeLocal DNSCache | `bool` | `false` | no | |
13 | | -| gce\_pd\_csi\_driver | (Beta) Whether this cluster should enable the Google Compute Engine Persistent Disk Container Storage Interface (CSI) Driver. | `bool` | `false` | no | |
14 | | -| ip\_range\_pods | The secondary ip range to use for pods | `any` | n/a | yes | |
15 | | -| ip\_range\_services | The secondary ip range to use for services | `any` | n/a | yes | |
16 | | -| network | The VPC network to host the cluster in | `any` | n/a | yes | |
17 | | -| project\_id | The project ID to host the cluster in | `any` | n/a | yes | |
18 | | -| region | The region to host the cluster in | `any` | n/a | yes | |
19 | | -| service\_account | Service account to associate to the nodes in the cluster | `any` | n/a | yes | |
20 | | -| subnetwork | The subnetwork to host the cluster in | `any` | n/a | yes | |
21 | | - |
22 | | -## Outputs |
23 | | - |
24 | | -| Name | Description | |
25 | | -|------|-------------| |
26 | | -| addons\_config | The configuration for addons supported by GKE Autopilot. | |
27 | | -| ca\_certificate | The cluster ca certificate (base64 encoded) | |
28 | | -| cluster\_name | Cluster name | |
29 | | -| endpoint | The cluster endpoint | |
30 | | -| location | Cluster location | |
31 | | -| master\_version | The master Kubernetes version | |
32 | | -| node\_locations | Cluster node locations | |
33 | | -| project\_id | The project ID the cluster is in | |
34 | | - |
35 | | -<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK --> |
36 | | - |
37 | | -To provision this example, run the following from within this directory: |
38 | | -- `terraform init` to get the plugins |
39 | | -- `terraform plan` to see the infrastructure plan |
40 | | -- `terraform apply` to apply the infrastructure build |
41 | | -- `terraform destroy` to destroy the built infrastructure |
| 1 | +# GKE Inference Gateway Example |
| 2 | + |
| 3 | +This example provisions a GKE Standard cluster and a node pool with H100 GPUs, suitable for deploying and serving Large Language Models (LLMs) using the GKE Inference Gateway. |
| 4 | + |
| 5 | +The cluster is configured with: |
| 6 | +- GKE Gateway API enabled. |
| 7 | +- Managed Prometheus for monitoring. |
| 8 | +- DCGM for GPU monitoring. |
| 9 | +- A dedicated node pool with NVIDIA H100 80GB GPUs. |
| 10 | + |
| 11 | +The Terraform configuration also deploys the required Kubernetes resources, including:
| 12 | +- Authorization for metrics scraping. |
| 13 | +- A vLLM model server for a Llama3.1 model. |
| 14 | +- GKE Inference Gateway CRDs. |
| 15 | +- GKE Inference Gateway resources (`InferencePool`, `InferenceObjective`, `Gateway`, `HTTPRoute`). |
| 16 | + |
| 17 | +## Usage |
| 18 | + |
| 19 | +1. **Enable APIs** |
| 20 | + |
| 21 | + ```bash |
| 22 | + gcloud services enable container.googleapis.com |
| 23 | + ``` |
| 24 | + |
| 25 | +2. **Set up your environment** |
| 26 | + |
| 27 | + Set the following environment variables. Terraform only reads environment variables prefixed with `TF_VAR_` as input variables, so you may also need a `terraform.tfvars` file that provides values for the variables in `variables.tf`.
| 28 | + |
| 29 | + ```bash |
| 30 | + export PROJECT_ID="your-project-id" |
| 31 | + export REGION="us-central1" |
| 32 | + export CLUSTER_NAME="inference-gateway-cluster" |
| 33 | + export HF_TOKEN="your-hugging-face-token" |
| 34 | + ``` |
| 35 | + |
| 36 | +3. **Run Terraform** |
| 37 | + |
| 38 | + The `terraform apply` command will provision the GKE cluster and deploy all the necessary Kubernetes resources. |
| 39 | + |
| 40 | + ```bash |
| 41 | + terraform init |
| 42 | + terraform apply |
| 43 | + ``` |
| 44 | + |
| 45 | +4. **Configure kubectl** |
| 46 | + |
| 47 | + After the apply is complete, configure `kubectl` to communicate with your new cluster. |
| 48 | + |
| 49 | + ```bash |
| 50 | + gcloud container clusters get-credentials $(terraform output -raw cluster_name) --region $(terraform output -raw location) |
| 51 | + ``` |
| 52 | + |
| 53 | +5. **Send an inference request** |
| 54 | + |
| 55 | + Get the Gateway IP address: |
| 56 | + ```bash |
| 57 | + IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}') |
| 58 | + PORT=80 |
| 59 | + ``` |
| 60 | + |
| 61 | + Send a request: |
| 62 | + ```bash |
| 63 | + curl -i -X POST http://${IP}:${PORT}/v1/completions \ |
| 64 | + -H "Content-Type: application/json" \ |
| 65 | + -d '
| 66 | + {
| 67 | + "model": "food-review",
| 68 | + "prompt": "What is a good recipe for a chicken curry?",
| 69 | + "max_tokens": 100,
| 70 | + "temperature": 0.7
| 71 | + }'
| 72 | + ``` |
| 73 | + |
| 74 | +## Cleanup |
| 75 | + |
| 76 | +Running `terraform destroy` will deprovision the GKE cluster and all associated Kubernetes resources. |
| 77 | + |
| 78 | +```bash |
| 79 | +terraform destroy |
| 80 | +``` |
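
Step 5 of the new README reads the Gateway address right after `terraform apply`, but a Gateway can take some time to be assigned an address, and until then the `jsonpath` lookup returns nothing. The sketch below shows one way to poll for the address and build the request body. The helper names `wait_for_gateway_ip` and `build_payload`, the retry count, and the delay are illustrative assumptions, not part of this commit. The Gateway name `inference-gateway` and the model name `food-review` are taken from the README.

```bash
# Hypothetical helper: poll until the Gateway reports an address, or give up.
# Arguments: gateway name, number of attempts (assumed default 30),
# delay in seconds between attempts (assumed default 10).
wait_for_gateway_ip() {
  gw_name="$1"
  gw_attempts="${2:-30}"
  gw_delay="${3:-10}"
  gw_try=0
  while [ "$gw_try" -lt "$gw_attempts" ]; do
    # The lookup fails or prints nothing while no address is assigned.
    gw_ip="$(kubectl get "gateway/${gw_name}" \
      -o jsonpath='{.status.addresses[0].value}' 2>/dev/null || true)"
    if [ -n "$gw_ip" ]; then
      printf '%s\n' "$gw_ip"
      return 0
    fi
    gw_try=$((gw_try + 1))
    sleep "$gw_delay"
  done
  return 1
}

# Hypothetical helper: build the completion request body.
# temperature is emitted as a JSON number, not a string.
# Note: the model and prompt are inserted verbatim, so they must not
# contain double quotes or backslashes.
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": 100, "temperature": 0.7}' "$1" "$2"
}

# Usage, once kubectl is configured for the cluster:
#   IP="$(wait_for_gateway_ip inference-gateway)" || exit 1
#   curl -i -X POST "http://${IP}:80/v1/completions" \
#     -H "Content-Type: application/json" \
#     -d "$(build_payload food-review 'What is a good recipe for a chicken curry?')"
```

Keeping the payload in a single quoted argument to `-d` avoids the shell splitting the JSON across separate commands.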