Commit 8ebbc56

committed
update to ga release
1 parent 71b2638 commit 8ebbc56

3 files changed

+564
-75
lines changed

Lines changed: 80 additions & 41 deletions
@@ -1,41 +1,80 @@
1-
# GKE Standard Cluster and Node Pool
2-
3-
This example creates a GKE private cluster and Node Pool with beta features.
4-
For a full example see [simple_regional_private_beta](../simple_regional_private_beta/README.md) example.
5-
6-
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
7-
## Inputs
8-
9-
| Name | Description | Type | Default | Required |
10-
|------|-------------|------|---------|:--------:|
11-
| cluster\_name\_suffix | A suffix to append to the default cluster name | `string` | `""` | no |
12-
| dns\_cache | Boolean to enable / disable NodeLocal DNSCache | `bool` | `false` | no |
13-
| gce\_pd\_csi\_driver | (Beta) Whether this cluster should enable the Google Compute Engine Persistent Disk Container Storage Interface (CSI) Driver. | `bool` | `false` | no |
14-
| ip\_range\_pods | The secondary ip range to use for pods | `any` | n/a | yes |
15-
| ip\_range\_services | The secondary ip range to use for services | `any` | n/a | yes |
16-
| network | The VPC network to host the cluster in | `any` | n/a | yes |
17-
| project\_id | The project ID to host the cluster in | `any` | n/a | yes |
18-
| region | The region to host the cluster in | `any` | n/a | yes |
19-
| service\_account | Service account to associate to the nodes in the cluster | `any` | n/a | yes |
20-
| subnetwork | The subnetwork to host the cluster in | `any` | n/a | yes |
21-
22-
## Outputs
23-
24-
| Name | Description |
25-
|------|-------------|
26-
| addons\_config | The configuration for addons supported by GKE Autopilot. |
27-
| ca\_certificate | The cluster ca certificate (base64 encoded) |
28-
| cluster\_name | Cluster name |
29-
| endpoint | The cluster endpoint |
30-
| location | Cluster location |
31-
| master\_version | The master Kubernetes version |
32-
| node\_locations | Cluster node locations |
33-
| project\_id | The project ID the cluster is in |
34-
35-
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
36-
37-
To provision this example, run the following from within this directory:
38-
- `terraform init` to get the plugins
39-
- `terraform plan` to see the infrastructure plan
40-
- `terraform apply` to apply the infrastructure build
41-
- `terraform destroy` to destroy the built infrastructure
1+
# GKE Inference Gateway Example
2+
3+
This example provisions a GKE Standard cluster and a node pool with H100 GPUs, suitable for deploying and serving Large Language Models (LLMs) using the GKE Inference Gateway.
4+
5+
The cluster is configured with:
6+
- GKE Gateway API enabled.
7+
- Managed Prometheus for monitoring.
8+
- DCGM for GPU monitoring.
9+
- A dedicated node pool with NVIDIA H100 80GB GPUs.
10+
11+
This Terraform script automates the deployment of all necessary Kubernetes resources, including:
12+
- Authorization for metrics scraping.
13+
- A vLLM model server for a Llama 3.1 model.
14+
- GKE Inference Gateway CRDs.
15+
- GKE Inference Gateway resources (`InferencePool`, `InferenceObjective`, `Gateway`, `HTTPRoute`).
16+
17+
## Usage
18+
19+
1. **Enable APIs**
20+
21+
```bash
22+
gcloud services enable container.googleapis.com
23+
```
24+
25+
2. **Set up your environment**
26+
27+
Set the following environment variables. If `variables.tf` declares variables without default values, also provide values for them, for example in a `terraform.tfvars` file.
28+
29+
```bash
30+
export PROJECT_ID="your-project-id"
31+
export REGION="us-central1"
32+
export CLUSTER_NAME="inference-gateway-cluster"
33+
export HF_TOKEN="your-hugging-face-token"
34+
```
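
Terraform reads input variable values from environment variables named `TF_VAR_<name>`, so the exports above can be passed through without a `terraform.tfvars` file. A minimal sketch, assuming `variables.tf` declares inputs called `project_id`, `region`, `cluster_name`, and `hf_token` (these names are guesses, not taken from the example; check `variables.tf` and adjust):

```bash
# Placeholder defaults so the snippet runs standalone; replace with real values.
: "${PROJECT_ID:=your-project-id}"
: "${REGION:=us-central1}"
: "${CLUSTER_NAME:=inference-gateway-cluster}"
: "${HF_TOKEN:=your-hugging-face-token}"

# TF_VAR_<name> supplies the Terraform input variable <name>.
# The variable names on the left are assumptions, not read from variables.tf.
export TF_VAR_project_id="${PROJECT_ID}"
export TF_VAR_region="${REGION}"
export TF_VAR_cluster_name="${CLUSTER_NAME}"
export TF_VAR_hf_token="${HF_TOKEN}"
```

Passing the Hugging Face token this way keeps it out of files on disk, which may be preferable to writing it into `terraform.tfvars`.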
35+
36+
3. **Run Terraform**
37+
38+
The `terraform apply` command will provision the GKE cluster and deploy all the necessary Kubernetes resources.
39+
40+
```bash
41+
terraform init
42+
terraform apply
43+
```
44+
45+
4. **Configure kubectl**
46+
47+
After the apply is complete, configure `kubectl` to communicate with your new cluster.
48+
49+
```bash
50+
gcloud container clusters get-credentials $(terraform output -raw cluster_name) --region $(terraform output -raw location)
51+
```
52+
53+
5. **Send an inference request**
54+
55+
Get the Gateway IP address:
56+
```bash
57+
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
58+
PORT=80
59+
```
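
The Gateway's address may not be populated immediately after `terraform apply` returns, in which case the command above yields an empty `IP`. A hedged sketch of a small polling helper that retries a command until it prints something; `retry_until_output` is a hypothetical name introduced here, not part of the example:

```bash
# retry_until_output ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it prints non-empty output, then echoes that output.
# Returns 1 if every attempt produced nothing.
retry_until_output() {
  local attempts delay out i
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Ignore the command's exit status; only its output matters here.
    out="$("$@" 2>/dev/null || true)"
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example use (requires access to the cluster):
# IP=$(retry_until_output 30 10 kubectl get gateway/inference-gateway \
#   -o jsonpath='{.status.addresses[0].value}')
```

The attempt count and delay in the commented usage are arbitrary illustrative values; tune them to how long the load balancer takes to come up in your project.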
60+
61+
Send a request:
62+
```bash
63+
curl -i -X POST http://${IP}:${PORT}/v1/completions \
64+
-H "Content-Type: application/json" \
65+
-d '
66+
{
67+
"model": "food-review",
68+
"prompt": "What is a good recipe for a chicken curry?",
69+
"max_tokens": 100,
70+
"temperature": 0.7
71+
}'
72+
```
73+
74+
## Cleanup
75+
76+
Running `terraform destroy` will deprovision the GKE cluster and all associated Kubernetes resources.
77+
78+
```bash
79+
terraform destroy
80+
```
