Commit 8986411

GCP Infrastructure manager terraform for elastic-agent (elastic#3776)
### Summary of your changes

Replaces the deprecated GCP Deployment Manager with the modern Infrastructure Manager (Terraform) for deploying the Elastic Agent CSPM integration. Provides identical resources with improved tooling and user experience.

#### New Directory: deploy/infrastructure-manager/gcp-elastic-agent/

Files Added:

- main.tf - Main infrastructure configuration (compute instance, network, service account, IAM bindings)
- variables.tf - Input variable definitions
- outputs.tf - Deployment outputs
- service_account.tf - Standalone service account deployment for agentless mode
- terraform.tfvars.example - Example configuration for main deployment
- service_account.tfvars.example - Example configuration for SA-only deployment
- README.md - Comprehensive deployment guide

#### Resources Created

Identical to Deployment Manager implementation:

- Compute instance (Ubuntu, n2-standard-4, 32GB disk) with Elastic Agent pre-installed
- Service account with roles/cloudasset.viewer and roles/browser
- VPC network with auto-created subnets
- IAM bindings (project or organization scope)
- Optional SSH firewall rule

#### Compatibility

The new deployment script `infrastructure-manager/deploy.sh` is compatible with Kibana deployment commands of the form:

```bash
gcloud config set project elastic-security-test && \
FLEET_URL=https://a6f784d2fb4d48bea7724fbe41ef17d3.fleet.us-central1.gcp.qa.elastic.cloud:443 \
ENROLLMENT_TOKEN=<REDACTED> \
STACK_VERSION=9.2.3 \
./deploy.sh
```

### Related Issues

- Resolves: elastic#3132

(cherry picked from commit fdf76cc)
1 parent c8fd62d commit 8986411

16 files changed, +832 -0 lines changed
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
## Elastic Agent Infrastructure Manager (Terraform)

Deploy Elastic Agent for the CIS GCP integration using GCP Infrastructure Manager. The deployment creates a compute instance with Elastic Agent pre-installed and configured with the necessary permissions.

### Prerequisites

1. Elastic Stack with Fleet Server deployed
2. GCP project with required permissions (see [Required Permissions](#required-permissions))
3. Fleet URL and enrollment token from Kibana

### Quick Deploy

#### Option 1: Cloud Shell (Recommended)

[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.svg)](https://shell.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https://github.com/elastic/cloudbeat.git&cloudshell_git_branch=main&cloudshell_workspace=deploy/infrastructure-manager/gcp-elastic-agent&show=terminal&ephemeral=true)

```bash
# Set required configuration
export FLEET_URL="<YOUR_FLEET_URL>"
export ENROLLMENT_TOKEN="<YOUR_TOKEN>"
export STACK_VERSION="<YOUR_AGENT_VERSION>"

# Optional: Set these to override defaults
# export ORG_ID="<YOUR_ORG_ID>"  # For org-level monitoring
# export DEPLOYMENT_NAME="elastic-agent-deployment"  # Name prefix (deploy.sh appends a random suffix). Default: elastic-agent-deployment
# export ZONE="us-central1-a"  # Default: us-central1-a
# export ELASTIC_ARTIFACT_SERVER="<CUSTOM_SERVER_URL>"  # Default: https://artifacts.elastic.co/downloads/beats/elastic-agent

# Deploy using the deploy script
./deploy.sh
```

#### Option 2: GCP Console

1. Go to [Infrastructure Manager Console](https://console.cloud.google.com/infra-manager/deployments/create)
2. Configure:
   - **Source**: Git repository
   - **Repository URL**: `https://github.com/elastic/cloudbeat.git`
   - **Branch**: `main`
   - **Directory**: `deploy/infrastructure-manager/gcp-elastic-agent`
   - **Location**: `us-central1`
3. Add input variables (see table below)
4. Click **Create**

### Input Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `project_id` | Yes | - | GCP Project ID |
| `resource_suffix` | Yes | - | Unique suffix for resource names (8 hex characters); generated automatically by `deploy.sh` |
| `fleet_url` | Yes | - | Fleet Server URL |
| `enrollment_token` | Yes | - | Enrollment token (sensitive) |
| `elastic_agent_version` | Yes | - | Agent version (e.g., `8.15.0`) |
| `elastic_artifact_server` | No | `https://artifacts.elastic.co/downloads/beats/elastic-agent` | Artifact server URL for downloading Elastic Agent |
| `zone` | No | `us-central1-a` | GCP zone |
| `scope` | No | `projects` | `projects` or `organizations` |
| `parent_id` | Yes | - | Project ID when `scope` is `projects`, Organization ID when `scope` is `organizations` |
| `startup_validation_enabled` | No | `true` | Enable validation of startup script completion |
| `startup_timeout_seconds` | No | `600` | Maximum time to wait for startup (seconds) |

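When deploying without `deploy.sh`, the variables above have to be joined into the comma-separated `key=value` string that `gcloud infra-manager deployments apply` takes through `--input-values`. The following is a minimal sketch of that assembly, not the project's script: `build_input_values` is a hypothetical helper name, and `resource_suffix` is included because `main.tf` declares it without a default.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): join the required
# variables into the comma-separated key=value list for --input-values.
# Arguments: project_id fleet_url enrollment_token agent_version resource_suffix [org_id]
build_input_values() {
    local project_id="$1" fleet_url="$2" token="$3" version="$4" suffix="$5" org_id="${6:-}"
    local values="project_id=${project_id},resource_suffix=${suffix}"
    values="${values},fleet_url=${fleet_url}"
    values="${values},enrollment_token=${token}"
    values="${values},elastic_agent_version=${version}"
    # scope and parent_id travel together: an organization ID switches the
    # deployment to organization-level monitoring.
    if [ -n "${org_id}" ]; then
        values="${values},scope=organizations,parent_id=${org_id}"
    else
        values="${values},scope=projects,parent_id=${project_id}"
    fi
    printf '%s\n' "${values}"
}
```

The output can be passed as `--input-values="$(build_input_values ...)"`. Optional variables such as `zone` are left out so that Terraform falls back to its own defaults.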
### Resources Created

- Compute instance (Ubuntu, n2-standard-4, 32GB disk)
- Service account with `cloudasset.viewer` and `browser` roles
- VPC network with auto-created subnets
- IAM bindings (project or organization level)

### Startup Validation

By default, Terraform waits for the startup script to complete and validates success:
- **Enabled**: Deployment fails if agent installation fails
- **Timeout**: 600 seconds (10 minutes) by default, configurable via `startup_timeout_seconds`
- **Requires**: `gcloud` CLI installed where Terraform runs

**Disable validation** (for testing or debugging):
```bash
# Via environment variable (for deploy.sh)
export STARTUP_VALIDATION_ENABLED=false
./deploy.sh

# Or pass to gcloud directly
gcloud infra-manager deployments apply ${DEPLOYMENT_NAME} \
  --location=${LOCATION} \
  --input-values="...,startup_validation_enabled=false"
```

**Guest Attributes Written**:

The startup script writes these attributes for monitoring:
- `elastic-agent/startup-status`: `"in-progress"`, `"success"`, or `"failed"`
- `elastic-agent/startup-error`: Error message (only when failed)
- `elastic-agent/startup-timestamp`: Completion timestamp (UTC)

Query manually:
```bash
gcloud compute instances get-guest-attributes ${INSTANCE_NAME} \
  --zone ${ZONE} \
  --query-path=elastic-agent/
```
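
With validation disabled, the same guest attribute can be polled by hand. The loop below is a minimal sketch and not the validation module's implementation: `fetch_status` and `wait_for_startup` are hypothetical names, `INSTANCE_NAME` and `ZONE` are assumed to be set, and `--format='value(value)'` is assumed to print only the attribute value.

```bash
#!/usr/bin/env bash
# Hypothetical status reader: prints the current startup-status value.
# Assumes INSTANCE_NAME and ZONE are exported by the caller.
fetch_status() {
    gcloud compute instances get-guest-attributes "${INSTANCE_NAME}" \
        --zone "${ZONE}" \
        --query-path=elastic-agent/startup-status \
        --format='value(value)' 2>/dev/null
}

# Poll until the status is "success" or "failed", or until the timeout expires.
# Arguments: timeout_seconds [poll_interval_seconds, default 10]
# Returns 0 on success, 1 on failure, 2 on timeout.
wait_for_startup() {
    local timeout="$1" interval="${2:-10}" elapsed=0 status
    while [ "${elapsed}" -lt "${timeout}" ]; do
        # A missing attribute (startup not begun yet) is treated as "keep waiting".
        status="$(fetch_status || true)"
        case "${status}" in
            success) return 0 ;;
            failed)  return 1 ;;
        esac
        sleep "${interval}"
        elapsed=$((elapsed + interval))
    done
    return 2
}
```

For example, `wait_for_startup 600` waits up to the default timeout. On a return code of 1, `elastic-agent/startup-error` holds the error message.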

### Management

**View deployment:**
```bash
gcloud infra-manager deployments describe ${DEPLOYMENT_NAME} --location=${LOCATION}
```

**Delete deployment:**
```bash
gcloud infra-manager deployments delete ${DEPLOYMENT_NAME} --location=${LOCATION}
```

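The management commands take `${LOCATION}`, which is a region rather than a zone. `deploy.sh` derives it from the zone by stripping the trailing `-<letter>` component. A minimal sketch of that derivation, with `zone_to_region` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the region (Infrastructure Manager location)
# from a zone by removing the final "-<single character>" component,
# mirroring the ${EFFECTIVE_ZONE%-?} expansion used in deploy.sh.
zone_to_region() {
    local zone="$1"
    printf '%s\n' "${zone%-?}"
}
```

Because `deploy.sh` appends a random suffix to the deployment name, the exact name may need to be looked up first, for instance with `gcloud infra-manager deployments list --location="$(zone_to_region "${ZONE}")"`.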
### Troubleshooting

**Check deployment status:**
```bash
# The instance name shares the random suffix that deploy.sh appends to the deployment name
# Format: elastic-agent-vm-<random-suffix>
# Example: elastic-agent-vm-0bc08b82

# Check startup script status via guest attributes
gcloud compute instances get-guest-attributes elastic-agent-vm-<suffix> \
  --zone ${ZONE} \
  --query-path=elastic-agent/startup-status

# Expected values:
# - "in-progress": Installation is running
# - "success": Installation completed successfully
# - "failed": Installation failed (check logs below)

# To find your instance name:
gcloud compute instances list --filter="name~^elastic-agent-vm-"
```

**Check agent logs (without SSH):**
```bash
# View serial console output (includes startup script execution)
gcloud compute instances get-serial-port-output ${INSTANCE_NAME} --zone ${ZONE}

# Filter for elastic-agent specific logs
gcloud compute instances get-serial-port-output ${INSTANCE_NAME} --zone ${ZONE} \
  | grep elastic-agent-setup
```

**Check agent logs (with SSH):**
```bash
gcloud compute ssh ${INSTANCE_NAME} --zone ${ZONE}
# Then, on the VM:
sudo journalctl -u google-startup-scripts.service
```

**Common Issues:**

1. **404 error downloading agent**: Check that `ELASTIC_ARTIFACT_SERVER` and `STACK_VERSION` are correct
2. **Guest attributes show "failed"**: Check the serial console logs for error details
3. **Guest attributes not available**: Guest attributes are enabled by default but are only populated during startup; if none are returned, the startup script may not have begun yet, so query again shortly
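
For the 404 case, the artifact URL can be probed before deploying. This is a sketch under a stated assumption: the file name pattern `elastic-agent-<version>-linux-x86_64.tar.gz` is not given in this README and may differ on a custom artifact server. `artifact_url` and `check_artifact` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compose the download URL for a given server and version.
# ASSUMPTION: the file name pattern below is not stated in this README;
# adjust it if your artifact server uses a different layout.
artifact_url() {
    local server="${1%/}" version="$2"   # drop one trailing slash, as deploy.sh does
    printf '%s/elastic-agent-%s-linux-x86_64.tar.gz\n' "${server}" "${version}"
}

# Send a HEAD request; with -f, curl exits non-zero on HTTP errors such as 404.
check_artifact() {
    curl -fsSI "$(artifact_url "$1" "$2")" >/dev/null
}
```

A call such as `check_artifact "${ELASTIC_ARTIFACT_SERVER}" "${STACK_VERSION}" || echo "artifact not found"` reports a wrong server or version without waiting for a VM to boot.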

**Console:** [Infrastructure Manager Deployments](https://console.cloud.google.com/infra-manager/deployments)
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
#!/bin/bash
set -e

# Configure GCP project
PROJECT_ID=$(gcloud config get-value core/project)
SERVICE_ACCOUNT="infra-manager-deployer"

# Ensure prerequisites are configured
"$(dirname "$0")/setup.sh" "${PROJECT_ID}" "${SERVICE_ACCOUNT}"

# Required environment variables (no defaults - must be provided)
# FLEET_URL, ENROLLMENT_TOKEN, STACK_VERSION

# Optional environment variables (defaults are in variables.tf)
# ORG_ID - Set for org-level monitoring
# ZONE - GCP zone (default: us-central1-a)
# DEPLOYMENT_NAME - Deployment name prefix (default: elastic-agent-deployment)
# ELASTIC_ARTIFACT_SERVER - Artifact server URL

# Generate unique suffix for resource names (8 hex characters)
RESOURCE_SUFFIX=$(openssl rand -hex 4)

# Set deployment name with suffix
DEPLOYMENT_NAME="${DEPLOYMENT_NAME:-elastic-agent-deployment}-${RESOURCE_SUFFIX}"

# Determine zone for location extraction
# We need the zone to derive the region (location) - use default if not set
EFFECTIVE_ZONE="${ZONE:-us-central1-a}"
LOCATION="${EFFECTIVE_ZONE%-?}" # Extract region from zone

# Build input values - only include values that are set
# Defaults are defined in variables.tf (single source of truth)
INPUT_VALUES="project_id=${PROJECT_ID}"
INPUT_VALUES="${INPUT_VALUES},resource_suffix=${RESOURCE_SUFFIX}"

# Required values
INPUT_VALUES="${INPUT_VALUES},fleet_url=${FLEET_URL}"
INPUT_VALUES="${INPUT_VALUES},enrollment_token=${ENROLLMENT_TOKEN}"
INPUT_VALUES="${INPUT_VALUES},elastic_agent_version=${STACK_VERSION}"

# Optional values - only add if explicitly set (let TF use its defaults otherwise)
if [ -n "${ZONE}" ]; then
    INPUT_VALUES="${INPUT_VALUES},zone=${ZONE}"
fi

if [ -n "${ELASTIC_ARTIFACT_SERVER}" ]; then
    # Remove trailing slash if present
    ELASTIC_ARTIFACT_SERVER="${ELASTIC_ARTIFACT_SERVER%/}"
    INPUT_VALUES="${INPUT_VALUES},elastic_artifact_server=${ELASTIC_ARTIFACT_SERVER}"
fi

# Set scope and parent_id based on ORG_ID
if [ -n "${ORG_ID}" ]; then
    INPUT_VALUES="${INPUT_VALUES},scope=organizations"
    INPUT_VALUES="${INPUT_VALUES},parent_id=${ORG_ID}"
else
    INPUT_VALUES="${INPUT_VALUES},scope=projects"
    INPUT_VALUES="${INPUT_VALUES},parent_id=${PROJECT_ID}"
fi

# Deploy from local source (repo already cloned by Cloud Shell)
echo "Starting deployment ${DEPLOYMENT_NAME}..."
# With `set -e`, a failing apply would terminate the script before the
# diagnostics below could run, so capture the exit code explicitly.
EXIT_CODE=0
gcloud infra-manager deployments apply "${DEPLOYMENT_NAME}" \
    --location="${LOCATION}" \
    --service-account="projects/${PROJECT_ID}/serviceAccounts/${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" \
    --local-source="." \
    --input-values="${INPUT_VALUES}" || EXIT_CODE=$?

if [ "${EXIT_CODE}" -ne 0 ]; then
    echo ""
    echo "Deployment failed with exit code $EXIT_CODE"
    echo ""
    echo "Common failure reasons:"
    echo "  - Wrong artifacts server for pre-release artifact (check ELASTIC_ARTIFACT_SERVER for snapshots/pre-releases)"
    echo "  - Service account permissions missing for ${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com"
    echo "  - Invalid input values (fleet_url, enrollment_token, etc.)"
    echo ""
    echo "Useful debugging commands:"
    echo "  # View deployment status"
    echo "  gcloud infra-manager deployments describe ${DEPLOYMENT_NAME} --location=${LOCATION}"
    echo ""
    echo "  # Verify service account permissions"
    echo "  gcloud projects get-iam-policy ${PROJECT_ID} --flatten='bindings[].members' --filter='bindings.members:serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com' --format='table(bindings.role)'"
    echo ""
    echo "  # View Cloud Build logs"
    echo "  gsutil cat \$(gcloud infra-manager revisions describe \$(gcloud infra-manager deployments describe ${DEPLOYMENT_NAME} --location=${LOCATION} --format='value(latestRevision)') --location=${LOCATION} --format='value(logs)')/*.txt"
    echo ""
    echo "  # View VM startup script logs"
    echo "  gcloud compute instances get-serial-port-output elastic-agent-vm-${RESOURCE_SUFFIX} --zone=${EFFECTIVE_ZONE} --project=${PROJECT_ID}"
    echo ""
    exit $EXIT_CODE
fi

echo ""
echo "Deployment successful!"
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
terraform {
  required_version = ">= 1.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    null = {
      source  = "hashicorp/null"
      version = "~> 3.0"
    }
  }
}

provider "google" {
  project = var.project_id
}

locals {
  # Use suffix from deploy.sh to ensure all resource names stay within GCP limits and allow multiple deployments
  resource_suffix = var.resource_suffix
  sa_name         = "elastic-agent-sa-${local.resource_suffix}"
  sa_email        = module.service_account.email
  network_name    = "elastic-agent-net-${local.resource_suffix}"
  instance_name   = "elastic-agent-vm-${local.resource_suffix}"
}

# Resource suffix for all resource names
variable "resource_suffix" {
  description = "Unique suffix for resource names (8 hex characters)"
  type        = string
}

module "service_account" {
  source = "./modules/service_account"

  project_id           = var.project_id
  service_account_name = local.sa_name
  scope                = var.scope
  parent_id            = var.parent_id
}

module "compute_instance" {
  source = "./modules/compute_instance"

  instance_name           = local.instance_name
  network_name            = local.network_name
  machine_type            = var.machine_type
  zone                    = var.zone
  sa_email                = local.sa_email
  elastic_agent_version   = var.elastic_agent_version
  elastic_artifact_server = var.elastic_artifact_server
  fleet_url               = var.fleet_url
  enrollment_token        = var.enrollment_token

  depends_on = [
    module.service_account
  ]
}

module "startup_validation" {
  source = "./modules/startup_validation"

  enabled       = var.startup_validation_enabled
  project_id    = var.project_id
  instance_name = local.instance_name
  instance_id   = module.compute_instance.id
  zone          = var.zone
  timeout       = var.startup_timeout_seconds

  depends_on = [
    module.compute_instance
  ]
}
