
Commit 096ef69

Added support for databricks_mws_workspaces on GCP (Public Preview) (#1879)
1 parent 5b12bdd commit 096ef69

15 files changed: +517 -47 lines changed

docs/data-sources/aws_assume_role_policy.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 ---
-subcategory: "AWS"
+subcategory: "Deployment"
 ---
 
 # databricks_aws_assume_role_policy Data Source
```

docs/data-sources/aws_bucket_policy.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 ---
-subcategory: "AWS"
+subcategory: "Deployment"
 ---
 
 # databricks_aws_bucket_policy Data Source
```

docs/data-sources/aws_crossaccount_policy.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 ---
-subcategory: "AWS"
+subcategory: "Deployment"
 ---
 
 # databricks_aws_crossaccount_policy Data Source
```

docs/data-sources/mws_workspaces.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 ---
-subcategory: "AWS"
+subcategory: "Deployment"
 ---
 
 # databricks_mws_workspaces Data Source
```

docs/data-sources/zones.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 ---
-subcategory: "AWS"
+subcategory: "Deployment"
 ---
 
 # databricks_zones Data Source
```

docs/guides/gcp-workspace.md

Lines changed: 266 additions & 0 deletions

---
page_title: "Provisioning Databricks workspaces on GCP"
---

# Provisioning Databricks workspaces on GCP

You can provision multiple Databricks workspaces with Terraform.

## Creating a GCP service account for Databricks Provisioning

This guide assumes that you are already familiar with HashiCorp Terraform and have provisioned some of your Google Cloud infrastructure with it. To work with Databricks on GCP in an automated way, please create a service account and manually add it in the [Accounts Console](https://accounts.gcp.databricks.com/users) as an account admin. You can use the following Terraform configuration to create a service account for Databricks provisioning, which can be impersonated by the list of principals defined in the `delegate_from` variable. The service account is automatically assigned the newly created Databricks Workspace Creator custom role.

```hcl
variable "prefix" {}

variable "project" {
  type    = string
  default = "<my-project-id>"
}

provider "google" {
  project = var.project
}

variable "delegate_from" {
  description = "Allow either user:<user-email>, group:<group-email> or serviceAccount:<service-account-email> to impersonate created service account"
  type        = list(string)
}

resource "google_service_account" "sa2" {
  account_id   = "${var.prefix}-sa2"
  display_name = "Service Account for Databricks Provisioning"
}

output "service_account" {
  value       = google_service_account.sa2.email
  description = "Add this email as a user in the Databricks account console"
}

data "google_iam_policy" "this" {
  binding {
    role    = "roles/iam.serviceAccountTokenCreator"
    members = var.delegate_from
  }
}

resource "google_service_account_iam_policy" "impersonatable" {
  service_account_id = google_service_account.sa2.name
  policy_data        = data.google_iam_policy.this.policy_data
}

resource "google_project_iam_custom_role" "workspace_creator" {
  role_id = "${var.prefix}_workspace_creator"
  title   = "Databricks Workspace Creator"
  permissions = [
    "iam.serviceAccounts.getIamPolicy",
    "iam.serviceAccounts.setIamPolicy",
    "iam.roles.create",
    "iam.roles.delete",
    "iam.roles.get",
    "iam.roles.update",
    "resourcemanager.projects.get",
    "resourcemanager.projects.getIamPolicy",
    "resourcemanager.projects.setIamPolicy",
    "serviceusage.services.get",
    "serviceusage.services.list",
    "serviceusage.services.enable"
  ]
}

data "google_client_config" "current" {}

output "custom_role_url" {
  value = "https://console.cloud.google.com/iam-admin/roles/details/projects%3C${data.google_client_config.current.project}%3Croles%3C${google_project_iam_custom_role.workspace_creator.role_id}"
}

resource "google_project_iam_member" "sa2_can_create_workspaces" {
  # project is required by google_project_iam_member since v4 of the google provider
  project = var.project
  role    = google_project_iam_custom_role.workspace_creator.id
  member  = "serviceAccount:${google_service_account.sa2.email}"
}
```

After you've added the service account to the Databricks Accounts Console, please copy its name into the `databricks_google_service_account` variable. If you prefer environment variables, `DATABRICKS_GOOGLE_SERVICE_ACCOUNT` is the one you'll use instead. Please also copy the account ID into the `databricks_account_id` variable.
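
For example, you could supply both values through a hypothetical `terraform.tfvars` file (the file and the placeholder values below are illustrative, not part of this commit):

```hcl
# terraform.tfvars -- replace the illustrative placeholders with your own values
databricks_account_id             = "00000000-0000-0000-0000-000000000000"
databricks_google_service_account = "my-prefix-sa2@my-project-id.iam.gserviceaccount.com"
```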

## Authenticate with Databricks account API

Databricks account-level APIs can only be called by account owners and account admins, and can only be authenticated using Google-issued OIDC tokens. The simplest way to do this is via the [Google Cloud CLI](https://cloud.google.com/sdk/gcloud). The `gcloud` command is available after installing the SDK. Then run the following commands:

* `gcloud auth application-default login` to authorise your user with Google Cloud Platform.
* `terraform init` to load the Google and Databricks Terraform providers.
* `terraform apply` to apply the configuration changes. Terraform will use your credentials to impersonate the service account specified in `databricks_google_service_account` to call the Databricks account-level API.

Alternatively, if you cannot use impersonation and [Application Default Credentials](https://cloud.google.com/docs/authentication/production) as configured by `gcloud`, consider using the service account key directly by passing it to the `google_credentials` parameter (or the `GOOGLE_CREDENTIALS` environment variable) to avoid using `gcloud`, impersonation, and ADC altogether. The content of this parameter must be either the path to a `.json` file or the full JSON content of the Google service account key.
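
As a minimal sketch of this alternative, the account-level provider block could reference the key directly (the `/path/to/key.json` path is an illustrative placeholder, not part of this guide's configuration):

```hcl
provider "databricks" {
  alias      = "accounts"
  host       = "https://accounts.gcp.databricks.com"
  account_id = var.databricks_account_id

  # Either the path to the key file or its full JSON content;
  # the path below is an illustrative placeholder.
  google_credentials = "/path/to/key.json"
}
```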

## Provider initialization

```hcl
variable "databricks_account_id" {}
variable "databricks_google_service_account" {}
variable "google_project" {}
variable "google_region" {}
variable "google_zone" {}

terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
    google = {
      source  = "hashicorp/google"
      version = "4.47.0"
    }
  }
}

provider "google" {
  project = var.google_project
  region  = var.google_region
  zone    = var.google_zone
}

// initialize provider in "accounts" mode to provision new workspace
provider "databricks" {
  alias                  = "accounts"
  host                   = "https://accounts.gcp.databricks.com"
  google_service_account = var.databricks_google_service_account
  account_id             = var.databricks_account_id
}

data "google_client_openid_userinfo" "me" {
}

data "google_client_config" "current" {
}

resource "random_string" "suffix" {
  special = false
  upper   = false
  length  = 6
}
```
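
Note that `random_string` comes from the `hashicorp/random` provider. Terraform resolves it implicitly, but you may prefer to pin it alongside the other providers; a sketch with an assumed version constraint (the constraint itself is illustrative):

```hcl
terraform {
  required_providers {
    # pinning the implicitly resolved random provider;
    # any recent 3.x release behaves the same here
    random = {
      source  = "hashicorp/random"
      version = ">= 3.4"
    }
  }
}
```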

## Creating a VPC

The very first step is VPC creation with the necessary resources. Please consult the [main documentation page](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. A GCP VPC is registered as a [databricks_mws_networks](../resources/mws_networks.md) resource.

```hcl
resource "google_compute_network" "dbx_private_vpc" {
  project                 = var.google_project
  name                    = "tf-network-${random_string.suffix.result}"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "network-with-private-secondary-ip-ranges" {
  name          = "test-dbx-${random_string.suffix.result}"
  ip_cidr_range = "10.0.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.dbx_private_vpc.id
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.1.0.0/16"
  }
  secondary_ip_range {
    range_name    = "svc"
    ip_cidr_range = "10.2.0.0/20"
  }
  private_ip_google_access = true
}

resource "google_compute_router" "router" {
  name    = "my-router-${random_string.suffix.result}"
  region  = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  network = google_compute_network.dbx_private_vpc.id
}

resource "google_compute_router_nat" "nat" {
  name                               = "my-router-nat-${random_string.suffix.result}"
  router                             = google_compute_router.router.name
  region                             = google_compute_router.router.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

resource "databricks_mws_networks" "this" {
  provider     = databricks.accounts
  account_id   = var.databricks_account_id
  network_name = "test-demo-${random_string.suffix.result}"
  gcp_network_info {
    network_project_id    = var.google_project
    vpc_id                = google_compute_network.dbx_private_vpc.name
    subnet_id             = google_compute_subnetwork.network-with-private-secondary-ip-ranges.name
    subnet_region         = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
    pod_ip_range_name     = "pods"
    service_ip_range_name = "svc"
  }
}
```

## Creating a Databricks Workspace

Once [the VPC](#creating-a-vpc) is set up, you can create a Databricks workspace through the [databricks_mws_workspaces](../resources/mws_workspaces.md) resource.

Code that creates workspaces and code that [manages workspaces](workspace-management.md) must be in separate Terraform modules to avoid common confusion between `provider = databricks.accounts` and `provider = databricks.created_workspace`. This is why we specify the `databricks_host` and `databricks_token` outputs, which have to be used in the latter modules.

-> **Note** If you experience technical difficulties with rolling out resources in this example, please make sure that [environment variables](../index.md#environment-variables) don't [conflict with other](../index.md#empty-provider-block) provider block attributes. When in doubt, please run `TF_LOG=DEBUG terraform apply` to enable [debug mode](https://www.terraform.io/docs/internals/debugging.html) through the [`TF_LOG`](https://www.terraform.io/docs/cli/config/environment-variables.html#tf_log) environment variable. Look specifically for `Explicit and implicit attributes` lines, which should indicate the authentication attributes used. The other common cause of technical difficulties is a missing `alias` attribute in `provider "databricks" {}` blocks or a missing `provider` attribute in `resource "databricks_..." {}` blocks. Please make sure to read the [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article.

```hcl
resource "databricks_mws_workspaces" "this" {
  provider       = databricks.accounts
  account_id     = var.databricks_account_id
  workspace_name = "tf-demo-test-${random_string.suffix.result}"
  location       = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  cloud_resource_container {
    gcp {
      project_id = var.google_project
    }
  }

  network_id = databricks_mws_networks.this.network_id
  gke_config {
    connectivity_type = "PRIVATE_NODE_PUBLIC_MASTER"
    master_ip_range   = "10.3.0.0/28"
  }

  token {
    comment = "Terraform"
  }

  # this makes sure that the NAT is created for outbound traffic before creating the workspace
  depends_on = [google_compute_router_nat.nat]
}

output "databricks_host" {
  value = databricks_mws_workspaces.this.workspace_url
}

output "databricks_token" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
```

### Data resources and Authentication is not configured errors

*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, an `authentication is not configured for provider` error is raised. To work around this issue and guarantee proper lazy authentication with data resources, you should add `depends_on = [databricks_mws_workspaces.this]` to the body. This issue doesn't occur if the workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources.

```hcl
data "databricks_current_user" "me" {
  depends_on = [databricks_mws_workspaces.this]
}
```

## Provider configuration

In [the next step](workspace-management.md), please use the following configuration for the provider:

```hcl
provider "databricks" {
  host  = module.dbx_gcp.workspace_url
  token = module.dbx_gcp.token_value
}
```

We assume that you have a Terraform module in your project that creates a workspace (using the [Creating a Databricks Workspace](#creating-a-databricks-workspace) section above) and that you named it `dbx_gcp` when calling it in the **main.tf** file of your Terraform project, with `workspace_url` and `token_value` as the output attributes of that module. This provider configuration will allow you to use the token generated during workspace creation to authenticate to the created workspace.
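
A minimal sketch of that wiring, assuming the workspace-creation code lives in a hypothetical `./modules/dbx_gcp` directory (the path and the set of passed variables are illustrative):

```hcl
# main.tf of the root project -- call the workspace-creation module
module "dbx_gcp" {
  source                            = "./modules/dbx_gcp"
  databricks_account_id             = var.databricks_account_id
  databricks_google_service_account = var.databricks_google_service_account
  google_project                    = var.google_project
}

# outputs.tf inside the module -- re-export the values produced by
# the databricks_host/databricks_token outputs shown above
output "workspace_url" {
  value = databricks_mws_workspaces.this.workspace_url
}

output "token_value" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
```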
