---
page_title: "Provisioning Databricks on Google Cloud with Private Service Connect"
---

# Provisioning Databricks workspaces on GCP with Private Service Connect

Secure a workspace with private connectivity and mitigate data exfiltration risks by [enabling Google Private Service Connect (PSC) on the workspace](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html). This guide assumes that you are already familiar with HashiCorp Terraform and have provisioned some of your Google Cloud infrastructure with it.

## Creating a GCP service account for Databricks provisioning and authenticating with the Databricks account API

To work with Databricks in GCP in an automated way, please create a service account and manually add it as an account admin in the [Accounts Console](https://accounts.gcp.databricks.com/users). Databricks account-level APIs can only be called by account owners and account admins, and can only be authenticated using Google-issued OIDC tokens. The simplest way to obtain such tokens is via the [Google Cloud CLI](https://cloud.google.com/sdk/gcloud). Please refer to [Provisioning Databricks workspaces on GCP](gcp_workspace.md) for details.
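
A minimal provider configuration for the rest of this guide could look like the following sketch, which assumes that the provisioning service account's email is passed in as a `databricks_google_service_account` variable and that the identity running Terraform is allowed to impersonate it:

```hcl
variable "databricks_account_id" {}
variable "google_project" {}

variable "databricks_google_service_account" {
  description = "Email of the provisioning service account added as a Databricks account admin"
}

# Account-level provider, authenticated with Google-issued OIDC tokens
# on behalf of the provisioning service account
provider "databricks" {
  alias                  = "accounts"
  host                   = "https://accounts.gcp.databricks.com"
  google_service_account = var.databricks_google_service_account
  account_id             = var.databricks_account_id
}

provider "google" {
  project = var.google_project
  region  = "us-central1"
}
```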

## Creating a VPC network

The very first step is creating a VPC with the necessary resources. Please consult the [main documentation page](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. A GCP VPC is registered as a [databricks_mws_networks](../resources/mws_networks.md) resource.

To enable [back-end Private Service Connect (data plane to control plane)](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html#two-private-service-connect-options), configure the network with the two back-end VPC endpoints:

- Back-end VPC endpoint for the [Secure cluster connectivity](https://docs.gcp.databricks.com/security/secure-cluster-connectivity.html) relay
- Back-end VPC endpoint for REST APIs

-> **Note** If you also want to implement the front-end VPC endpoint, for connections from users to the Databricks web application, REST API, and Databricks Connect API over a Virtual Private Cloud (VPC) endpoint, use the transit (bastion) VPC. Once the front-end endpoint is created, use the [databricks_mws_private_access_settings](../resources/mws_private_access_settings.md) resource to control which VPC endpoints can connect to the UI or API of any workspace that attaches this private access settings object.
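
The registration below assumes that the two back-end PSC endpoints already exist in your Google Cloud project and that their names are supplied via the `backend_rest_psce` and `relay_psce` variables. If you manage them with Terraform as well, each endpoint is a forwarding rule pointing at a region-specific Databricks service attachment URI published in the Databricks documentation. A minimal sketch for the REST API endpoint (the relay endpoint is analogous), where `var.workspace_service_attachment` is a hypothetical variable holding that URI and the network resources are the ones defined below:

```hcl
resource "google_compute_address" "backend_rest_psc_ip" {
  name         = "backend-rest-psc-ip"
  project      = var.google_project
  region       = "us-central1"
  subnetwork   = google_compute_subnetwork.network-with-private-secondary-ip-ranges.id
  address_type = "INTERNAL"
}

resource "google_compute_forwarding_rule" "backend_rest" {
  name       = var.backend_rest_psce
  project    = var.google_project
  region     = "us-central1"
  network    = google_compute_network.dbx_private_vpc.id
  ip_address = google_compute_address.backend_rest_psc_ip.id
  # Region-specific service attachment URI from the Databricks PSC documentation
  target = var.workspace_service_attachment
  # Must be an empty string for Private Service Connect consumer endpoints
  load_balancing_scheme = ""
}
```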

```hcl
# Random suffix to keep resource names unique across deployments
resource "random_string" "suffix" {
  special = false
  upper   = false
  length  = 6
}

resource "google_compute_network" "dbx_private_vpc" {
  project                 = var.google_project
  name                    = "tf-network-${random_string.suffix.result}"
  auto_create_subnetworks = false
}

# Workspace subnet with secondary ranges for GKE pods and services
resource "google_compute_subnetwork" "network-with-private-secondary-ip-ranges" {
  name          = "test-dbx-${random_string.suffix.result}"
  ip_cidr_range = "10.0.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.dbx_private_vpc.id
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.1.0.0/16"
  }
  secondary_ip_range {
    range_name    = "svc"
    ip_cidr_range = "10.2.0.0/20"
  }
  private_ip_google_access = true
}

# Cloud Router and NAT give the private nodes outbound internet access
resource "google_compute_router" "router" {
  name    = "my-router-${random_string.suffix.result}"
  region  = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  network = google_compute_network.dbx_private_vpc.id
}

resource "google_compute_router_nat" "nat" {
  name                               = "my-router-nat-${random_string.suffix.result}"
  router                             = google_compute_router.router.name
  region                             = google_compute_router.router.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

# Register the back-end REST API endpoint with the Databricks account
resource "databricks_mws_vpc_endpoint" "backend_rest_vpce" {
  provider          = databricks.accounts
  account_id        = var.databricks_account_id
  vpc_endpoint_name = "vpce-backend-rest-${random_string.suffix.result}"
  gcp_vpc_endpoint_info {
    project_id        = var.google_project
    psc_endpoint_name = var.backend_rest_psce
    endpoint_region   = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  }
}

# Register the back-end secure cluster connectivity relay endpoint
resource "databricks_mws_vpc_endpoint" "relay_vpce" {
  provider          = databricks.accounts
  account_id        = var.databricks_account_id
  vpc_endpoint_name = "vpce-relay-${random_string.suffix.result}"
  gcp_vpc_endpoint_info {
    project_id        = var.google_project
    psc_endpoint_name = var.relay_psce
    endpoint_region   = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  }
}

# Register the VPC together with both back-end endpoints as a network configuration
resource "databricks_mws_networks" "this" {
  provider     = databricks.accounts
  account_id   = var.databricks_account_id
  network_name = "test-demo-${random_string.suffix.result}"
  gcp_network_info {
    network_project_id    = var.google_project
    vpc_id                = google_compute_network.dbx_private_vpc.name
    subnet_id             = google_compute_subnetwork.network-with-private-secondary-ip-ranges.name
    subnet_region         = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
    pod_ip_range_name     = "pods"
    service_ip_range_name = "svc"
  }
  vpc_endpoints {
    dataplane_relay = [databricks_mws_vpc_endpoint.relay_vpce.vpc_endpoint_id]
    rest_api        = [databricks_mws_vpc_endpoint.backend_rest_vpce.vpc_endpoint_id]
  }
}
```
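
The snippets above assume that the names of the pre-created PSC endpoints are supplied as input variables, for example:

```hcl
variable "backend_rest_psce" {
  description = "Name of the PSC endpoint (forwarding rule) for the REST APIs"
}

variable "relay_psce" {
  description = "Name of the PSC endpoint (forwarding rule) for the secure cluster connectivity relay"
}
```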

## Creating a Databricks Workspace

Once [the VPC](#creating-a-vpc-network) is set up, you can create a Databricks workspace through the [databricks_mws_workspaces](../resources/mws_workspaces.md) resource.

For a workspace to support any of the Private Service Connect connectivity scenarios, the workspace must be created with an attached [databricks_mws_private_access_settings](../resources/mws_private_access_settings.md) resource.

Code that creates workspaces and code that [manages workspaces](workspace-management.md) must be in separate Terraform modules to avoid the common confusion between `provider = databricks.accounts` and `provider = databricks.created_workspace`. This is why we specify the `databricks_host` and `databricks_token` outputs, which have to be used in the latter modules.

-> **Note** If you experience technical difficulties with rolling out resources in this example, please make sure that [environment variables](../index.md#environment-variables) don't [conflict with other](../index.md#empty-provider-block) provider block attributes. When in doubt, please run `TF_LOG=DEBUG terraform apply` to enable [debug mode](https://www.terraform.io/docs/internals/debugging.html) through the [`TF_LOG`](https://www.terraform.io/docs/cli/config/environment-variables.html#tf_log) environment variable. Look specifically for `Explicit and implicit attributes` lines, which should indicate the authentication attributes used. Another common cause of technical difficulties is a missing `alias` attribute in `provider "databricks" {}` blocks or a missing `provider` attribute in `resource "databricks_..." {}` blocks. Please make sure to read the [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article.

```hcl
resource "databricks_mws_private_access_settings" "pas" {
  provider                     = databricks.accounts
  account_id                   = var.databricks_account_id
  private_access_settings_name = "pas-${random_string.suffix.result}"
  region                       = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  public_access_enabled        = true
  private_access_level         = "ACCOUNT"
}

resource "databricks_mws_workspaces" "this" {
  provider       = databricks.accounts
  account_id     = var.databricks_account_id
  workspace_name = "tf-demo-test-${random_string.suffix.result}"
  location       = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  cloud_resource_container {
    gcp {
      project_id = var.google_project
    }
  }

  # Attach the private access settings and the network with the back-end PSC endpoints
  private_service_connect_id = databricks_mws_private_access_settings.pas.private_access_settings_id
  network_id                 = databricks_mws_networks.this.network_id
  gke_config {
    connectivity_type = "PRIVATE_NODE_PUBLIC_MASTER"
    master_ip_range   = "10.3.0.0/28"
  }

  token {
    comment = "Terraform"
  }

  # This makes sure that the NAT is created for outbound traffic before creating the workspace
  depends_on = [google_compute_router_nat.nat]
}

output "databricks_host" {
  value = databricks_mws_workspaces.this.workspace_url
}

output "databricks_token" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
```
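
In the module that manages resources inside the workspace, these outputs can then be consumed, for example, through remote state. A minimal sketch, assuming the module above stores its state in a (hypothetical) GCS bucket named `tf-state`:

```hcl
data "terraform_remote_state" "ws" {
  backend = "gcs"
  config = {
    bucket = "tf-state" # hypothetical state bucket
    prefix = "databricks-workspace"
  }
}

# Workspace-level provider configured from the outputs of the provisioning module
provider "databricks" {
  host  = data.terraform_remote_state.ws.outputs.databricks_host
  token = data.terraform_remote_state.ws.outputs.databricks_token
}
```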

### Data resources and Authentication is not configured errors

*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, a `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee proper lazy authentication with data resources, you should add `depends_on = [databricks_mws_workspaces.this]` to the body. This issue doesn't occur if the workspace is created *in one module* and the resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources.

```hcl
data "databricks_current_user" "me" {
  depends_on = [databricks_mws_workspaces.this]
}
```
