---
page_title: "Provisioning Databricks on Google Cloud with Private Service Connect"
---

# Provisioning Databricks workspaces on GCP with Private Service Connect

Secure a workspace with private connectivity and mitigate data exfiltration risks by [enabling Google Private Service Connect (PSC) on the workspace](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html). This guide assumes that you are already familiar with HashiCorp Terraform and have provisioned some of your Google Cloud infrastructure with it.

## Creating a GCP service account for Databricks provisioning and authenticating with the Databricks account API

To work with Databricks in GCP in an automated way, create a service account and manually add it in the [Accounts Console](https://accounts.gcp.databricks.com/users) as an account admin. Databricks account-level APIs can only be called by account owners and account admins, and can only be authenticated using Google-issued OIDC tokens. The simplest way to obtain such tokens is via the [Google Cloud CLI](https://cloud.google.com/sdk/gcloud). Please refer to [Provisioning Databricks workspaces on GCP](gcp_workspace.md) for details.
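
The snippet below is a minimal sketch of how the account-level provider could be configured once the service account exists. The `databricks-provisioning` account ID is an assumption chosen for illustration, and the service account's email still has to be added manually as an account admin in the Accounts Console.

```hcl
variable "databricks_account_id" {
  description = "Databricks account ID, as shown in the Accounts Console"
}

variable "google_project" {}

provider "google" {
  project = var.google_project
  region  = "us-central1"
}

# Hypothetical service account used purely for provisioning Databricks resources.
resource "google_service_account" "databricks_provisioning" {
  account_id   = "databricks-provisioning"
  display_name = "Service account for Databricks provisioning"
}

# Account-level provider authenticated with Google-issued OIDC tokens by
# impersonating the service account above. The identity running Terraform
# needs permission to impersonate it (e.g. roles/iam.serviceAccountTokenCreator),
# and its email must be added as an account admin in the Accounts Console.
provider "databricks" {
  alias                  = "accounts"
  host                   = "https://accounts.gcp.databricks.com"
  account_id             = var.databricks_account_id
  google_service_account = google_service_account.databricks_provisioning.email
}
```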

## Creating a VPC network

The very first step is creating a VPC with the necessary resources. Please consult the [main documentation page](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. A GCP VPC is registered as a [databricks_mws_networks](../resources/mws_networks.md) resource.

To enable [back-end Private Service Connect (data plane to control plane)](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html#two-private-service-connect-options), configure the network with two back-end VPC endpoints:

- Back-end VPC endpoint for the [Secure cluster connectivity](https://docs.gcp.databricks.com/security/secure-cluster-connectivity.html) relay
- Back-end VPC endpoint for REST APIs

-> **Note** If you also want to implement a front-end VPC endpoint for the connections from users to the Databricks web application, REST API, and Databricks Connect API over a Virtual Private Cloud (VPC) endpoint, use a transit (bastion) VPC. Once the front-end endpoint is created, use the [databricks_mws_private_access_settings](../resources/mws_private_access_settings.md) resource to control which VPC endpoints can connect to the UI or API of any workspace that attaches this private access settings object.
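
For illustration only, a front-end endpoint created in the transit VPC could be registered with the Databricks account roughly as follows; `var.frontend_psce` (the name of the PSC endpoint in the transit VPC) and `var.transit_vpc_region` are assumed variables, not part of the back-end example that follows.

```hcl
# Hypothetical registration of a front-end PSC endpoint living in the
# transit (bastion) VPC; var.frontend_psce and var.transit_vpc_region are
# assumptions and must match the endpoint you created there.
resource "databricks_mws_vpc_endpoint" "frontend_vpce" {
  provider          = databricks.accounts
  account_id        = var.databricks_account_id
  vpc_endpoint_name = "vpce-frontend-${random_string.suffix.result}"
  gcp_vpc_endpoint_info {
    project_id        = var.google_project
    psc_endpoint_name = var.frontend_psce
    endpoint_region   = var.transit_vpc_region
  }
}
```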

```hcl
resource "google_compute_network" "dbx_private_vpc" {
  project                 = var.google_project
  name                    = "tf-network-${random_string.suffix.result}"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "network-with-private-secondary-ip-ranges" {
  name          = "test-dbx-${random_string.suffix.result}"
  ip_cidr_range = "10.0.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.dbx_private_vpc.id
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.1.0.0/16"
  }
  secondary_ip_range {
    range_name    = "svc"
    ip_cidr_range = "10.2.0.0/20"
  }
  private_ip_google_access = true
}

resource "google_compute_router" "router" {
  name    = "my-router-${random_string.suffix.result}"
  region  = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  network = google_compute_network.dbx_private_vpc.id
}

resource "google_compute_router_nat" "nat" {
  name                               = "my-router-nat-${random_string.suffix.result}"
  router                             = google_compute_router.router.name
  region                             = google_compute_router.router.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

# Register the back-end PSC endpoints with the Databricks account.
resource "databricks_mws_vpc_endpoint" "backend_rest_vpce" {
  provider          = databricks.accounts
  account_id        = var.databricks_account_id
  vpc_endpoint_name = "vpce-backend-rest-${random_string.suffix.result}"
  gcp_vpc_endpoint_info {
    project_id        = var.google_project
    psc_endpoint_name = var.backend_rest_psce
    endpoint_region   = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  }
}

resource "databricks_mws_vpc_endpoint" "relay_vpce" {
  provider          = databricks.accounts
  account_id        = var.databricks_account_id
  vpc_endpoint_name = "vpce-relay-${random_string.suffix.result}"
  gcp_vpc_endpoint_info {
    project_id        = var.google_project
    psc_endpoint_name = var.relay_psce
    endpoint_region   = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  }
}

resource "databricks_mws_networks" "this" {
  provider     = databricks.accounts
  account_id   = var.databricks_account_id
  network_name = "test-demo-${random_string.suffix.result}"
  gcp_network_info {
    network_project_id    = var.google_project
    vpc_id                = google_compute_network.dbx_private_vpc.name
    subnet_id             = google_compute_subnetwork.network-with-private-secondary-ip-ranges.name
    subnet_region         = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
    pod_ip_range_name     = "pods"
    service_ip_range_name = "svc"
  }
  vpc_endpoints {
    dataplane_relay = [databricks_mws_vpc_endpoint.relay_vpce.vpc_endpoint_id]
    rest_api        = [databricks_mws_vpc_endpoint.backend_rest_vpce.vpc_endpoint_id]
  }
}
```

## Creating a Databricks Workspace

Once [the VPC](#creating-a-vpc-network) is set up, you can create the Databricks workspace through the [databricks_mws_workspaces](../resources/mws_workspaces.md) resource.

For a workspace to support any of the Private Service Connect connectivity scenarios, the workspace must be created with an attached [databricks_mws_private_access_settings](../resources/mws_private_access_settings.md) resource.

Code that creates workspaces and code that [manages workspaces](workspace-management.md) must be in separate Terraform modules to avoid common confusion between `provider = databricks.accounts` and `provider = databricks.created_workspace`. This is why we specify the `databricks_host` and `databricks_token` outputs, which have to be used in the latter modules; a minimal sketch of consuming them is shown after the example below.

-> **Note** If you experience technical difficulties with rolling out resources in this example, please make sure that [environment variables](../index.md#environment-variables) don't [conflict with other](../index.md#empty-provider-block) provider block attributes. When in doubt, please run `TF_LOG=DEBUG terraform apply` to enable [debug mode](https://www.terraform.io/docs/internals/debugging.html) through the [`TF_LOG`](https://www.terraform.io/docs/cli/config/environment-variables.html#tf_log) environment variable. Look specifically for `Explicit and implicit attributes` lines, which should indicate the authentication attributes used. Another common cause of technical difficulties is a missing `alias` attribute in `provider "databricks" {}` blocks or a missing `provider` attribute in `resource "databricks_..." {}` blocks. Please make sure to read the [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article.

```hcl
resource "databricks_mws_private_access_settings" "pas" {
  provider                     = databricks.accounts
  account_id                   = var.databricks_account_id
  private_access_settings_name = "pas-${random_string.suffix.result}"
  region                       = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  public_access_enabled        = true
  private_access_level         = "ACCOUNT"
}

resource "databricks_mws_workspaces" "this" {
  provider       = databricks.accounts
  account_id     = var.databricks_account_id
  workspace_name = "tf-demo-test-${random_string.suffix.result}"
  location       = google_compute_subnetwork.network-with-private-secondary-ip-ranges.region
  cloud_resource_container {
    gcp {
      project_id = var.google_project
    }
  }

  private_access_settings_id = databricks_mws_private_access_settings.pas.private_access_settings_id
  network_id                 = databricks_mws_networks.this.network_id
  gke_config {
    connectivity_type = "PRIVATE_NODE_PUBLIC_MASTER"
    master_ip_range   = "10.3.0.0/28"
  }

  token {
    comment = "Terraform"
  }

  # this makes sure that the NAT is created for outbound traffic before creating the workspace
  depends_on = [google_compute_router_nat.nat]
}

output "databricks_host" {
  value = databricks_mws_workspaces.this.workspace_url
}

output "databricks_token" {
  value     = databricks_mws_workspaces.this.token[0].token_value
  sensitive = true
}
```
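
In the separate module that manages resources inside the workspace, these outputs can be consumed roughly as follows; this is a minimal sketch assuming the `databricks_host` and `databricks_token` values are wired in as input variables of that module.

```hcl
# Hypothetical workspace-level provider configuration in the module that
# manages resources inside the workspace. The variables are assumed to be
# fed from the databricks_host and databricks_token outputs above.
variable "databricks_host" {}

variable "databricks_token" {
  sensitive = true
}

provider "databricks" {
  alias = "created_workspace"
  host  = var.databricks_host
  token = var.databricks_token
}
```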

### Data resources and Authentication is not configured errors

*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, a `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee proper lazy authentication with data resources, add `depends_on = [databricks_mws_workspaces.this]` to the body of the data resource. This issue doesn't occur if the workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources.

```hcl
data "databricks_current_user" "me" {
  depends_on = [databricks_mws_workspaces.this]
}
```