
Commit 124519f

Finish AWS PrivateLink guide (#1131)

Follow-up to #1084

1 parent 08b0ff2 commit 124519f

8 files changed: +86 additions, -104 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions

@@ -1,5 +1,9 @@
 # Version changelog
 
+## 0.5.1
+
+* Added extended documentation for provisioning an AWS PrivateLink workspace ([#1084](https://github.com/databrickslabs/terraform-provider-databricks/pull/1084)).
+
 ## 0.5.0
 
 * Added `workspace_url` attribute to the `databricks_current_user` data source ([#1107](https://github.com/databrickslabs/terraform-provider-databricks/pull/1107)).

common/version.go

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ package common
 import "context"
 
 var (
-	version = "0.5.0"
+	version = "0.5.1"
 	// ResourceName is resource name without databricks_ prefix
 	ResourceName contextKey = 1
 	// Provider is the current instance of provider
docs/guides/aws-private-link-workspace.md

Lines changed: 76 additions & 103 deletions

@@ -1,5 +1,5 @@
 ---
-page_title: "Enable Backend AWS PrivateLink for Databricks Workspace"
+page_title: "Provisioning Databricks on AWS with PrivateLink"
 ---
 
 # Deploying pre-requisite resources and enabling PrivateLink connections (AWS Preview)
@@ -15,16 +15,16 @@ This guide uses the following variables in configurations:
 - `databricks_account_username`: The username an account-level admin uses to log in to [https://accounts.cloud.databricks.com](https://accounts.cloud.databricks.com).
 - `databricks_account_password`: The password for `databricks_account_username`.
 - `databricks_account_id`: The numeric ID for your Databricks account. When you are logged in, it appears in the bottom left corner of the page.
-- `vpc_id` - The ID for the AWS VPC
-- `region` - AWS region
-- `security_group_id` - Security groups set up for the existing VPC
-- `subnet_ids` - Existing subnets being used for the customer managed VPC
+- `vpc_id` - The ID for the AWS VPC.
+- `region` - AWS region.
+- `security_group_id` - Security groups set up for the existing VPC.
+- `subnet_ids` - Existing subnets being used for the customer-managed VPC.
 - `workspace_vpce_service` - Choose the region-specific service endpoint from this table.
 - `relay_vpce_service` - Choose the region-specific service from this table.
-- `vpce_subnet_cidr` - CIDR range for the subnet chosen for the VPC endpoint
-- `tags` - tags for the Private Link backend setup
-- `root_bucket_name` - AWS bucket name required for [storage mws resource](https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_storage_configurations) reference
-- `cross_account_arn` - AWS EC2 role ARN required for [credentials mws resource](https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_credentials)
+- `vpce_subnet_cidr` - CIDR range for the subnet chosen for the VPC endpoint.
+- `tags` - Tags for the PrivateLink backend setup.
+- `root_bucket_name` - AWS bucket name required for [databricks_mws_storage_configurations](https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_storage_configurations).
+- `cross_account_arn` - AWS EC2 role ARN required for [databricks_mws_credentials](https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_credentials).
 
 This guide is provided as-is and you can use this guide as the basis for your custom Terraform module.
 
@@ -44,11 +44,10 @@ Initialize [provider with `mws` alias](https://www.terraform.io/language/provide
 terraform {
   required_providers {
     databricks = {
-      source = "databrickslabs/databricks"
-      version = "0.5.0"
+      source = "databrickslabs/databricks"
     }
     aws = {
-      source = "hashicorp/aws"
+      source  = "hashicorp/aws"
       version = "3.49.0"
     }
   }
@@ -58,15 +57,12 @@ provider "aws" {
   region = var.region
 }
 
-// initialize provider in "MWS" mode for provisioning workspace with AWS PrivateLink
 provider "databricks" {
   alias    = "mws"
   host     = "https://accounts.cloud.databricks.com"
   username = var.databricks_account_username
   password = var.databricks_account_password
 }
-
-
 ```
 
 Define the required variables
@@ -75,33 +71,25 @@ Define the required variables
 variable "databricks_account_id" {}
 variable "databricks_account_username" {}
 variable "databricks_account_password" {}
+variable "root_bucket_name" {}
+variable "cross_account_arn" {}
 variable "vpc_id" {}
 variable "region" {}
 variable "security_group_id" {}
-
-// this input variable is of array type
-variable "subnet_ids" {
-  type = list(string)
-}
-
+variable "subnet_ids" { type = list(string) }
 variable "workspace_vpce_service" {}
 variable "relay_vpce_service" {}
 variable "vpce_subnet_cidr" {}
-
-variable "private_dns_enabled" { default = false}
-variable "tags" { default = {}}
-
-// these resources (bucket and IAM role) are assumed created using your AWS provider and the examples here https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_storage_configurations and https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_credentials, respectively.
-variable "root_bucket_name" {}
-variable "cross_account_arn" {}
+variable "private_dns_enabled" { default = false }
+variable "tags" { default = {} }
 
 locals {
   prefix = "private-link-ws"
 }
 ```
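The variables declared above could be supplied through a `terraform.tfvars` file. A minimal sketch follows; every value here is a hypothetical placeholder, not from the guide:

```hcl
# terraform.tfvars -- illustrative placeholder values only
databricks_account_id       = "00000000-0000-0000-0000-000000000000"
databricks_account_username = "admin@example.com"
databricks_account_password = "change-me"
root_bucket_name            = "example-databricks-root-bucket"
cross_account_arn           = "arn:aws:iam::123456789012:role/example-crossaccount-role"
vpc_id                      = "vpc-0123456789abcdef0"
region                      = "us-east-1"
security_group_id           = "sg-0123456789abcdef0"
subnet_ids                  = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
workspace_vpce_service      = "com.amazonaws.vpce.us-east-1.vpce-svc-<workspace-service-id>"
relay_vpce_service          = "com.amazonaws.vpce.us-east-1.vpce-svc-<relay-service-id>"
vpce_subnet_cidr            = "10.0.32.0/26"
tags                        = { Environment = "example" }
```

The two `vpce-svc-` identifiers must come from the region-specific endpoint tables referenced in the variable list.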
 
-## Existing Storage Objects
-The following object is used in order to reference the storage configuration ID.
+## Root bucket
+Create a new storage configuration with [databricks_mws_storage_configurations](../resources/mws_storage_configurations.md):
 ```hcl
 resource "databricks_mws_storage_configurations" "this" {
   provider = databricks.mws
@@ -111,56 +99,53 @@ resource "databricks_mws_storage_configurations" "this" {
 }
 ```
 
-## Existing IAM Role
-The following object is used in order to reference the credential configuration ID.
+## Cross-account IAM role
+Create new cross-account credentials with [databricks_mws_credentials](../resources/mws_credentials.md):
 ```hcl
 resource "databricks_mws_credentials" "this" {
   provider         = databricks.mws
   account_id       = var.databricks_account_id
   role_arn         = var.cross_account_arn
   credentials_name = "${local.prefix}-credentials"
 }
-
 ```
 
-
-## Configure AWS objects
+## Configure networking
 The first step is to create the required AWS objects:
-- A subnet dedicated to your VPC endpoints
+- A subnet dedicated to your VPC endpoints.
 - A security group dedicated to your VPC endpoints and satisfying required inbound/outbound TCP/HTTPS traffic rules on ports 443 and 6666, respectively.
-- Lastly, creation of the private access settings and workspace.
 
 ```hcl
+data "aws_vpc" "prod" {
+  id = var.vpc_id
+}
+
 // this subnet houses the data plane VPC endpoints
 resource "aws_subnet" "dataplane_vpce" {
   vpc_id     = var.vpc_id
   cidr_block = var.vpce_subnet_cidr
 
-  tags = merge(
-    data.aws_vpc.prod.tags,
-    {
-      Name = "${local.prefix}-${data.aws_vpc.prod.id}-pl-vpce"
-    },
-  )
+  tags = merge(data.aws_vpc.prod.tags, {
+    Name = "${local.prefix}-${data.aws_vpc.prod.id}-pl-vpce"
+  })
 }
 
 resource "aws_route_table" "this" {
   vpc_id = var.vpc_id
 
-  tags = merge(
-    data.aws_vpc.prod.tags,
-    {
-      Name = "${local.prefix}-${data.aws_vpc.prod.id}-pl-local-route-tbl"
-    },
-  )
+  tags = merge(data.aws_vpc.prod.tags, {
+    Name = "${local.prefix}-${data.aws_vpc.prod.id}-pl-local-route-tbl"
+  })
 }
 
 resource "aws_route_table_association" "dataplane_vpce_rtb" {
-  subnet_id = aws_subnet.dataplane_vpce.id
-  route_table_id = aws_route_table.this.id
+  subnet_id      = aws_subnet.dataplane_vpce.id
+  route_table_id = aws_route_table.this.id
 }
 ```
 
+Define security group for data plane VPC endpoint backend/relay connections:
+
 ```hcl
 data "aws_subnet" "ws_vpc_subnets" {
   for_each = toset(var.subnet_ids)
@@ -170,73 +155,66 @@ data "aws_subnet" "ws_vpc_subnets" {
 locals {
   vpc_cidr_blocks = [
     for subnet in data.aws_subnet.ws_vpc_subnets :
-      subnet.cidr_block
-    ]
+    subnet.cidr_block
+  ]
 }
 
-// security group for data plane VPC endpoints for backend/relay connections
 resource "aws_security_group" "dataplane_vpce" {
   name        = "Data Plane VPC endpoint security group"
   description = "Security group shared with relay and workspace endpoints"
   vpc_id      = var.vpc_id
 
   ingress {
-    description = "Inbound rules"
-    from_port = 443
-    to_port = 443
-    protocol = "tcp"
-    cidr_blocks = concat([var.vpce_subnet_cidr], local.vpc_cidr_blocks)
+    description = "Inbound rules"
+    from_port   = 443
+    to_port     = 443
+    protocol    = "tcp"
+    cidr_blocks = concat([var.vpce_subnet_cidr], local.vpc_cidr_blocks)
   }
 
   ingress {
-    description = "Inbound rules"
-    from_port = 6666
-    to_port = 6666
-    protocol = "tcp"
-    cidr_blocks = concat([var.vpce_subnet_cidr], local.vpc_cidr_blocks)
+    description = "Inbound rules"
+    from_port   = 6666
+    to_port     = 6666
+    protocol    = "tcp"
+    cidr_blocks = concat([var.vpce_subnet_cidr], local.vpc_cidr_blocks)
   }
 
   egress {
-    description = "Outbound rules"
-    from_port = 443
-    to_port = 443
-    protocol = "tcp"
-    cidr_blocks = concat([var.vpce_subnet_cidr], local.vpc_cidr_blocks)
+    description = "Outbound rules"
+    from_port   = 443
+    to_port     = 443
+    protocol    = "tcp"
+    cidr_blocks = concat([var.vpce_subnet_cidr], local.vpc_cidr_blocks)
   }
 
   egress {
-    description = "Outbound rules"
-    from_port = 6666
-    to_port = 6666
-    protocol = "tcp"
-    cidr_blocks = concat([var.vpce_subnet_cidr], local.vpc_cidr_blocks)
+    description = "Outbound rules"
+    from_port   = 6666
+    to_port     = 6666
+    protocol    = "tcp"
+    cidr_blocks = concat([var.vpce_subnet_cidr], local.vpc_cidr_blocks)
   }
 
-  tags = merge(
-    data.aws_vpc.prod.tags,
-    {
-      Name = "${local.prefix}-${data.aws_vpc.prod.id}-pl-vpce-sg-rules"
-    },
-  )
+  tags = merge(data.aws_vpc.prod.tags, {
+    Name = "${local.prefix}-${data.aws_vpc.prod.id}-pl-vpce-sg-rules"
+  })
 }
 ```
 
-```hcl
-data "aws_vpc" "prod" {
-  id = var.vpc_id
-}
+Run `terraform apply` twice when configuring PrivateLink; see an [outstanding issue](https://github.com/hashicorp/terraform-provider-aws/issues/7148) for more information:
+* Run 1 - comment out the `private_dns_enabled` lines.
+* Run 2 - uncomment the `private_dns_enabled` lines.
 
+```hcl
 resource "aws_vpc_endpoint" "backend_rest" {
   vpc_id             = var.vpc_id
   service_name       = var.workspace_vpce_service
   vpc_endpoint_type  = "Interface"
   security_group_ids = [aws_security_group.dataplane_vpce.id]
   subnet_ids         = [aws_subnet.dataplane_vpce.id]
-  // run terraform apply twice when configuring PrivateLink - see this outstanding issue for understanding why this is required - https://github.com/hashicorp/terraform-provider-aws/issues/7148
-  // Run 1 - comment the `private_dns_enabled` line
-  // Run 2 - uncomment the `private_dns_enabled` line
   // private_dns_enabled = var.private_dns_enabled
-  depends_on = [aws_subnet.dataplane_vpce]
+  depends_on = [aws_subnet.dataplane_vpce]
 }
 
 resource "aws_vpc_endpoint" "relay" {
@@ -245,14 +223,10 @@ resource "aws_vpc_endpoint" "relay" {
   vpc_endpoint_type  = "Interface"
   security_group_ids = [aws_security_group.dataplane_vpce.id]
   subnet_ids         = [aws_subnet.dataplane_vpce.id]
-  // run terraform apply twice when configuring PrivateLink - see this outstanding issue for understanding why this is required - https://github.com/hashicorp/terraform-provider-aws/issues/7148
-  // Run 1 - comment the `private_dns_enabled` line
-  // Run 2 - uncomment the `private_dns_enabled` line
   // private_dns_enabled = var.private_dns_enabled
-  depends_on = [aws_subnet.dataplane_vpce]
+  depends_on = [aws_subnet.dataplane_vpce]
 }
 
-
 resource "databricks_mws_vpc_endpoint" "backend_rest_vpce" {
   provider   = databricks.mws
   account_id = var.databricks_account_id
@@ -270,17 +244,11 @@ resource "databricks_mws_vpc_endpoint" "relay" {
   region     = var.region
   depends_on = [aws_vpc_endpoint.relay]
 }
-
 ```
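As a sketch of the run-2 state of the backend endpoint described above (assuming `private_dns_enabled` is passed through from the variable and nothing else changes between runs), the resource would read:

```hcl
// Run 2: the private_dns_enabled line is uncommented so AWS sets up private DNS
// for the endpoint; all other arguments are identical to run 1.
resource "aws_vpc_endpoint" "backend_rest" {
  vpc_id              = var.vpc_id
  service_name        = var.workspace_vpce_service
  vpc_endpoint_type   = "Interface"
  security_group_ids  = [aws_security_group.dataplane_vpce.id]
  subnet_ids          = [aws_subnet.dataplane_vpce.id]
  private_dns_enabled = var.private_dns_enabled
  depends_on          = [aws_subnet.dataplane_vpce]
}
```

The relay endpoint gets the same treatment on the second apply.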
 
-## Workspace creation
-
-Once the VPC endpoints are created, they can be supplied in the `databricks_mws_networks` resource for workspace creation with AWS PrivateLink. After the terraform apply is run once (see the comment in the aws_vpc_endpoint resource above), run the terraform apply a second time with the line for private_dns_enabled set to true uncommented to set the proper DNS settings for PrivateLink. For understanding the reason that this needs to be applied twice, see this existing [issue](hashicorp/terraform-provider-aws#7148) in the underlying AWS provider.
-
-The credentials ID which is referenced below is one of the attributes which is created as a result of configuring the cross-account IAM role, which Databricks uses to orchestrate EC2 resources. The credentials are created via [databricks_mws_credentials](https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_credentials). Similarly, the storage configuration ID is obtained from the [databricks_mws_storage_configurations](https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_storage_configurations) resource.
+Once the VPC endpoints are created, they can be supplied in the [databricks_mws_networks](../resources/mws_networks.md) resource for workspace creation with AWS PrivateLink. After `terraform apply` is run once (see the comment in the `aws_vpc_endpoint` resource above), run `terraform apply` a second time with the `private_dns_enabled` line uncommented and set to true, which configures the proper DNS settings for PrivateLink. For the reason this needs to be applied twice, see this existing [issue](https://github.com/hashicorp/terraform-provider-aws/issues/7148) in the underlying AWS provider.
 
 ```hcl
-// Inputs are 2 subnets and one security group from existing VPC that will be used for your Databricks workspace
 resource "databricks_mws_networks" "this" {
   provider   = databricks.mws
   account_id = var.databricks_account_id
@@ -293,7 +261,13 @@ resource "databricks_mws_networks" "this" {
     rest_api = [databricks_mws_vpc_endpoint.backend_rest_vpce.vpc_endpoint_id]
   }
 }
+```
+
+## Configure workspace
 
+The credentials ID referenced below is one of the attributes created by configuring the cross-account IAM role, which Databricks uses to orchestrate EC2 resources. The credentials are created via [databricks_mws_credentials](https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_credentials). Similarly, the storage configuration ID is obtained from the [databricks_mws_storage_configurations](https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/mws_storage_configurations) resource.
+
+```hcl
 resource "databricks_mws_private_access_settings" "pas" {
   provider   = databricks.mws
   account_id = var.databricks_account_id
@@ -316,4 +290,3 @@ resource "databricks_mws_workspaces" "this" {
   depends_on = [databricks_mws_networks.this]
 }
 ```
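The hunks above elide most of the `databricks_mws_workspaces` body. A hedged sketch of how the pieces created in this guide might wire together (the resource names and `depends_on` come from the diff; argument values such as `workspace_name` and `pricing_tier` are illustrative assumptions):

```hcl
// Sketch only: wires the credentials, storage, network, and private access
// settings IDs created earlier into the workspace resource.
resource "databricks_mws_workspaces" "this" {
  provider       = databricks.mws
  account_id     = var.databricks_account_id
  workspace_name = local.prefix
  aws_region     = var.region

  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id               = databricks_mws_networks.this.network_id

  private_access_settings_id = databricks_mws_private_access_settings.pas.private_access_settings_id
  pricing_tier               = "ENTERPRISE" // assumed value

  depends_on = [databricks_mws_networks.this]
}
```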
docs/resources/mws_networks.md

Lines changed: 1 addition & 0 deletions

@@ -113,6 +113,7 @@ In addition to all arguments above, the following attributes are exported:
 The following resources are used in the same context:
 
 * [Provisioning Databricks on AWS](../guides/aws-workspace.md) guide.
+* [Provisioning Databricks on AWS with PrivateLink](../guides/aws-private-link-workspace.md) guide.
 * [Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection](../guides/aws-e2-firewall-hub-and-spoke.md) guide.
 * [databricks_mws_vpc_endpoint](mws_vpc_endpoint.md) to register [aws_vpc_endpoint](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_endpoint) resources with Databricks such that they can be used as part of a [databricks_mws_networks](mws_networks.md) configuration.
 * [databricks_mws_private_access_settings](mws_private_access_settings.md) to create a [Private Access Setting](https://docs.databricks.com/administration-guide/cloud-configurations/aws/privatelink.html#step-5-create-a-private-access-settings-configuration-using-the-databricks-account-api) that can be used as part of a [databricks_mws_workspaces](mws_workspaces.md) resource to create a [Databricks Workspace that leverages AWS PrivateLink](https://docs.databricks.com/administration-guide/cloud-configurations/aws/privatelink.html).

docs/resources/mws_private_access_settings.md

Lines changed: 1 addition & 0 deletions

@@ -65,6 +65,7 @@ In addition to all arguments above, the following attributes are exported:
 The following resources are used in the same context:
 
 * [Provisioning Databricks on AWS](../guides/aws-workspace.md) guide.
+* [Provisioning Databricks on AWS with PrivateLink](../guides/aws-private-link-workspace.md) guide.
 * [Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection](../guides/aws-e2-firewall-hub-and-spoke.md) guide.
 * [databricks_mws_vpc_endpoint](mws_vpc_endpoint.md) to register [aws_vpc_endpoint](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_endpoint) resources with Databricks such that they can be used as part of a [databricks_mws_networks](mws_networks.md) configuration.
 * [databricks_mws_networks](mws_networks.md) to [configure VPC](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html) & subnets for new workspaces within AWS.

docs/resources/mws_storage_configurations.md

Lines changed: 1 addition & 0 deletions

@@ -54,6 +54,7 @@ In addition to all arguments above, the following attributes are exported:
 The following resources are used in the same context:
 
 * [Provisioning Databricks on AWS](../guides/aws-workspace.md) guide.
+* [Provisioning Databricks on AWS with PrivateLink](../guides/aws-private-link-workspace.md) guide.
 * [databricks_mws_credentials](mws_credentials.md) to configure the cross-account role for creation of new workspaces within AWS.
 * [databricks_mws_customer_managed_keys](mws_customer_managed_keys.md) to configure KMS keys for new workspaces within AWS.
 * [databricks_mws_log_delivery](mws_log_delivery.md) to configure delivery of [billable usage logs](https://docs.databricks.com/administration-guide/account-settings/billable-usage-delivery.html) and [audit logs](https://docs.databricks.com/administration-guide/account-settings/audit-logs.html).

docs/resources/mws_vpc_endpoint.md

Lines changed: 1 addition & 0 deletions

@@ -152,6 +152,7 @@ In addition to all arguments above, the following attributes are exported:
 The following resources are used in the same context:
 
 * [Provisioning Databricks on AWS](../guides/aws-workspace.md) guide.
+* [Provisioning Databricks on AWS with PrivateLink](../guides/aws-private-link-workspace.md) guide.
 * [Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection](../guides/aws-e2-firewall-hub-and-spoke.md) guide.
 * [databricks_mws_networks](mws_networks.md) to [configure VPC](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html) & subnets for new workspaces within AWS.
 * [databricks_mws_private_access_settings](mws_private_access_settings.md) to create a [Private Access Setting](https://docs.databricks.com/administration-guide/cloud-configurations/aws/privatelink.html#step-5-create-a-private-access-settings-configuration-using-the-databricks-account-api) that can be used as part of a [databricks_mws_workspaces](mws_workspaces.md) resource to create a [Databricks Workspace that leverages AWS PrivateLink](https://docs.databricks.com/administration-guide/cloud-configurations/aws/privatelink.html).

docs/resources/mws_workspaces.md

Lines changed: 1 addition & 0 deletions

@@ -249,6 +249,7 @@ You can reset local DNS caches before provisioning new workspaces with one of th
 The following resources are used in the same context:
 
 * [Provisioning Databricks on AWS](../guides/aws-workspace.md) guide.
+* [Provisioning Databricks on AWS with PrivateLink](../guides/aws-private-link-workspace.md) guide.
 * [Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection](../guides/aws-e2-firewall-hub-and-spoke.md) guide.
 * [databricks_mws_credentials](mws_credentials.md) to configure the cross-account role for creation of new workspaces within AWS.
 * [databricks_mws_customer_managed_keys](mws_customer_managed_keys.md) to configure KMS keys for new workspaces within AWS.
