Commit d2db998

Create databricks-setup.md
1 parent 9a7b12b commit d2db998

1 file changed

+157
-0
lines changed

@@ -0,0 +1,157 @@
1+
---
2+
title: Databricks Setup
3+
beta: true
4+
plan: unify
5+
hidden: true
6+
---
7+
8+
> info "Linked Events is in private beta"
9+
> Linked Events is in private beta, and Segment is actively working on this feature. Some functionality may change before it becomes generally available.
10+
11+
On this page, you'll learn how to connect your Databricks data warehouse to Segment.
12+
13+
> info ""
14+
> At this time, you can only use Databricks with Linked Audiences.
15+
16+
## Set up Databricks credentials
17+
18+
Sign in to Databricks as an admin to grant Segment's Data Graph the permissions it needs.
19+
20+
Segment assumes that you already have a workspace that includes the datasets you'd like to use for the Data Graph. Segment recommends setting up a new Service Principal user with only the permissions to access the required catalogs and schemas.
21+
22+
### Step 1: Set up a Service Principal and SQL Warehouse
23+
24+
Segment recommends that you set up a new Service Principal. If you already have a Service Principal you'd like to use, grant it "Can use" permissions for your data warehouse and proceed to [Step 2: Create a catalog for Segment to store checkpoint tables](#step-2-create-a-catalog-for-segment-to-store-checkpoint-tables).
25+
26+
To verify that your Service Principal has "Can use" permission, see the [Confirm Service Principal permissions](#confirm-service-principal-permissions) section.
27+
28+
#### Create a new Service Principal User
29+
1. Log in to the Databricks UI as an Admin.
30+
2. Click **User Management**.
31+
3. Select the **Service principals** tab.
32+
4. Click **Add Service Principal**.
33+
5. Enter a Service principal name and click **Add**.
34+
6. Select the Service Principal you just created and click **Generate secret**.
35+
7. Save the **Secret** and **Client ID** to a safe place. You'll need these values to connect your Databricks warehouse to Segment.
36+
8. To add the user to the workspace:
37+
    1. Navigate to Workspaces and select your Workspace.
38+
    2. Select the **Permissions** tab and click **Add Permissions**.
39+
    3. Add the newly created Service Principal user and click **Save**.
40+
41+
#### Create a new warehouse
42+
1. Log in to your workspace as an Admin in the Databricks UI.
43+
2. Navigate to SQL Warehouses and click **Create SQL Warehouse**.
44+
3. Enter a name for your warehouse, select a cluster size, and click **Create**.
45+
46+
#### Add your Service Principal User to Warehouse User Lists
47+
1. Log in to the Databricks UI as an Admin.
48+
2. Navigate to SQL Warehouses.
49+
3. Select your warehouse and click **Permissions**.
50+
4. Add the Service Principal user and grant the user "Can use" access.
51+
5. Click **Add**.
52+
53+
#### Confirm Service Principal permissions
54+
Confirm that the Service Principal user that you're using to connect to Segment has "Can use" permissions for your warehouse.
55+
56+
To check the permission:
57+
1. In the Databricks console, navigate to SQL Warehouses and select your warehouse.
58+
2. Navigate to Overview and click **Permissions**.
59+
3. Verify that the Service Principal has "Can use" permission.
60+
61+
### Step 2: Create a catalog for Segment to store checkpoint tables
62+
63+
> warning "Segment recommends creating an empty catalog for Data Graph"
64+
> If you plan to use an existing catalog with Reverse ETL, follow the instructions in the [Update user access for Segment Reverse ETL catalog](#update-user-access-for-segment-reverse-etl-catalog) section.
65+
66+
Segment requires write access to a catalog to create a schema for internal bookkeeping, and to store checkpoint tables for the queries that are executed.
67+
68+
Segment recommends creating an empty catalog for this purpose by running the SQL below. You'll also specify this catalog when you set up your Databricks integration in the Segment app.
69+
70+
```sql
71+
CREATE CATALOG IF NOT EXISTS `SEGMENT_LINKED_PROFILES_DB`;
72+
-- Copy the Client ID by clicking “Generate secret” for the Service Principal user
73+
GRANT USAGE ON CATALOG `SEGMENT_LINKED_PROFILES_DB` TO `${client_id}`;
74+
GRANT CREATE ON CATALOG `SEGMENT_LINKED_PROFILES_DB` TO `${client_id}`;
75+
GRANT SELECT ON CATALOG `SEGMENT_LINKED_PROFILES_DB` TO `${client_id}`;
76+
```
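
As an optional check that is not part of Segment's documented steps, you can list the privileges the Service Principal now holds on the new catalog. This is a minimal sketch, assuming a Unity Catalog-enabled workspace and that you run it as a user who is allowed to view grants on the catalog. `${client_id}` is the same placeholder used in the statements above.

```sql
-- Sketch: show the privileges held by the Service Principal on the catalog.
-- Replace ${client_id} with the Client ID you saved in Step 1.
SHOW GRANTS `${client_id}` ON CATALOG `SEGMENT_LINKED_PROFILES_DB`;
```

The result should include one row for each privilege granted above. If a row is missing, rerun the corresponding `GRANT` statement.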
77+
78+
### Step 3: Grant read-only access to the Profiles Sync catalog
79+
80+
Run the SQL below to grant the Data Graph read-only access to the Profiles Sync catalog:
81+
82+
```sql
83+
GRANT USAGE, SELECT, USE SCHEMA ON CATALOG `${profiles_sync_catalog}` TO `${client_id}`;
84+
```
85+
86+
### Step 4: Grant read-only access to additional catalogs for Data Graph
87+
Run the SQL below to grant your Service Principal read-only access to any additional catalogs you want to use for Data Graph:
88+
89+
```sql
90+
-- Run the SQL below for each catalog you want to use for the Segment Data Graph
91+
GRANT USAGE, SELECT, USE SCHEMA ON CATALOG `${catalog}` TO `${client_id}`;
92+
```
93+
94+
### (Optional) Restrict read-only access to schemas
95+
96+
Restrict access to specific schemas by running the following SQL:
97+
98+
```sql
99+
GRANT USAGE ON CATALOG `${catalog}` TO `${client_id}`;
100+
USE CATALOG `${catalog}`;
101+
GRANT USAGE, SELECT ON SCHEMA `${schema_1}` TO `${client_id}`;
102+
GRANT USAGE, SELECT ON SCHEMA `${schema_2}` TO `${client_id}`;
103+
-- ... (repeat the GRANT for each additional schema)
104+
105+
```
106+
107+
### (Optional) Restrict read access to tables
108+
Restrict access to specific tables by running the following SQL:
109+
110+
```sql
111+
GRANT USAGE ON CATALOG `${catalog}` TO `${client_id}`;
112+
USE CATALOG `${catalog}`;
113+
GRANT USAGE ON SCHEMA `${schema_1}` TO `${client_id}`;
114+
USE SCHEMA `${schema_1}`;
115+
GRANT SELECT ON TABLE `${table_1}` TO `${client_id}`;
116+
GRANT SELECT ON TABLE `${table_2}` TO `${client_id}`;
117+
-- ... (repeat the GRANT for each additional table)
118+
119+
```
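
When you restrict access to individual schemas or tables, it is easy to miss one. The following sketch lists what the Service Principal has been granted at each level. It assumes a Unity Catalog-enabled workspace and reuses the `${catalog}`, `${schema_1}`, `${table_1}`, and `${client_id}` placeholders from the statements above. It is an optional check, not a required setup step.

```sql
-- Sketch: inspect schema-level grants for the Service Principal.
SHOW GRANTS `${client_id}` ON SCHEMA `${catalog}`.`${schema_1}`;

-- Sketch: inspect table-level grants for the Service Principal.
SHOW GRANTS `${client_id}` ON TABLE `${catalog}`.`${schema_1}`.`${table_1}`;
```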
120+
121+
### Step 5: Validate the permissions of your Service Principal user
122+
123+
Sign in to the Databricks CLI with your Client ID and secret, then run the following SQL to verify that the Service Principal user has the correct permissions for a given table.
124+
125+
> success ""
126+
> If these commands succeed, the Service Principal can read the table.
127+
128+
```sql
129+
USE CATALOG ${linked_read_only_database};
130+
SHOW SCHEMAS;
131+
SELECT * FROM ${schema}.${table} LIMIT 10;
132+
```
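
The validation above is only meaningful if the session is authenticated as the Service Principal rather than as your admin user. A minimal sketch for confirming the session identity, using Databricks SQL's built-in `current_user()` function, is shown below. For a Service Principal, the returned value is expected to be its application (client) ID. Treat that as an assumption to verify in your own workspace.

```sql
-- Sketch: confirm which identity the current session is running as.
-- Expected (assumption): the Client ID of the Service Principal from Step 1.
SELECT current_user();
```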
133+
134+
### Step 6: Connect your warehouse to Segment
135+
136+
Segment requires the following settings to connect to your Databricks warehouse. You can find these details in your Databricks workspace by navigating to **SQL Warehouse > Connection details**.
137+
138+
- **Hostname**: The address of your Databricks server
139+
- **HTTP Path**: The address of your Databricks compute resources
140+
- **Port**: The port used to connect to your Databricks warehouse. The default port is 443, but your port might be different.
141+
- **Catalog**: The catalog you designated in [Step 2: Create a catalog for Segment to store checkpoint tables](#step-2-create-a-catalog-for-segment-to-store-checkpoint-tables)
142+
- **Service principal client ID**: The client ID used to access your Databricks warehouse
143+
- **OAuth secret**: The OAuth secret used to connect to your Databricks warehouse
144+
145+
After identifying these settings, continue setting up your Data Graph by following the instructions in [Connect your warehouse to the Data Graph](/docs/unify/linked-profiles/data-graph/#step-2-connect-your-warehouse-to-the-data-graph).
146+
147+
## Additional setup for warehouse permissions
148+
149+
### Update user access for Segment Reverse ETL catalog
150+
Run the SQL in this section if the Segment app reports that the user doesn't have sufficient privileges on an existing `__segment_reverse_etl` schema.
151+
152+
153+
If Segment Reverse ETL has ever run in the catalog you are configuring as the Segment connection catalog, a Segment-managed schema already exists there, and you need to give the new Segment user access to it. Update the schema permissions by running the following SQL:
154+
155+
```sql
156+
GRANT ALL PRIVILEGES ON SCHEMA ${segment_internal_catalog}.__segment_reverse_etl TO `${client_id}`;
157+
```
