Commit 87d3594

Update databricks-setup.md
1 parent f3f7170 commit 87d3594

1 file changed: +39 -53 lines changed

src/unify/data-graph/setup-guides/databricks-setup.md

Lines changed: 39 additions & 53 deletions
@@ -7,26 +7,18 @@ redirect_from:
77
- '/unify/linked-profiles/setup-guides/databricks-setup'
88
---
99

10-
> info "Linked Audiences is in public beta"
11-
> Linked Audiences (with Data Graph, Linked Events) is in public beta, and Segment is actively working on this feature. Some functionality may change before it becomes generally available.
10+
On this page, you'll learn how to connect your Databricks data warehouse to Segment for the [Data Graph](/docs/unify/data-graph/data-graph/).
1211

13-
On this page, you'll learn how to connect your Databricks data warehouse to the Segment Data Graph.
12+
## Databricks credentials
1413

15-
<!-- remove this and go for it! -->
14+
Segment assumes that you already have a workspace that includes the datasets you'd like to use for the Data Graph. Sign in to Databricks with admin permissions to create new resources and provide the Data Graph with the necessary permissions.
1615

17-
## Set up Databricks credentials
16+
## Step 1: Create a new Service Principal user
17+
Segment recommends setting up a new Service Principal user and only giving this user permissions to access the required catalogs and schemas.
1818

19-
Sign in to Databricks with admin permissions to create new resources and provide the Data Graph with the necessary permissions.
19+
If you already have a Service Principal user you'd like to use, grant it "Can use" permissions for your data warehouse and proceed to [Step 2](#step-2-create-a-catalog-for-segment-to-store-checkpoint-tables).
2020

21-
Segment assumes that you already have a workspace that includes the datasets you'd like to use for the Data Graph. Segment recommends setting up a new Service Principal user with only the permissions to access the required catalogs and schemas.
22-
23-
### Step 1: Set up a Service Principal user
24-
25-
Segment recommends that you set up a new Service Principal user. If you already have a Service Principal user you'd like to use, grant it "Can use" permissions for your data warehouse and proceed to [Step 2: Create a catalog for Segment to store checkpoint tables](#step-2-create-a-catalog-for-segment-to-store-checkpoint-tables).
26-
27-
If you want to create a new Service Principal user, complete the following substeps:
28-
29-
#### Substep 1: Create a new Service Principal user
21+
### a) Create a new Service Principal user
3022
1. Log in to the Databricks UI as an Admin.
3123
2. Click **User Management**.
3224
3. Select the **Service principals** tab.
@@ -38,74 +30,69 @@ If you want to create a new Service Principal user, complete the following subst
3830
9. Select the “Permissions” tab and click **Add Permissions**.
3931
10. Add the newly created Service Principal user and click **Save**.
4032

41-
> success ""
42-
> If you already have a warehouse you'd like to use, you can move on to the next substep, [Substep 2: Add your Service Principal user to Warehouse User Lists](#substep-2-add-your-service-principal-user-to-warehouse-user-lists). If you need to create a new warehouse first, see the [Create a new warehouse](#create-a-new-warehouse) before completing the next substep.
43-
44-
#### Substep 2: Add your Service Principal user to Warehouse User Lists
33+
### b) Add your Service Principal user to Warehouse User Lists
4534
1. Log in to the Databricks UI as an Admin.
4635
2. Navigate to SQL Warehouses.
4736
3. Select your warehouse and click **Permissions**.
4837
4. Add the Service Principal user and grant them “Can use” access.
4938
5. Click **Add**.
5039

51-
##### (Optional) Confirm Service Principal permissions
40+
### c) (Optional) Confirm Service Principal permissions
5241
Confirm that the Service Principal user that you're using to connect to Segment has "Can use" permissions for your warehouse.
5342

5443
To confirm that your Service Principal user has "Can use" permission:
5544
1. In the Databricks console, navigate to SQL Warehouses and select your warehouse.
5645
2. Navigate to Overview and click **Permissions**.
5746
3. Verify that the Service Principal user has "Can use" permission.
5847

59-
### Step 2: Create a catalog for Segment to store checkpoint tables
48+
## Step 2: Create a catalog for Segment to store checkpoint tables
49+
**Segment requires write access to this catalog for internal bookkeeping and to store checkpoint tables for the queries that are executed. Therefore, Segment recommends creating a new catalog for this purpose.** This is also the catalog you'll be required to specify when connecting Databricks with the Segment app.
6050

61-
> warning "Segment recommends creating an empty catalog for the Data Graph"
62-
> If you plan to use an existing catalog with Reverse ETL, follow the instructions in the [Update user access for Segment Reverse ETL catalog](#update-user-access-for-segment-reverse-etl-catalog) section.
63-
64-
Segment requires write access to a catalog to create a schema for internal bookkeeping, and to store checkpoint tables for the queries that are executed.
51+
> info ""
52+
> Segment recommends creating a new catalog for the Data Graph.
53+
> If you choose to use an existing catalog that has also been used for [Segment Reverse ETL](/docs/connections/reverse-etl/), you must follow the [additional instructions to update user access for the Segment Reverse ETL catalog](#update-user-access-for-segment-reverse-etl-catalog).
6554
66-
Segment recommends creating an empty catalog for this purpose by running the following SQL. This is also the catalog that you'll be required to specify when setting up your Databricks integration in the Segment app.
67-
68-
```sql
55+
```SQL
6956
CREATE CATALOG IF NOT EXISTS `SEGMENT_LINKED_PROFILES_DB`;
7057
-- Copy the Client ID by clicking “Generate secret” for the Service Principal user
7158
GRANT USAGE ON CATALOG `SEGMENT_LINKED_PROFILES_DB` TO `${client_id}`;
7259
GRANT CREATE ON CATALOG `SEGMENT_LINKED_PROFILES_DB` TO `${client_id}`;
7360
GRANT SELECT ON CATALOG `SEGMENT_LINKED_PROFILES_DB` TO `${client_id}`;
7461
```
7562

76-
### Step 3: Grant read-only access to the Profiles Sync catalog
63+
## Step 3: Grant read-only access to the Profiles Sync catalog
7764

7865
Run the following SQL to grant the Data Graph read-only access to the Profiles Sync catalog:
7966

80-
```sql
67+
```SQL
8168
GRANT USAGE, SELECT, USE SCHEMA ON CATALOG `${profiles_sync_catalog}` TO `${client_id}`;
8269
```
8370

84-
### Step 4: Grant read-only access to additional catalogs for the Data Graph
85-
Run the following SQL to grant your Service Principal user read-only access to any additional catalogs you want to use for the Data Graph:
71+
## Step 4: Grant read-only access to additional catalogs for the Data Graph
72+
Run the following SQL to grant your Service Principal user read-only access to any additional catalogs you want to use for the Data Graph.
8673

87-
```sql
88-
-- Run this command for each catalog you want to use for the Segment Data Graph
74+
```SQL
75+
-- ********** REPEAT THIS COMMAND FOR EACH CATALOG YOU WANT TO USE FOR THE DATA GRAPH **********
8976
GRANT USAGE, SELECT, USE SCHEMA ON CATALOG `${catalog}` TO `${client_id}`;
9077
```
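
To double-check the grants above, a workspace admin can list the privileges the Service Principal user holds on a catalog. This is a minimal sketch, not part of Segment's documented setup. It assumes the Databricks `SHOW GRANTS` statement is available in your workspace and reuses the `${catalog}` and `${client_id}` placeholders from the command above.

```SQL
-- Assumption: run by an admin (or the catalog owner), once per catalog granted above.
-- Lists the privileges currently held by the Service Principal user on the catalog.
SHOW GRANTS `${client_id}` ON CATALOG `${catalog}`;
```

If the result doesn't include the privileges you granted, re-run the corresponding `GRANT` statement for that catalog.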
9178

92-
### (Optional) Restrict read-only access to schemas
79+
## (Optional) Step 5: Restrict read-only access
80+
### Restrict read-only access to schemas
9381

9482
Restrict access to specific schemas by running the following SQL:
9583

96-
```sql
84+
```SQL
9785
GRANT USAGE ON CATALOG `${catalog}` TO `${client_id}`;
9886
USE CATALOG `${catalog}`;
9987
GRANT USAGE, SELECT ON SCHEMA `${schema_1}` TO `${client_id}`;
10088
GRANT USAGE, SELECT ON SCHEMA `${schema_2}` TO `${client_id}`;
10189
...
10290

10391
```
104-
105-
### (Optional) Restrict read-only access to tables
92+
### Restrict read-only access to tables
10693
Restrict access to specific tables by running the following SQL:
10794

108-
```sql
95+
```SQL
10996
GRANT USAGE ON CATALOG `${catalog}` TO `${client_id}`;
11097
USE CATALOG `${catalog}`;
11198
GRANT USAGE ON SCHEMA `${schema_1}` TO `${client_id}`;
@@ -116,39 +103,38 @@ GRANT SELECT ON TABLE `${table_2}` TO `${client_id}`;
116103

117104
```
118105

119-
### Step 5: Validate the permissions of your Service Principal user
106+
## Step 6: Validate the permissions of your Service Principal user
120107

121-
Sign in to the [Databricks CLI with your Client ID secret](https://docs.databricks.com/en/dev-tools/cli/authentication.html#oauth-machine-to-machine-m2m-authentication){:target="_blank”} and run the following SQL to verify the Service Principal user has the correct permissions for a given table.
108+
Sign in to the [Databricks CLI with your Client ID secret](https://docs.databricks.com/en/dev-tools/cli/authentication.html#oauth-machine-to-machine-m2m-authentication) and run the following SQL to verify the Service Principal user has the correct permissions for a given table.
122109

123110
> success ""
124111
> If this command succeeds, you can view the table.
125112
126-
```sql
113+
```SQL
127114
USE DATABASE ${linked_read_only_database} ;
128115
SHOW SCHEMAS;
129116
SELECT * FROM ${schema}.${table} LIMIT 10;
130117
```
131118

132-
### Step 6: Connect your warehouse to Segment
133-
134-
Segment requires the following settings to connect to your Databricks warehouse. You can find these details in your Databricks workspace by navigating to **SQL Warehouse > Connection details**.
119+
## Step 7: Connect your warehouse to Segment
120+
To connect your warehouse to the Data Graph:
135121

122+
1. Navigate to **Unify > Data Graph**. This should be a Unify space with Profiles Sync already set up.
123+
2. Click **Connect warehouse**.
124+
3. Select **Databricks** as your warehouse type.
125+
4. Enter your warehouse credentials. Segment requires the following settings to connect to your Databricks warehouse. You can find these details in your Databricks workspace by navigating to **SQL Warehouse > Connection details**.
136126
- **Hostname**: The address of your Databricks server
137127
- **Http Path**: The address of your Databricks compute resources
138128
- **Port**: The port used to connect to your Databricks warehouse. The default port is 443, but your port might be different.
139129
- **Catalog**: The catalog you designated in [Step 2: Create a catalog for Segment to store checkpoint tables](#step-2-create-a-catalog-for-segment-to-store-checkpoint-tables)
140130
- **Service principal client ID**: The client ID used to access your Databricks warehouse
141131
- **OAuth secret**: The OAuth secret used to connect to your Databricks warehouse
142132

143-
After identifying the following settings, continue setting up the Data Graph by following the instructions in [Connect your warehouse to the Data Graph](/docs/unify/data-graph/data-graph/#step-2-connect-your-warehouse-to-the-data-graph).
144-
145-
## Additional set up for warehouse permissions
146-
147-
### Update user access for Segment Reverse ETL catalog
148-
Run the following SQL if you run into an error on the Segment app indicating that the user doesn’t have sufficient privileges on an existing `_segment_reverse_etl` schema.
133+
5. Test your connection, then click **Save**.
149134

150-
If Segment Reverse ETL has ever run in the catalog you are configuring as the Segment connection catalog, a Segment-managed schema is already created and you need to provide the new Segment user access to the existing schema. Update the Databricks table permissions by running the following SQL:
135+
## Update user access for Segment Reverse ETL catalog
136+
If Segment Reverse ETL has ever run in the catalog you are configuring as the Segment connection catalog, a Segment-managed schema already exists, and you need to give the new Segment user access to that schema. Run the following SQL if you see an error in the Segment app indicating that the user doesn't have sufficient privileges on an existing `__segment_reverse_etl` schema.
151137

152-
```sql
138+
```SQL
153139
GRANT ALL PRIVILEGES ON SCHEMA ${segment_internal_catalog}.__segment_reverse_etl TO `${client_id}`;
154140
```
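
After running the grant, an admin can confirm that it took effect on the Segment-managed schema. This is a hedged sketch that assumes the Databricks `SHOW GRANTS` statement; it reuses the `${segment_internal_catalog}` and `${client_id}` placeholders from the command above.

```SQL
-- Assumption: run by an admin or the schema owner.
-- Lists the privileges the Service Principal user holds on the Reverse ETL schema.
SHOW GRANTS `${client_id}` ON SCHEMA ${segment_internal_catalog}.__segment_reverse_etl;
```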
