Commit a176d74

committed
Profiles sync first pass
1 parent 723eac5 commit a176d74

1 file changed

+106
-0
lines changed

@@ -0,0 +1,106 @@
1+
---
2+
title: Databricks Profiles Sync
3+
plan: unify
4+
---
5+
6+
With Databricks Profiles Sync, you can sync Segment profiles into your Databricks Lakehouse.
7+
8+
<!--
9+
Use Databricks as a warehouse destination and materialized view for Profiles Sync Warehouses
10+
-->
11+
## Getting started
12+
13+
Before starting with the Databricks Profiles Sync destination, note the following prerequisites for setup.
14+
15+
- The target Databricks workspace must have Unity Catalog enabled. Segment doesn't support the Hive metastore. See the Databricks guide on [enabling Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information.
16+
Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in Unity Catalog.
17+
18+
- Segment uses the service principal to access your Databricks workspace and associated APIs.
19+
- Use the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. The service principal's name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require `Account admin` or `Marketplace admin` roles.
20+
21+
- The service principal needs the following setup:
22+
- A generated OAuth secret. Follow the [Databricks guide for generating an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from the page, the secret is no longer visible. If you lose or forget the secret, you can delete the existing secret and create a new one.
23+
- [Catalog-level privileges](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/privileges.html#general-unity-catalog-privilege-types){:target="_blank"}, which include:
24+
- USE CATALOG
25+
- USE SCHEMA
26+
- MODIFY
27+
- SELECT
28+
- CREATE SCHEMA
29+
- CREATE TABLE
30+
- Databricks SQL access [entitlement](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} at the workspace level.
31+
- CAN USE [permissions](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html#sql-warehouse-permissions){:target="_blank"} on the SQL warehouse that will be used for the sync.
32+
33+
- Segment supports only [OAuth machine-to-machine (M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} authentication.
34+
- A SQL warehouse is required for compute. Segment recommends the following configuration:
35+
- **Size**: Small
36+
- **Type**: Serverless if available, otherwise Pro
37+
- **Clusters**: Minimum of 2 - Maximum of 6
38+
39+
- To improve query performance on Delta Lake, Segment recommends creating a compaction job for each table using `OPTIMIZE`, following the [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}.
40+
41+
- If the SQL warehouse isn't running, Segment attempts to start it to validate the connection when you click the **Test Connection** button during setup. For a better experience, Segment recommends manually starting the warehouse in advance.
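
The `OPTIMIZE` recommendation above could be scripted along these lines. This is a minimal sketch, not Segment's or Databricks' tooling: the function name and the catalog, schema, and table names in the usage example are hypothetical placeholders, and the sketch only builds the SQL text, leaving execution and scheduling to whatever job runner you use.

```python
def optimize_statements(catalog: str, schema: str, tables: list[str]) -> list[str]:
    """Build one OPTIMIZE statement per table in the given catalog and schema."""

    def quote(name: str) -> str:
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    return [
        f"OPTIMIZE {quote(catalog)}.{quote(schema)}.{quote(table)}"
        for table in tables
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute the catalog, schema, and tables
    # that Profiles Sync actually creates in your workspace.
    for statement in optimize_statements("segment", "profiles", ["table_a", "table_b"]):
        print(statement)
```

Each generated statement can then be run on the SQL warehouse as a scheduled job, one job per table.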
42+
43+
44+
## Set up Databricks for Profiles Sync
45+
46+
1. From your Segment app, navigate to **Unify > Profiles Sync**.
47+
2. Click **Add Warehouse**.
48+
3. Select **Databricks** as your warehouse type.
49+
4. Use the following steps to [connect your warehouse](#connect-your-databricks-warehouse).
50+
51+
52+
## Connect your Databricks warehouse
53+
54+
Use the following five steps to connect your Databricks warehouse.
55+
56+
> warning ""
57+
> To configure your warehouse, you'll need read and write permissions.
58+
59+
### Step 1: Name your destination
60+
61+
Add a name to help you identify this warehouse in Segment. You can change this name at any time by navigating to (???
62+
63+
### Step 2: Enter the Databricks compute resources URL
64+
65+
66+
Segment uses the Databricks workspace URL to access your workspace API.
67+
68+
Check your browser's address bar when inside the workspace. The workspace URL will look something like: `https://<workspace-deployment-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.
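
As an illustration of trimming the address-bar URL down to the workspace URL, here is a minimal sketch using Python's standard library. The function name and the example workspace name are hypothetical; the only assumption taken from the text above is that everything after the host portion should be dropped.

```python
from urllib.parse import urlsplit


def workspace_base_url(address_bar_url: str) -> str:
    """Return only the scheme and host of a URL copied from the browser."""
    parts = urlsplit(address_bar_url.strip())
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"Not an https workspace URL: {address_bar_url!r}")
    # Drop the path, query string, and fragment; keep only the host.
    return f"https://{parts.hostname}"


if __name__ == "__main__":
    # "my-workspace" is a placeholder deployment name.
    print(workspace_base_url(
        "https://my-workspace.cloud.databricks.com/sql/warehouses?o=12345"
    ))
```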
69+
70+
### Step 3: Enter a Unity Catalog name
71+
72+
This catalog is the target catalog where Segment lands your schemas and tables.
73+
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
74+
2. Select the catalog you've just created.
75+
1. Select the **Permissions** tab, then click **Grant**.
76+
2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
77+
3. Click **Grant**.
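
If you prefer SQL to the Permissions tab, an equivalent grant could be generated as sketched below. The sketch builds a `GRANT ... ON CATALOG` statement for the catalog-level privileges listed under Getting started, rather than `ALL PRIVILEGES`. It assumes the service principal is referenced by its Application ID in the `TO` clause; the function name, catalog name, and Application ID shown are placeholders.

```python
# Catalog-level privileges listed in the Getting started section.
CATALOG_PRIVILEGES = [
    "USE CATALOG",
    "USE SCHEMA",
    "MODIFY",
    "SELECT",
    "CREATE SCHEMA",
    "CREATE TABLE",
]


def grant_statement(catalog: str, application_id: str,
                    privileges: list[str] = CATALOG_PRIVILEGES) -> str:
    """Build a GRANT statement giving a service principal catalog privileges."""

    def quote(name: str) -> str:
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    return (
        f"GRANT {', '.join(privileges)} "
        f"ON CATALOG {quote(catalog)} TO {quote(application_id)}"
    )


if __name__ == "__main__":
    # Placeholder catalog name and Application ID.
    print(grant_statement("segment", "00000000-0000-0000-0000-000000000000"))
```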
78+
79+
### Step 4: Add the SQL warehouse details from your Databricks warehouse
80+
81+
Next, add SQL warehouse details about your compute resource.
82+
- **HTTP Path**: The HTTP path shown in the connection details of your SQL warehouse.
83+
- **Port**: The port number of your SQL warehouse.
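
Before pasting these values into Segment, a quick sanity check like the following can catch copy-paste mistakes. It is a sketch under two assumptions that are not stated in this guide: that SQL warehouse HTTP paths have the form `/sql/1.0/warehouses/<warehouse-id>`, and that the port is typically 443. The class name and the example warehouse ID are hypothetical.

```python
from dataclasses import dataclass

# Assumed prefix of a SQL warehouse HTTP path (not taken from this guide).
HTTP_PATH_PREFIX = "/sql/1.0/warehouses/"


@dataclass(frozen=True)
class WarehouseDetails:
    http_path: str
    port: int = 443  # Assumed default; use the value from your connection details.

    def __post_init__(self) -> None:
        if not self.http_path.startswith(HTTP_PATH_PREFIX) or not self.warehouse_id:
            raise ValueError(f"Unexpected HTTP path: {self.http_path!r}")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"Port out of range: {self.port}")

    @property
    def warehouse_id(self) -> str:
        # The warehouse ID is whatever follows the assumed prefix.
        return self.http_path[len(HTTP_PATH_PREFIX):]


if __name__ == "__main__":
    details = WarehouseDetails("/sql/1.0/warehouses/abc123def456")
    print(details.warehouse_id, details.port)
```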
84+
85+
86+
### Step 5: Add the service principal client ID and client secret
87+
88+
Segment uses the service principal to access your Databricks workspace and associated APIs.
89+
1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. The service principal's name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require `Account admin` or `Marketplace admin` roles.
90+
2. (*OAuth only*) Follow the Databricks instructions to [generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one.
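
To check the client ID and secret yourself before entering them, you could request an OAuth M2M access token from the workspace. The sketch below only builds the HTTP request and does not send it. It assumes the workspace-level token endpoint `/oidc/v1/token` with the `client_credentials` grant and the `all-apis` scope, as described in the Databricks OAuth M2M documentation; the function name and all values in the usage example are placeholders, and this is not necessarily how Segment performs the exchange.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(workspace_url: str, client_id: str,
                        client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request."""
    # HTTP Basic auth: base64("client_id:client_secret").
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    body = urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    return Request(
        url=f"{workspace_url.rstrip('/')}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder workspace URL and credentials.
    request = build_token_request(
        "https://my-workspace.cloud.databricks.com",
        "my-application-id",
        "my-oauth-secret",
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; a JSON response containing an access token suggests the Application ID and secret are valid.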
91+
92+
93+
Once you've configured your warehouse, test the connection and click **Next**.
94+
95+
## Set up selective sync
96+
97+
With selective sync, you can choose exactly which tables you want synced to the Databricks warehouse. By default, Segment also syncs materialized view tables.
98+
99+
Select tables to sync, then click **Next**. Segment creates the warehouse and connects Databricks to your Profiles Sync space.
100+
101+
You can view the sync status and the tables you're syncing from the Profiles Sync overview page.
102+
103+
104+
Learn more about [using Selective Sync](/docs/unify/profiles-sync/using-selective-sync) with Profiles Sync.
105+
106+
