Segment creates a separate EMR cluster to run replays, then destroys it when the replay finishes.
## Set up Azure Data Lakes
> info " "
> Azure Data Lakes is currently in Public Beta.
To set up Azure Data Lakes, create your Azure resources and then enable the Data Lakes destination in the Segment app.
### Prerequisites
Before you can configure your Azure resources, you must complete the following prerequisites:
- [Create an Azure subscription](https://azure.microsoft.com/en-us/free/){:target="_blank"}
- Create an account with `Microsoft.Authorization/roleAssignments/write` permissions
- Configure the [Azure Command Line Interface (Azure CLI)](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli){:target="_blank"}
### Step 1 - Create an ALDS-enabled storage account
13. Select the **Overview** tab and click the **Restart** button to restart your database. Restarting your database updates the `lower_case_table_names` setting.
14. Once the server restarts successfully, open your Azure CLI.
15. Sign into the MySQL server from your command line by entering the following command:
```shell
mysql --host=[HOSTNAME] --port=3306 --user=[USERNAME] --password=[PASSWORD]
```
16. Run the `CREATE DATABASE` command to create your Hive Metastore:
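For example, assuming you name the metastore database `segment_hive_metastore` (the name here is purely illustrative):

```sql
CREATE DATABASE segment_hive_metastore;
```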
### Step 5 - Set up a Service Principal
1. Open the Databricks instance you created in [Step 4 - Set up Databricks](#step-4---set-up-databricks).
2. Click **Settings** and select **User settings**.
3. On the Access tokens page, click **Generate new token**.
4. Enter a comment for your token, select the lifetime of your token, and click **Generate**.
5. Copy your token, as you'll use this to add your service principal to your workspace.
6. Open your Azure CLI and create a new service principal using the following commands: <br/>
```powershell
az login
az ad sp create-for-rbac --name <ServicePrincipalName>
```
7. In your Azure portal, select the Databricks instance you created in [Step 4 - Set up Databricks](#step-4---set-up-databricks).
8. On the overview page for your Databricks instance, select **Access control (IAM)**.
9. Click **Add** and select **Add role assignment**.
10. On the **Roles** tab, select the `Managed Application Operator` role. Click **Next**.
11. On the **Members** tab, select a **User, group, or service principal**.
12. Click **Select members**.
13. Search for and select the Service Principal you created above.
16. Return to the Azure home page. Select your storage account.
17. On the overview page for your storage account, select **Access control (IAM)**.
18. Click **Add** and select **Add role assignment**.
19. On the **Roles** tab, select the `Storage Blob Data Contributor` role. Click **Next**.
20. On the **Members** tab, select a **User, group, or service principal**.
21. Click **Select members**.
22. Search for and select the Service Principal you created above.
23. Click **Select**.
24. Under the **Members** header, verify that you selected your Service Principal. Click **Review + assign**.
25. Open your Key Vault. In the sidebar, select **Secrets**.
26. Click **Generate/Import**.
27. On the Create a secret page, select **Manual**. Enter the name `spsecret` for your secret, and enter the name of the secret you created in Databricks in the **Value** field.
28. From your Azure CLI, call the Databricks SCIM API to add your service principal to your workspace, replacing `<per-workspace-url>` with the URL of your Databricks workspace, `<personal-access-token>` with the access token you created in an earlier step, and `<application-id>` with the client ID of your service principal: <br/>
```shell
curl -X POST 'https://<per-workspace-url>/api/2.0/preview/scim/v2/ServicePrincipals' \
  --header 'Content-Type: application/scim+json' \
  --header 'Authorization: Bearer <personal-access-token>' \
  --data '{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
    "applicationId": "<application-id>",
    "displayName": "<service-principal-name>",
    "entitlements": [{ "value": "allow-cluster-create" }]
  }'
```
29. Open Databricks and navigate to your cluster. Select **Permissions**.
30. In the permissions menu, grant your service principal **Can Manage** permissions.
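If you'd rather script the SCIM call from step 28 than use curl, the Python sketch below builds the same request under the same placeholder assumptions (workspace URL, access token, and application ID are all stand-ins); the network call itself is left commented out:

```python
import json
import urllib.request


def build_scim_request(workspace_url, access_token, application_id):
    """Build the SCIM request that registers a service principal
    in a Databricks workspace (same payload as the curl example)."""
    payload = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "applicationId": application_id,
        "displayName": application_id,
        "entitlements": [{"value": "allow-cluster-create"}],
    }
    return urllib.request.Request(
        f"https://{workspace_url}/api/2.0/preview/scim/v2/ServicePrincipals",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/scim+json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# Hypothetical workspace URL and placeholder credentials:
req = build_scim_request("adb-123.azuredatabricks.net", "<token>", "<app-id>")
# urllib.request.urlopen(req)  # uncomment to actually send the request
```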
> warning " "
> Before continuing, note the Client ID and Client Secret for your Service Principal: you'll need these variables when configuring the Azure Data Lakes destination in the Segment app.
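As a reminder of where those values come from: `az ad sp create-for-rbac` (run in step 6 above) prints the new credentials as JSON in roughly this shape, where `appId` is the Client ID and `password` is the Client Secret (all values below are placeholders):

```json
{
  "appId": "00000000-0000-0000-0000-000000000000",
  "displayName": "<ServicePrincipalName>",
  "password": "<client-secret>",
  "tenant": "00000000-0000-0000-0000-000000000000"
}
```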
5. Open the **Advanced options** toggle and paste the Spark config you copied above, replacing the variables (`<example_variable>`) with information from your workspace.
6. Select **Confirm and restart**. On the popup window, select **Confirm**.
7. Log in to your Azure MySQL database using the following command: <br/>
```shell
mysql --host=[HOSTNAME] --port=3306 --user=[USERNAME] --password=[PASSWORD]
```
8. Once you've logged in to your MySQL database, run the following commands: <br/>
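For reference, the Spark config pasted in step 5 above generally combines ADLS OAuth settings with external Hive metastore settings, along these lines (key names follow standard Databricks configuration; every bracketed value is a placeholder, and the config generated for your workspace may differ):

```
spark.hadoop.fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<scope-name>/spsecret}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<mysql-hostname>:3306/<metastore-database>
spark.hadoop.javax.jdo.option.ConnectionUserName <mysql-username>
spark.hadoop.javax.jdo.option.ConnectionPassword <mysql-password>
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
```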
After you set up the necessary resources in Azure, the next step is to set up the Data Lakes destination in the Segment app:
- **Databricks Workspace Resource Group**: The resource group that hosts your Azure Databricks instance. This is visible in Azure on the overview page for your Databricks instance.
- **Region**: The location of the Azure Storage account you set up in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account).
- **Service Principal Client ID**: The Client ID of the Service Principal that you set up in [Step 5 - Set up a Service Principal](#step-5---set-up-a-service-principal).
- **Service Principal Client Secret**: The Secret for the Service Principal that you set up in [Step 5 - Set up a Service Principal](#step-5---set-up-a-service-principal).
### (Optional) Set up your Azure Data Lake using Terraform
Instead of manually configuring your Data Lake, you can create it using the script in the [`terraform-azure-data-lakes`](https://github.com/segmentio/terraform-azure-data-lakes){:target="_blank"} GitHub repository.