Commit 10ad262

Added missing context, vale updates [netlify-build]
1 parent 6163d55 commit 10ad262

File tree

1 file changed (+51 -17)
  • src/connections/storage/catalog/data-lakes


src/connections/storage/catalog/data-lakes/index.md

Lines changed: 51 additions & 17 deletions
@@ -82,13 +82,16 @@ Segment creates a separate EMR cluster to run replays, then destroys it when the
 ## Set up Azure Data Lakes

 > info " "
-> Azure Data Lakes is available in Public Beta.
+> Azure Data Lakes is currently in Public Beta.

 To set up Azure Data Lakes, create your Azure resources and then enable the Data Lakes destination in the Segment app.

 ### Prerequisites

-Before you can configure your Azure resources, you must first [create an Azure subscription](https://azure.microsoft.com/en-us/free/){:target="_blank"}, create an account with `Microsoft.Authorization/roleAssignments/write` permissions, and configure the [Azure Command Line Interface (Azure CLI)](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli){:target="_blank"}.
+Before you can configure your Azure resources, you must complete the following prerequisites:
+- [Create an Azure subscription](https://azure.microsoft.com/en-us/free/){:target="_blank"}
+- Create an account with `Microsoft.Authorization/roleAssignments/write` permissions
+- Configure the [Azure Command Line Interface (Azure CLI)](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli){:target="_blank"}

 ### Step 1 - Create an ALDS-enabled storage account

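The new prerequisites assume a working Azure CLI session. As a quick sanity check before creating any resources, something like the following should confirm the CLI is installed and signed in (a minimal sketch; the `--query` filter is illustrative, not from the docs):

```bash
# Sign in and confirm which subscription and account the CLI will use.
az login
az account show --query "{subscription:name, user:user.name}"
```
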
@@ -149,7 +152,7 @@ Before you can configure your Azure s
 13. Select the **Overview** tab and click the **Restart** button to restart your database. Restarting your database updates the `lower_case_table_name` setting.
 14. Once the server restarts successfully, open your Azure CLI.
 15. Sign into the MySQL server from your command line by entering the following command:
-```sql
+```curl
 mysql --host=[HOSTNAME] --port=3306 --user=[USERNAME] --password=[PASSWORD]
 ```
 16. Run the `CREATE DATABASE` command to create your Hive Metastore:
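The `CREATE DATABASE` command itself falls outside this hunk. A minimal sketch of that step, assuming a metastore database named `hive_metastore` (an assumed name; use whatever the guide specifies):

```bash
# Create the Hive Metastore database in one shot from the shell.
# "hive_metastore" is an assumed name -- substitute the documented one.
mysql --host=[HOSTNAME] --port=3306 --user=[USERNAME] --password=[PASSWORD] \
  -e "CREATE DATABASE hive_metastore;"
```
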
@@ -192,15 +195,20 @@ Before you can configure your Azure s

 ### Step 5 - Set up a Service Principal

-1. Open your Azure CLI and create a new service principal using the following commands: <br/>
+1. Open the Databricks instance you created in [Step 4 - Set up Databricks](#step-4---set-up-databricks).
+2. Click **Settings** and select **User settings**.
+3. On the Access tokens page, click **Generate new token**.
+4. Enter a comment for your token, select the lifetime of your token, and click **Generate**.
+5. Copy your token, as you'll use this to add your service principal to your workspace.
+6. Open your Azure CLI and create a new service principal using the following commands: <br/>
 ``` powershell
 az login
 az ad sp create-for-rbac --name <ServicePrincipalName>
 ```
-2. In your Azure portal, select the Databricks instance you created in [Step 4 - Set up Databricks](#step-4---set-up-databricks).
-2. On the overview page for your Databricks instance, select **Access control (IAM)**.
-3. Click **Add** and select **Add role assignment**.
-4. On the **Roles** tab, select the `Managed Application Operator` role. Click **Next**.
+7. In your Azure portal, select the Databricks instance you created in [Step 4 - Set up Databricks](#step-4---set-up-databricks).
+8. On the overview page for your Databricks instance, select **Access control (IAM)**.
+9. Click **Add** and select **Add role assignment**.
+10. On the **Roles** tab, select the `Managed Application Operator` role. Click **Next**.
 11. On the **Members** tab, select a **User, group, or service principal**.
 12. Click **Select members**.
 13. Search for and select the Service Principal you created above.
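The renumbered steps (and the Segment settings later on) depend on the Service Principal's Client ID and Client Secret, which `az ad sp create-for-rbac` prints when it creates the principal. A sketch of capturing just those two values (the `--query` reshaping is illustrative; `appId` and `password` are the standard output fields):

```bash
# create-for-rbac emits appId (the client ID) and password (the client secret).
az ad sp create-for-rbac --name <ServicePrincipalName> \
  --query "{clientId:appId, clientSecret:password}" --output json
```
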
@@ -209,12 +217,38 @@ az ad sp create-for-rbac --name <ServicePrincipalName>
 16. Return to the Azure home page. Select your storage account.
 17. On the overview page for your storage account, select **Access control (IAM)**.
 18. Click **Add** and select **Add role assignment**.
-4. On the **Roles** tab, select the `Storage Blob Data Contributor` role. Click **Next**.
-11. On the **Members** tab, select a **User, group, or service principal**.
-12. Click **Select members**.
-13. Search for and select the Service Principal you created above.
-14. Click **Select**.
-15. Under the **Members** header, verify that you selected your Service Principal. Click **Review + assign**.
+19. On the **Roles** tab, select the `Storage Blob Data Contributor` role. Click **Next**.
+20. On the **Members** tab, select a **User, group, or service principal**.
+21. Click **Select members**.
+22. Search for and select the Service Principal you created above.
+23. Click **Select**.
+24. Under the **Members** header, verify that you selected your Service Principal. Click **Review + assign**.
+25. Open your Key Vault. In the sidebar, select **Secrets**.
+26. Click **Generate/Import**.
+27. On the Create a secret page, select **Manual**. Enter the name `spsecret` for your secret, and enter the name of the secret you created in Databricks in the **Value** field.
+28. From your Azure CLI, call the Databricks SCIM API to add your service principal to your workspace, replacing `<per-workspace-url>` with the URL of your Databricks workspace, `<personal-access-token>` with the access token you created in an earlier step, and `<application-id>` with the client ID of your service principal: <br/>
+```curl
+curl -X POST 'https://<per-workspace-url>/api/2.0/preview/scim/v2/ServicePrincipals' \
+--header 'Content-Type: application/scim+json' \
+--header 'Authorization: Bearer <personal-access-token>' \
+--data-raw '{
+  "schemas":[
+    "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"
+  ],
+  "applicationId":"<application-id>",
+  "displayName": "test-sp",
+  "entitlements":[
+    {
+      "value":"allow-cluster-create"
+    }
+  ]
+}'
+```
+29. Open Databricks and navigate to your cluster. Select **Permissions**.
+30. In the permissions menu, grant your service principal **Can Manage** permissions.
+
+> warning " "
+> Before continuing, note the Client ID and Client Secret for your Service Principal: you'll need these variables when configuring the Azure Data Lakes destination in the Segment app.

 ### Step 6 - Configure Databricks Cluster

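One way to confirm the SCIM call in step 28 succeeded is to list the workspace's service principals with the same preview API and check that the new `applicationId` appears (a sketch using the step's own placeholders):

```bash
# List registered service principals; the new application ID should be present.
curl -X GET 'https://<per-workspace-url>/api/2.0/preview/scim/v2/ServicePrincipals' \
  --header 'Authorization: Bearer <personal-access-token>'
```
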
@@ -258,7 +292,7 @@ spark.sql.hive.metastore.jars builtin
 5. Open the **Advanced options** toggle and paste the Spark config you copied above, replacing the variables (`<example_variable>`) with information from your workspace.
 6. Select **Confirm and restart**. On the popup window, select **Confirm**.
 7. Log in to your Azure MySQL database using the following command: <br/>
-```powershell
+```curl
 mysql --host=[HOSTNAME] --port=3306 --user=[USERNAME] --password=[PASSWORD]
 ```
 8. Once you've logged in to your MySQL database, run the following commands: <br/>
@@ -303,12 +337,12 @@ After you set up the necessary resources in Azure, the next step is to set up th
 - **Databricks Workspace Resource Group**: The resource group that hosts your Azure Databricks instance. This is visible in Azure on the overview page for your Databricks instance.
 - **Region**: The location of the Azure Storage account you set up in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account).
 - **Service Principal Client ID**: The Client ID of the Service Principal that you set up in [Step 5 - Set up a Service Principal](#step-5---set-up-a-service-principal).
-- **Service Principal Client Secret**: The Client ID of the Service Principal that you set up in [Step 5 - Set up a Service Principal](#step-5---set-up-a-service-principal).
+- **Service Principal Client Secret**: The Secret for the Service Principal that you set up in [Step 5 - Set up a Service Principal](#step-5---set-up-a-service-principal).


 ### (Optional) Set up your Azure Data Lake using Terraform

-Instead of manually configuring your Data Lake, you can create it using the script in the [`terraform-azure-data-lake`](https://github.com/segmentio/terraform-azure-data-lakes) GitHub repository.
+Instead of manually configuring your Data Lake, you can create it using the script in the [`terraform-azure-data-lake`](https://github.com/segmentio/terraform-azure-data-lakes){:target="_blank"} GitHub repository.

 > note " "
 > This script requires Terraform versions 0.12+.
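For the Terraform route, a typical run might look like the following, assuming the repository's standard layout (the required input variables live in the repo's README and are not shown here):

```bash
# Clone the module and apply it with Terraform 0.12 or later.
git clone https://github.com/segmentio/terraform-azure-data-lakes
cd terraform-azure-data-lakes
terraform init    # download providers and modules
terraform plan    # review the resources that will be created
terraform apply   # create the Azure Data Lakes resources
```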
