
Commit 0c81f1e

Updating names for Azure product, steps 4 & most of 5 [DOC-493]
1 parent 8fa1e03 commit 0c81f1e

2 files changed: +52 -26 lines changed


src/connections/storage/catalog/data-lakes/index.md

Lines changed: 47 additions & 21 deletions
@@ -79,12 +79,12 @@ The time needed to process a Replay can vary depending on the volume of data and
 
 Segment creates a separate EMR cluster to run replays, then destroys it when the replay finishes. This ensures that regular Data Lakes syncs are not interrupted, and helps the replay finish faster.
 
-## Set up [Azure Data Lakes]
+## Set up Azure Data Lakes
 
-> info "[Azure Data Lakes] is currently in Public Beta"
-> [Azure Data Lakes] is available in Public Beta.
+> info " "
+> Azure Data Lakes is available in Public Beta.
 
-To set up [Azure Data Lakes], create your [Azure resources](/docs/src/connections/storage/data-lakes/#set-up-[azure-data-lakes]) and then enable the Data Lakes destination in the Segment app.
+To set up Azure Data Lakes, create your [Azure resources](/docs/connections/storage/data-lakes/#set-up-azure-data-lakes) and then enable the Data Lakes destination in the Segment app.
 
 ### Prerequisites
 
@@ -141,7 +141,7 @@ Before you can configure your Azure resources, you must first [create an Azure s
 6. Click **Review + create**.
 7. Review your chosen settings. When you are satisfied with your selections, click **Create**.
 8. After your resource is deployed, click **Go to resource**.
-9. From the resouce page, select the **Connection security** tab.
+9. From the resource page, select the **Connection security** tab.
 10. Under the Firewall rules section, select **Yes** to allow access to Azure services, and click the **Allow current client IP address (xx.xxx.xxx.xx)** button to allow access from your current IP address.
 11. Click **Save** to save the changes you made on the **Connection security** page, and select the **Server parameters** tab.
 12. Update the `lower_case_table_names` value to 2, and click **Save**.
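
Steps 10 through 12 can also be scripted. A minimal sketch with the Azure CLI, assuming an Azure Database for MySQL Single Server and placeholder resource group and server names (verify the flags against your CLI version):

```bash
# Portal step 10: allow Azure services through the firewall
# (the 0.0.0.0-0.0.0.0 rule is Azure's convention for "Allow access to Azure services")
az mysql server firewall-rule create \
  --resource-group <resource-group> \
  --server-name <server-name> \
  --name AllowAzureServices \
  --start-ip-address 0.0.0.0 \
  --end-ip-address 0.0.0.0

# Portal step 12: set the lower_case_table_names server parameter to 2
az mysql server configuration set \
  --resource-group <resource-group> \
  --server-name <server-name> \
  --name lower_case_table_names \
  --value 2
```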
@@ -163,16 +163,42 @@ Before you can configure your Azure resources, you must first [create an Azure s
 ### Step 4 - Set up Databricks
 
 > note "Databricks pricing tier"
-> If you create a Databricks instance only for [Azure Data Lakes] to use, only the standard pricing tier is required. However, if you use your Databricks instance for other applications, you may require premium pricing.
+> If you create a Databricks instance only for Azure Data Lakes to use, only the standard pricing tier is required. However, if you use your Databricks instance for other applications, you may require premium pricing.
+
+1. From the [home page of your Azure portal](https://portal.azure.com/#home){:target="_blank"}, select **Create a resource**.
+2. Search for and select **Azure Databricks**.
+3. On the Azure Databricks resource page, select the **Azure Databricks** plan and click **Create**.
+4. On the **Basic** tab, select an existing subscription and resource group, enter a name for your workspace, select the region you'd like to house your Databricks instance in, and select a pricing tier. For those using the Databricks instance only for Azure Data Lakes, the Standard pricing tier is appropriate. If you plan to use your Databricks instance for more than just Azure Data Lakes, you may require the Premium pricing tier.
+5. Click **Review + create**.
+6. Review your chosen settings. When you are satisfied with your selections, click **Create**.
+7. After your resource is deployed, click **Go to resource**.
+8. On the Azure Databricks Service overview page, click **Launch Workspace**.
+9. On the Databricks page, select **Create a cluster**.
+10. On the Compute page, select **Create Cluster**.
+11. Enter a name for your cluster and select the `Standard_DS4_v2` worker type. Set the minimum number of workers to 2, and the maximum number of workers to 8. __Segment recommends deselecting the "Terminate after X minutes" setting, as the time it takes to restart a cluster may delay your data lake syncs.__
+12. Click **Create Cluster**.
+13. Open [your Azure portal](https://portal.azure.com/#home){:target="_blank"} and select the Key Vault you created in a previous step.
+14. On the Key Vault page, select the JSON View link to view the Resource ID and vaultURI. Take note of these values, as you'll need them in the next step to configure your Databricks instance.
+15. Open `https://<databricks-instance>#secrets/createScope` and enter the following information to connect your Databricks instance to the Key Vault you created in an earlier step:
+    - **Scope Name**: Set this value to `segment`.
+    - **Manage Principal**: Select **All Users**.
+    - **DNS Name**: Set this value to the Vault URI of your Key Vault instance.
+    - **Resource ID**: The Resource ID of your Azure Key Vault instance.
+16. When you've entered all of your information, click **Create**.
 
 > warning " "
 > Before continuing, note the Cluster ID, Workspace name, Workspace URL, and the Azure Resource Group for Databricks Workspace: you'll need these variables when configuring the Azure Data Lakes destination in the Segment app.
 
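The Key Vault-backed secret scope from step 15 can also be created from the command line. A sketch using the legacy `databricks` CLI, assuming it is authenticated against your workspace with an Azure AD token; the resource ID and DNS name are the Key Vault values noted in step 14:

```bash
# Equivalent of the #secrets/createScope form in step 15
databricks secrets create-scope --scope segment \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>" \
  --dns-name "https://<vault-name>.vault.azure.net/" \
  --initial-manage-principal users   # "Manage Principal: All Users"
```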
 ### Step 5 - Set up a Service Principal
 
-### Step 6 - Configure Databricks cluster
+1. From the [home page of your Azure portal](https://portal.azure.com/#home){:target="_blank"}, select the Databricks instance you created in [Step 4 - Set up Databricks](#step-4---set-up-databricks).
+2. On the overview page for your Databricks instance, select **Access control (IAM)**.
+3. Click **Add** and select **Add role assignment**.
+4. On the **Members** tab, assign access to a **User, group, or service principal**.
+5. Click **Select members**.
+6. Search for and select the `Databricks Resource Provider` service principal.
 
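A command-line sketch of the same role assignment follows; the role itself isn't named in the steps above, so it is left as a placeholder rather than guessed:

```bash
# Assign a role on the Databricks workspace to the Databricks Resource Provider
# service principal. <role-name> and the IDs below are placeholders.
az role assignment create \
  --assignee <databricks-resource-provider-object-id> \
  --role "<role-name>" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Databricks/workspaces/<workspace-name>"
```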
-### Step 7 - Enable the Data Lakes destination in the Segment app
+### Step 6 - Enable the Data Lakes destination in the Segment app
 
 After you set up the necessary resources in Azure, the next step is to set up the Data Lakes destination in Segment:
 
@@ -182,22 +208,22 @@ After you set up the necessary resources in Azure, the next step is to set up th
 2. Search for and select **Azure Data Lakes**.
 2. Click the **Configure Data Lakes** button, and select the source you'd like to receive data from. Click **Next**.
 3. In the **Connection Settings** section, enter the following values:
-   - Azure Storage Account (The name of the Azure Storage account that you set up in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account))
-   - Azure Storage Container (The name of the Azure Storage Container you created in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account))
-   - Azure Subscription ID
-   - Azure Tenant ID
-   - Databricks Cluster ID
-   - Databricks Instance URL
-   - Databricks Workspace Name
-   - Databricks Workspace Resource Group
-   - Region (The location of the Azure Storage account you set up in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account)
-   - Service Principal Client ID
-   - Service Principal Client Secret
+   - **Azure Storage Account**: The name of the Azure Storage account that you set up in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account).
+   - **Azure Storage Container**: The name of the Azure Storage Container you created in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account).
+   - **Azure Subscription ID**: The ID of your [Azure subscription](https://docs.microsoft.com/en-us/azure/azure-portal/get-subscription-tenant-id){:target="_blank"}.
+   - **Azure Tenant ID**: The Tenant ID of your [Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/active-directory-how-to-find-tenant){:target="_blank"}.
+   - **Databricks Cluster ID**: The ID of your [Databricks cluster](https://docs.databricks.com/workspace/workspace-details.html#cluster-url-and-id){:target="_blank"}.
+   - **Databricks Instance URL**: The URL of your [Databricks workspace](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids){:target="_blank"}.
+   - **Databricks Workspace Name**: The name of your [Databricks workspace](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids){:target="_blank"}.
+   - **Databricks Workspace Resource Group**: The resource group that hosts your Azure Databricks instance. This is visible in Azure on the overview page for your Databricks instance.
+   - **Region**: The location of the Azure Storage account you set up in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account).
+   - **Service Principal Client ID**:
+   - **Service Principal Client Secret**:
 
 
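Most of these values can be read from the command line instead of hunting through the portal. A sketch, assuming the Azure CLI and the legacy Databricks CLI are installed and authenticated:

```bash
# Azure Subscription ID and Tenant ID
az account show --query "{subscriptionId: id, tenantId: tenantId}" --output table

# Cluster IDs for the workspace the Databricks CLI is configured against
databricks clusters list --output JSON
```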
 ### Optional - Set up the Data Lake using Terraform
 
-Instead of manually configuring your Data Lake, you can create a Data Lake using the script in the [`terraform-azure-data-lake`](https://github.com/segmentio/terraform-azure-data-lakes) Github repository.
+Instead of manually configuring your Data Lake, you can create a Data Lake using the script in the [`terraform-azure-data-lake`](https://github.com/segmentio/terraform-azure-data-lakes) GitHub repository.
 
 > note " "
 > This script requires Terraform versions 0.12+.
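
Typical usage of that repository, assuming Terraform 0.12 or later is installed; the required input variables are whatever the repository's README defines, so a placeholder vars file stands in for them here:

```bash
git clone https://github.com/segmentio/terraform-azure-data-lakes.git
cd terraform-azure-data-lakes

terraform init
terraform plan -var-file=<your-vars>.tfvars    # review the planned resources first
terraform apply -var-file=<your-vars>.tfvars
```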
@@ -328,4 +354,4 @@ Replace:
 {% endfaqitem %}
 {% endfaq %}
 
-### [Azure Data Lakes]
+### Azure Data Lakes

src/connections/storage/data-lakes/index.md

Lines changed: 5 additions & 5 deletions
@@ -36,7 +36,7 @@ Segment sends data to S3 by orchestrating the processing in an EMR (Elastic MapR
 
 ![A diagram visualizing data flowing from a Segment user into your account and into a Glue catalog/S3 bucket](images/dl_vpc.png)
 
-### How [Azure Data Lakes] works
+### How Azure Data Lakes works
 
 Data Lakes store Segment data in ADLS in a read-optimized encoding format (Parquet) which makes the data more accessible and actionable. To help you zero-in on the right data, Data Lakes also creates logical data partitions and event tables, and integrates metadata with existing schema management tools, like the Hive Metastore. The resulting data set is optimized for use with systems like Power BI and Azure HDInsight or machine learning vendors like Azure DataBricks or Azure Synapse Analytics.
 
@@ -60,7 +60,7 @@ Data Lakes uses an IAM role to grant Segment secure access to your AWS account.
 - **external_ids**: External IDs are the part of the IAM role which Segment uses to assume the role providing access to your AWS account. You will define the external ID in the IAM role as the Segment Workspace ID in which you want to connect to Data Lakes. The Segment Workspace ID can be retrieved from the [Segment app](https://app.segment.com/goto-my-workspace/overview){:target="_blank"} by navigating to Settings > General Settings > ID.
 - **s3_bucket**: Name of the S3 bucket used by the Data Lake.
 
-### Set up [Azure Data Lakes]
+### Set up Azure Data Lakes
 
 Before you can connect your [Azure Data Lake] to Segment, you must set up the following components in your Azure environment:
 
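Conceptually, Segment's side of the **external_ids** handshake is an STS call like the one below, sketched with the AWS CLI. It only succeeds for a principal the role's trust policy lists (in production, Segment's own AWS account); the ARN and workspace ID are placeholders:

```bash
# Assume the Data Lakes IAM role, passing the Segment Workspace ID as the external ID
aws sts assume-role \
  --role-arn "arn:aws:iam::<account-id>:role/<data-lakes-role>" \
  --role-session-name segment-data-lakes \
  --external-id "<segment-workspace-id>"
```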
@@ -71,7 +71,7 @@ Before you can connect your [Azure Data Lake] to Segment, you must set up the fo
 - [Azure MySQL Database](https://docs.microsoft.com/en-us/azure/purview/register-scan-azure-mysql-database){:target="_blank"}: The MySQL database is a relational database service based on the MySQL Community Edition, versions 5.6, 5.7, and 8.0.
 - [Azure KeyVault Instance](https://docs.microsoft.com/en-us/azure/key-vault/general/quick-create-portal){:target="_blank"}: Azure KeyVault provides a secure store for your keys, secrets, and certificates.
 
-For more information about configuring [Azure Data Lakes], see the [Data Lakes setup page](/docs/connections/storage/catalog/data-lakes/).
+For more information about configuring Azure Data Lakes, see the [Data Lakes setup page](/docs/connections/storage/catalog/data-lakes/).
 
 ## Data Lakes schema
 
@@ -124,7 +124,7 @@ The schema inferred by Segment is stored in a Glue database within Glue Data Cat
 > info ""
 > The recommended IAM role permissions grant Segment access to create the Glue databases on your behalf. If you do not grant Segment these permissions, you must manually create the Glue databases for Segment to write to.
 
-### [Azure Data Lakes] schema
+### Azure Data Lakes schema
 
 ### Data types
 
@@ -137,7 +137,7 @@ The data types supported in [AWS Data Lakes] are:
 - string
 - timestamp
 
-The data types supported in the [Azure Data Lakes] are:
+The data types supported in Azure Data Lakes are:
 - bigint
 - boolean
 - decimal(38,6)
