
Commit 8016a5d

Author: markzegarelli
Revert "Segment Datalakes Azure Document Changes. (#3495)" (#3502)
This reverts commit 9cecebb.
1 parent f89d8b3 · commit 8016a5d


4 files changed: +10 -15 lines


.github/styles/Vocab/Docs/accept.txt

Lines changed: 3 additions & 4 deletions
@@ -17,7 +17,6 @@
 (?:U|u)rls?\b
 adset
 Adwords
-Aircall
 allowlist
 Amberflo
 Appboy
@@ -37,7 +36,6 @@ Cocoapods
 Contentful
 Criteo
 csv
-Databricks
 datetime
 deeplink
 Dev
@@ -73,7 +71,6 @@ Jimo
 Jivox
 Kameleoon
 Kissmetrics
-Leanplum
 Lightbox
 Littledata
 Mailchimp
@@ -120,4 +117,6 @@ waitlist
 WebKit
 Wootric
 Xcode
-Zendesk
+Zendesk
+Leanplum
+Aircall
Two binary image files changed (not shown).

src/connections/storage/catalog/data-lakes/index.md

Lines changed: 7 additions & 11 deletions
@@ -6,9 +6,6 @@ redirect_from: '/connections/destinations/catalog/data-lakes/'
 
 
 Segment Data Lakes provide a way to collect large quantities of data in a format that's optimized for targeted data science and data analytics workflows. You can read [more information about Data Lakes](/docs/connections/storage/data-lakes/) and learn [how they differ from Warehouses](/docs/connections/storage/data-lakes/comparison/) in Segment's Data Lakes documentation.
-Segment supports two type of data-lakes:
-- [AWS Data Lakes](/docs/connections/storage/catalog/data-lakes/#set-up-segment-data-lakes)
-- [Azure Data Lakes](/docs/connections/storage/catalog/data-lakes/#set-up-azure-data-lakes)
 
 > note "Lake Formation"
 > You can also set up your Segment Data Lakes using [Lake Formation](/docs/connections/storage/data-lakes/lake-formation/), a fully managed service built on top of the AWS Glue Data Catalog.
@@ -258,7 +255,8 @@ curl -X POST 'https://<per-workspace-url>/api/2.0/preview/scim/v2/ServicePrincip
 > warning "Optional configuration settings for log4j vulnerability"
 > While Databricks released a statement that clusters are likely unaffected by the log4j vulnerability, out of an abundance of caution, Databricks recommends updating to log4j 2.15+ or adding the following options to the Spark configuration: <br/> `spark.driver.extraJavaOptions "-Dlog4j2.formatMsgNoLookups=true"`<br/>`spark.executor.extraJavaOptions "-Dlog4j2.formatMsgNoLookups=true"`
 
-1. Connect to a [Hive metastore](https://docs.databricks.com/data/metastores/external-hive-metastore.html){:target="_blank”} on your Databricks cluster using the following Spark configuration, replacing the variables (`<example_variable>`) with information from your workspace: <br/>
+1. Connect to a [Hive metastore](https://docs.databricks.com/data/metastores/external-hive-metastore.html){:target="_blank”} on your Databricks cluster.
+2. Copy the following Spark configuration, replacing the variables (`<example_variable>`) with information from your workspace: <br/>
 ```py
 ## Configs so we can read from the storage account
 spark.hadoop.fs.azure.account.oauth.provider.type.<storage_account_name>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
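The hunk above quotes Spark configuration that gets pasted into the Databricks cluster's Spark config field. For readability only, here is the same configuration shape collected into a Python dict, a minimal sketch assuming the keys shown in this diff (the `<storage_account_name>` placeholder is yours to fill in); this is not how the settings are applied in practice:

```py
# Illustrative only: the key/value pairs from the hunk above, collected in a
# Python dict so the configuration shape is easier to see. In practice these
# lines are pasted into the Databricks cluster's Spark config UI.
spark_conf = {
    # log4j mitigation from the warning above
    "spark.driver.extraJavaOptions": "-Dlog4j2.formatMsgNoLookups=true",
    "spark.executor.extraJavaOptions": "-Dlog4j2.formatMsgNoLookups=true",
    # OAuth client-credentials provider for reading the ADLS storage account
    "spark.hadoop.fs.azure.account.oauth.provider.type."
    "<storage_account_name>.dfs.core.windows.net":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
}

# Render back into the `key value` lines the cluster config field expects:
for key, value in spark_conf.items():
    print(f"{key} {value}")
```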
@@ -269,7 +267,7 @@ spark.hadoop.fs.azure.account.oauth2.client.id.<storage_account_name>.dfs.core.w
 ##
 ##
 spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
-spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<db-host>:<port>/<database-name>?useSSL=true&requireSSL=true&enabledSslProtocolSuites=TLSv1.2
+spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<db-host>:<port>/<database-name>?useSSL=true&requireSSL=false
 spark.hadoop.javax.jdo.option.ConnectionUserName <database_user>
 spark.hadoop.javax.jdo.option.ConnectionPassword <database_password>
 ##
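This hunk reverts the metastore's JDBC URL from the TLS-pinned form (`requireSSL=true&enabledSslProtocolSuites=TLSv1.2`) back to `requireSSL=false`. If you want to confirm the Azure MySQL database is reachable with your credentials before restarting the cluster, a quick check like the following may save a debugging round trip; it is an optional sketch using the `mysql-connector-python` package, not part of Segment's documented steps:

```py
# Optional sanity check (not part of Segment's documented setup): verify the
# Azure Database for MySQL credentials before pointing the Databricks
# metastore config at it. Assumes `pip install mysql-connector-python`;
# host, port, user, password, and database are your own values.
import mysql.connector

conn = mysql.connector.connect(
    host="<db-host>",
    port=3306,                   # the <port> from the ConnectionURL
    user="<database_user>",
    password="<database_password>",
    database="<database-name>",
)
cursor = conn.cursor()
cursor.execute("SELECT VERSION()")
print(cursor.fetchone())         # e.g. ('8.0.28',) if the connection works
conn.close()
```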
@@ -330,13 +328,11 @@ After you set up the necessary resources in Azure, the next step is to set up th
 2. Click the **Configure Data Lakes** button, and select the source you'd like to receive data from. Click **Next**.
 3. In the **Connection Settings** section, enter the following values:
 - **Azure Storage Account**: The name of the Azure Storage account that you set up in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account).
-![img.png](images/storageaccount.png)
 - **Azure Storage Container**: The name of the Azure Storage Container you created in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account).
-![img_1.png](images/storagecontainer.png)
-- **Azure Subscription ID**: The ID of your [Azure subscription](https://docs.microsoft.com/en-us/azure/azure-portal/get-subscription-tenant-id){:target="_blank”}. <br> Please add it as it is in the Azure portal, in the format `********-****-****-****-************`
-- **Azure Tenant ID**: The Tenant ID of your [Azure Active directory](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/active-directory-how-to-find-tenant){:target="_blank”}. <br> Please add it as it is in the Azure portal, in the format `********-****-****-****-************`
+- **Azure Subscription ID**: The ID of your [Azure subscription](https://docs.microsoft.com/en-us/azure/azure-portal/get-subscription-tenant-id){:target="_blank”}.
+- **Azure Tenant ID**: The Tenant ID of your [Azure Active directory](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/active-directory-how-to-find-tenant){:target="_blank”}.
 - **Databricks Cluster ID**: The ID of your [Databricks cluster](https://docs.databricks.com/workspace/workspace-details.html#cluster-url-and-id){:target="_blank”}.
-- **Databricks Instance URL**: The ID of your [Databricks workspace](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids){:target="_blank”}. <br> The correct format for adding the URL is 'adb-0000000000000000.00.azureatabricks.net'
+- **Databricks Instance URL**: The ID of your [Databricks workspace](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids){:target="_blank”}.
 - **Databricks Workspace Name**: The name of your [Databricks workspace](https://docs.databricks.com/workspace/workspace-details.html#workspace-instance-names-urls-and-ids){:target="_blank”}.
 - **Databricks Workspace Resource Group**: The resource group that hosts your Azure Databricks instance. This is visible in Azure on the overview page for your Databricks instance.
 - **Region**: The location of the Azure Storage account you set up in [Step 1 - Create an ALDS-enabled storage account](#step-1---create-an-alds-enabled-storage-account).
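The removed hints above documented expected input formats: subscription and tenant IDs are GUIDs (`********-****-****-****-************`), and Azure Databricks instance URLs conventionally take the shape `adb-<workspace-id>.<n>.azuredatabricks.net`. A hedged sketch that checks these formats before you submit the form; the function and regexes are illustrative, not part of Segment's setup:

```py
# Illustrative pre-submit validation of the Connection Settings formats.
# The GUID pattern matches Azure subscription/tenant IDs; the URL pattern
# follows the conventional adb-<workspace-id>.<n>.azuredatabricks.net shape.
import re

GUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
DATABRICKS_URL_RE = re.compile(r"^adb-\d+\.\d+\.azuredatabricks\.net$")

def check(name: str, value: str, pattern: re.Pattern) -> None:
    status = "ok" if pattern.match(value) else "unexpected format"
    print(f"{name}: {status}")

check("Azure Subscription ID", "12345678-1234-1234-1234-123456789abc", GUID_RE)
check("Azure Tenant ID", "87654321-4321-4321-4321-cba987654321", GUID_RE)
check("Databricks Instance URL", "adb-0000000000000000.00.azuredatabricks.net", DATABRICKS_URL_RE)
```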
@@ -485,4 +481,4 @@ spark.sql.hive.metastore.schema.verification.record.version false
 <br/>After you've added to your config, restart your cluster so that your changes can take effect. If you continue to encounter errors, [contact Segment Support](https://segment.com/help/contact/){:target="_blank"}.
 
 #### What do I do if I get a "Version table does not exist" error when setting up the Azure MySQL database?
-Check your Spark configs to ensure that the information you entered about the database is correct, then restart the cluster. The Databricks cluster automatically initializes the Hive Metastore, so an issue with your config file will stop the table from being created. If you continue to encounter errors, [contact Segment Support](https://segment.com/help/contact/){:target="_blank"}.
+Check your Spark configs to ensure that the information you entered about the database is correct, then restart the cluster. The Databricks cluster automatically initializes the Hive Metastore, so an issue with your config file will stop the table from being created. If you continue to encounter errors, [contact Segment Support](https://segment.com/help/contact/){:target="_blank"}.
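Since the FAQ above points at misconfigured database settings as the usual cause of the "Version table does not exist" error, here is a checklist sketch of the `javax.jdo.option` values worth re-verifying; the keys come from the Spark config earlier in this diff, and the values are placeholders:

```py
# Illustrative checklist for the "Version table does not exist" FAQ above:
# metastore initialization fails when any of these javax.jdo.option values
# is wrong, so confirm each against your database before restarting.
# Keys are from the Spark config earlier in this diff; values are placeholders.
required_metastore_configs = {
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
    "spark.hadoop.javax.jdo.option.ConnectionURL":
        "jdbc:mysql://<db-host>:<port>/<database-name>?useSSL=true&requireSSL=false",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "<database_user>",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "<database_password>",
}

for key, expected in required_metastore_configs.items():
    # On a live cluster you could compare with spark.conf.get(key, None);
    # here we just print what should be present.
    print(f"{key} -> {expected}")
```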
