
Commit ed032e7

Merge pull request #3200 from segmentio/DOC-493-IG
Azure Data Lakes private beta docs [DOC-493]
2 parents: 73d204f + a309a40; commit: ed032e7

File tree

8 files changed: +496 -113 lines

src/connections/storage/catalog/data-lakes/index.md

Lines changed: 366 additions & 45 deletions
Large diffs are not rendered by default.

src/connections/storage/data-lakes/comparison.md

Lines changed: 6 additions & 3 deletions
@@ -12,15 +12,18 @@ Data Lakes and Warehouses are not identical, but are compatible with a configura
 ## Data freshness
 
 Data Lakes and Warehouses offer different sync frequencies:
-- Warehouses can sync up to once an hour, with the ability to set a custom sync schedule and [selectively sync](/docs/connections/warehouses/selective-sync/) collections and properties within a source to Warehouses.
+- Warehouses can sync up to once an hour, with the ability to set a custom sync schedule and [selectively sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync) collections and properties within a source to Warehouses.
 - Data Lakes offers 12 syncs in a 24 hour period, and doesn't offer custom sync schedules or selective sync.
 
 ## Duplicates
 
-Segment's [99% guarantee of no duplicates](/docs/guides/duplicate-data/) for data within a 24 hour look-back window applies to data in Data Lakes and Warehouses.
+Segment's [99% guarantee of no duplicates](/docs/guides/duplicate-data/) for data within a 24 hour look-back window applies to data in Segment Data Lakes and Warehouses.
 
 [Warehouses](/docs/guides/duplicate-data/#warehouse-deduplication) and [Data Lakes](/docs/guides/duplicate-data/#data-lake-deduplication) also have a secondary deduplication system to further reduce the volume of duplicates to ensure clean data in your Warehouses and Data Lakes.
 
+> note "Secondary deduplication is not supported during the Azure Data Lakes public beta"
+> During the Azure Data Lakes public beta, Segment's guarantee of 99% no duplicates applies, but secondary deduplication is not supported.
+
 ## Object vs event data
 
 Warehouses support both event and object data, while Data Lakes supports only event data.
@@ -103,6 +106,6 @@ Similar to tables, columns between Warehouses and Data Lakes will be the same, e
 
 - `event` and `event_text` - Each property within an event has its own column, however the naming convention for these columns differs between Warehouses and Data Lakes. Warehouses snake cases the original payload value and preserves the original text within the `event_text` column. Data Lakes uses the original payload value as-is for the column name, and does not need an `event_text` column.
 - `channel`, `metadata_*`, `project_id`, `type`, `version` - These columns are Segment internal data which are not found in Warehouses, but are found in Data Lakes. Warehouses is intentionally very detailed about its transformation logic and does not include these. Data Lakes does include them due to its more straightforward approach to flatten the whole event.
-- (Redshift only) `uuid`, `uuid_ts` - Redshift customers will see columns for `uuid` and `uuid_ts`, which are used for de-duplication in Redshift; other warehouses may have similar columns. These aren't relevant for Data Lakes, so the columns won't appear there.
+- *(Redshift only)* `uuid`, `uuid_ts` - Redshift customers will see columns for `uuid` and `uuid_ts`, which are used for de-duplication in Redshift; other warehouses may have similar columns. These aren't relevant for Data Lakes, so the columns won't appear there.
 - `sent_at` - Warehouses computes the `sent_at` value based on timestamps found in the original event in order to account for clock skews and timestamps in the future. This was done when the Segment pipeline didn't do this on its own; the pipeline now accounts for this, so Data Lakes does not need to do any additional computation, and will send the value as-is when computed at ingestion.
 - `integrations` - Warehouses does not include the integrations object. Data Lakes flattens and includes the integrations object. You can read more about the `integrations` object [in the filtering data documentation](/docs/guides/filtering-data/#filtering-with-the-integrations-object).
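To make the column-naming difference described in the hunk above concrete, here is a minimal illustrative sketch. It is not Segment's transformation code, and the property names are made up; it only shows the general idea that Warehouses snake_cases the property name while Data Lakes keeps the original payload value as-is.

```python
import re

def warehouse_column(property_name: str) -> str:
    """Rough approximation of the Warehouses convention: snake_case the
    original property name (Segment's real rules handle more edge cases)."""
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", property_name)  # split camelCase
    s = re.sub(r"[\s\-]+", "_", s)                             # spaces/dashes -> underscores
    return s.lower()

def data_lake_column(property_name: str) -> str:
    """Data Lakes uses the original payload value as-is for the column name."""
    return property_name

for prop in ["membershipLevel", "Coupon Code"]:
    print(f"{prop!r}: warehouse={warehouse_column(prop)!r}, data_lake={data_lake_column(prop)!r}")
# 'membershipLevel': warehouse='membership_level', data_lake='membershipLevel'
# 'Coupon Code': warehouse='coupon_code', data_lake='Coupon Code'
```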
(2 image files changed: 191 KB and 102 KB; previews not rendered.)

src/connections/storage/data-lakes/index.md

Lines changed: 104 additions & 42 deletions
Large diffs are not rendered by default.

src/connections/storage/data-lakes/lake-formation.md

Lines changed: 3 additions & 3 deletions
@@ -4,10 +4,10 @@ title: Lake Formation
 
 {% include content/plan-grid.md name="data-lakes" %}
 
-Lake Formation is a fully managed service built on top of the AWS Glue Data Catalog that provides one central set of tools to build and manage a Data Lake. These tools help import, catalog, transform, and deduplicate data, as well as provide strategies to optimize data storage and security.
+Lake Formation is a fully managed service built on top of the AWS Glue Data Catalog that provides one central set of tools to build and manage a Data Lake. These tools help import, catalog, transform, and deduplicate data, as well as provide strategies to optimize data storage and security. To learn more about Lake Formation features, see the [Amazon Web Services documentation](https://aws.amazon.com/lake-formation/features/){:target="_blank"}.
 
-> note "Learn more about Lake Formation features"
-> To learn more about Lake Formation features, refer to the [Amazon Web Services documentation](https://aws.amazon.com/lake-formation/features/){:target="_blank"}.
+> note "This feature is not supported in the Azure Data Lakes public beta"
+> Lake Formation is only supported for Segment Data Lakes. For more information about Azure Data Lakes, see the [Data Lakes overview documentation](/docs/connections/storage/data-lakes/index/#how-azure-data-lakes-works).
 
 The security policies in Lake Formation use two layers of permissions: each resource is protected by Lake Formation permissions (which control access to Data Catalog resources and S3 locations) and IAM permissions (which control access to Lake Formation and AWS Glue API resources). When any user or role reads or writes to a resource, that action must pass both a Lake Formation and an IAM resource check: for example, a user trying to create a new table in the Data Catalog may have Lake Formation access to the Data Catalog, but if they don't have the correct Glue API permissions, they will be unable to create the table.
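The last paragraph in the hunk above describes the two permission layers a caller must pass. A rough boto3 sketch of what configuring both layers can look like; the account ID, role name, database name, and policy contents are placeholders for illustration, not part of the Segment setup:

```python
import boto3

lf = boto3.client("lakeformation")
iam = boto3.client("iam")

# Layer 1: Lake Formation permissions on the Data Catalog resource itself.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/MyDataLakeRole"},
    Resource={"Database": {"Name": "my_data_lake_db"}},
    Permissions=["CREATE_TABLE", "DESCRIBE"],
)

# Layer 2: IAM permissions on the Glue / Lake Formation APIs. Without these,
# the same role still can't create the table even though layer 1 allows it.
iam.put_role_policy(
    RoleName="MyDataLakeRole",
    PolicyName="AllowGlueTableWrites",
    PolicyDocument="""{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["glue:CreateTable", "glue:GetDatabase", "lakeformation:GetDataAccess"],
        "Resource": "*"
      }]
    }""",
)
```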

src/connections/storage/data-lakes/sync-history.md

Lines changed: 10 additions & 13 deletions
@@ -5,6 +5,9 @@ title: Data Lakes Sync History and Health
 
 The Segment Data Lakes sync history and health tabs generate real-time information about data syncs so you can monitor the health and performance of your data lakes. These tools provide monitoring and debugging capabilities within the Data Lakes UI, so you can identify and proactively address data sync or data pipeline failures.
 
+> note "This feature is not supported for the Azure Data Lakes public beta"
+> The Sync History/Sync Health tabs are currently not supported for the Azure Data Lakes public beta. For more information about Azure Data Lakes, see the [Data Lakes overview documentation](/docs/connections/storage/data-lakes/index/#how-azure-data-lakes-works).
+
 ## Sync History
 The 'Sync History' table shows detailed information about the latest 100 syncs to the data lake. The table includes the following fields:
 * **Sync status:** The status of the sync: either 'Success,' indicating that all rows synced correctly, 'Partial Success,' indicating that some rows synced correctly, or 'Failed,' indicating that no rows synced correctly
@@ -32,24 +35,18 @@ Above the Daily Row Volume table is an overview of the total syncs for the curre
 To access the Sync history page from the Segment app, open the **My Destinations** page and select the data lake. On the data lakes settings page, select the **Health** tab.
 
 ## Data Lakes Reports FAQ
-{% faq %}
-{% faqitem How long is a data point available? %}
+
+### How long is a data point available?
 The health tab shows an aggregate view of the last 30 days worth of data, while the sync history retains the last 100 syncs.
-{% endfaqitem %}
 
-{% faqitem How do sync history and health compare? %}
+### How do sync history and health compare?
 The sync history feature shows detailed information about the most recent 100 syncs to a data lake, while the health tab shows just the number of rows synced to the data lake over the last 30 days.
-{% endfaqitem %}
 
-{% faqitem What timezone is the time and date information in? %}
+### What timezone is the time and date information in?
 All dates and times on the sync history and health pages are in the user's local time.
-{% endfaqitem %}
 
-{% faqitem When does the data update? %}
+### When does the data update?
 The sync data for both reports updates in real time.
-{% endfaqitem %}
 
-{% faqitem When do syncs occur? %}
-Syncs occur approximately every two hours. Users cannot choose how frequently the data lake syncs.
-{% endfaqitem %}
-{% endfaq %}
+### When do syncs occur?
+Syncs occur approximately every two hours. Users cannot choose how frequently the data lake syncs.

src/connections/storage/data-lakes/sync-reports.md

Lines changed: 7 additions & 7 deletions
@@ -6,6 +6,9 @@ title: Data Lakes Sync Reports and Errors
 
 Segment Data Lakes generates reports with operational metrics about each sync to your data lake so you can monitor sync performance. These sync reports are stored in your S3 bucket and Glue Data Catalog. This means you have access to the raw data, so you can query it to answer questions and set up alerting and monitoring tools.
 
+> note "This feature is not supported for the Azure Data Lakes public beta"
+> The Sync Report tab is currently not supported for the Azure Data Lakes public beta. For more information about Azure Data Lakes, see the [Data Lakes overview documentation](/docs/connections/storage/data-lakes/index/#how-azure-data-lakes-works).
+
 ## Sync Report schema
 
 Your sync_report table stores all of your sync data. You can query it to answer common questions about data synced to your data lake.
@@ -261,13 +264,10 @@ Internal errors occur in Segment's internal systems, and should resolve on their
 
 ## FAQ
 
-{% faq %}
-{% faqitem How are Data Lakes sync reports different from the sync data for Segment Warehouses? %}
+### How are Data Lakes sync reports different from the sync data for Segment Warehouses?
 Both Warehouses and Data Lakes provide similar information about syncs, including the start and finish time, rows synced, and errors.
 
 However, Warehouse sync information is only available in the Segment app: on the Sync History page and Warehouse Health pages. With Data Lakes sync reports, the raw sync information is sent directly to your data lake. This means you can query the raw data and answer your own questions about syncs, and use the data to power alerting and monitoring tools.
-{% endfaqitem %}
-{% faqitem What happens if a sync is partly successful? %}
-Sync reports are currently generated only when a sync completes, or when it fails. Partial failure reporting is not currently supported.
-{% endfaqitem %}
-{% endfaq %}
+
+### What happens if a sync is partly successful?
+Sync reports are currently generated only when a sync completes, or when it fails. Partial failure reporting is not currently supported.
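Because the sync report data described in this diff lands in your own S3 bucket and Glue Data Catalog, one way to inspect it is with an Athena query. Below is a minimal smoke-test sketch using boto3; the Glue database name, the `sync_report` table location, and the results bucket are assumptions to adapt to your own setup, and you can extend the query once you know the schema from the Sync Report schema section.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-west-2")

# Assumed names: replace the Glue database and the Athena results bucket
# with the ones configured for your data lake.
execution = athena.start_query_execution(
    QueryString="SELECT * FROM sync_report LIMIT 20",
    QueryExecutionContext={"Database": "my_segment_data_lake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```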
