Commit 0417804

committed
Making changes requested by Alistair [DOC-493]
1 parent ea6a440 commit 0417804

3 files changed (+7, -8 lines changed)

src/connections/storage/catalog/data-lakes/index.md

Lines changed: 0 additions & 1 deletion
@@ -104,7 +104,6 @@ Before you can configure your Azure resources, you must complete the following p
 6. Click **Next: Advanced**.
 7. On the **Advanced Settings** tab in the Security section, select the following options:
 - Require secure transfer for REST API operations
-- Enable blob public access
 - Enable storage account key access
 - Minimum TLS version: Version 1.2
 8. In the Data Lake Storage Gen2 section, select **Enable hierarchical namespace**. In the Blob storage selection, select the **Hot** option.
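The Security and Data Lake Storage Gen2 options in the hunk above can be expressed as a check against a storage account's configuration. This is a minimal sketch, not part of Segment's documentation. It assumes the account's settings are available as a dict shaped like the `properties` object of an Azure Resource Manager `Microsoft.Storage/storageAccounts` resource. The helper name `missing_settings` is hypothetical.

```python
# Minimal sketch: compare a storage account's settings with the options listed
# in the setup steps. Assumption: `props` mirrors the ARM `properties` object of
# a Microsoft.Storage/storageAccounts resource.

REQUIRED = {
    "supportsHttpsTrafficOnly": True,  # Require secure transfer for REST API operations
    "allowSharedKeyAccess": True,      # Enable storage account key access
    "minimumTlsVersion": "TLS1_2",     # Minimum TLS version: Version 1.2
    "isHnsEnabled": True,              # Enable hierarchical namespace
    "accessTier": "Hot",               # Blob storage: Hot
}
# "Enable blob public access" (allowBlobPublicAccess) was dropped from the list
# by this commit, so the sketch deliberately does not check it.


def missing_settings(props: dict) -> list:
    """Return the required keys that are absent from `props` or set differently.

    An absent key is reported even where Azure would apply a default value.
    """
    return [key for key, want in REQUIRED.items() if props.get(key) != want]


if __name__ == "__main__":
    example = {
        "supportsHttpsTrafficOnly": True,
        "allowSharedKeyAccess": True,
        "minimumTlsVersion": "TLS1_2",
        "isHnsEnabled": True,
        "accessTier": "Cool",
        "allowBlobPublicAccess": False,
    }
    print(missing_settings(example))  # ['accessTier']
```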

src/connections/storage/data-lakes/comparison.md

Lines changed: 4 additions & 4 deletions
@@ -12,18 +12,18 @@ Data Lakes and Warehouses are not identical, but are compatible with a configura
 ## Data freshness
 
 Data Lakes and Warehouses offer different sync frequencies:
-- Warehouses can sync up to once an hour, with the ability to set a custom sync schedule and [selectively sync](/docs/connections/warehouses/selective-sync/) collections and properties within a source to Warehouses.
+- Warehouses can sync up to once an hour, with the ability to set a custom sync schedule and [selectively sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync) collections and properties within a source to Warehouses.
 - Data Lakes offers 12 syncs in a 24 hour period, and doesn't offer custom sync schedules or selective sync.
 
 ## Duplicates
 
 Segment's [99% guarantee of no duplicates](/docs/guides/duplicate-data/) for data within a 24 hour look-back window applies to data in Segment Data Lakes and Warehouses.
 
-> note "Deduplication is not supported for the Azure Data Lakes public beta"
-> Deduplication is not currently supported for the Azure Data Lakes public beta. For more information about Azure Data Lakes, see the [Data Lakes overview documentation](/docs/connections/storage/data-lakes/index/#how-azure-data-lakes-works).
-
 [Warehouses](/docs/guides/duplicate-data/#warehouse-deduplication) and [Data Lakes](/docs/guides/duplicate-data/#data-lake-deduplication) also have a secondary deduplication system to further reduce the volume of duplicates to ensure clean data in your Warehouses and Data Lakes.
 
+> note "Secondary deduplication is not supported during the Azure Data Lakes public beta"
+> During the Azure Data Lakes public beta, Segment's guarantee of 99% no duplicates applies, but secondary deduplication is not supported.
+
 ## Object vs event data
 
 Warehouses support both event and object data, while Data Lakes supports only event data.
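The sync frequencies in this hunk imply an average spacing between syncs. The arithmetic below is a small illustration only: it assumes syncs are spread evenly across the day, which the documentation does not state, and `average_interval_hours` is a hypothetical helper.

```python
# Sketch: average hours between syncs, assuming syncs are evenly spaced
# over a 24-hour period (an assumption, not documented behavior).
HOURS_PER_DAY = 24


def average_interval_hours(syncs_per_day: int) -> float:
    """Average gap in hours when `syncs_per_day` syncs are spread over one day."""
    return HOURS_PER_DAY / syncs_per_day


warehouse_max_syncs = 24  # Warehouses: "up to once an hour"
data_lakes_syncs = 12     # Data Lakes: "12 syncs in a 24 hour period"

print(average_interval_hours(warehouse_max_syncs))  # 1.0
print(average_interval_hours(data_lakes_syncs))     # 2.0
```

In other words, a Data Lake sync runs roughly every two hours on average, versus at most once an hour for Warehouses.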

src/connections/storage/data-lakes/index.md

Lines changed: 3 additions & 3 deletions
@@ -158,10 +158,10 @@ If Data Lakes sees a bad data type, for example text in place of a number or an
 
 ### Data Lake deduplication
 
-> info "Azure Data Lakes does not support deduplication"
-> Deduplication is not supported for Azure Data Lakes during the Public Beta period.
+In addition to Segment's [99% guarantee of no duplicates](/docs/guides/duplicate-data/) for data within a 24 hour look-back window, Data Lakes have another layer of deduplication to ensure clean data in your Data Lake. Segment removes duplicate events at the time your Data Lake ingests data. Data Lakes deduplicate any data synced within the last 7 days, based on the `messageId` field.
 
-In addition to Segment's [99% guarantee of no duplicates](/docs/guides/duplicate-data/) for data within a 24 hour look-back window, Data Lakes have another layer of deduplication to ensure clean data in your Data Lake. Segment removes duplicate events at the time your Data Lake ingests data. Data Lakes deduplicate any data synced within the last 7 days, based on the `message_id` field.
+> note "Secondary deduplication is not supported during the Azure Data Lakes public beta"
+> During the Azure Data Lakes public beta, Segment's guarantee of 99% no duplicates within the 24-hour look-back window applies, but secondary deduplication is not supported.
 
 ### Using a Data Lake with a Data Warehouse
 
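The secondary deduplication described in this hunk (drop events whose `messageId` was already synced within the last 7 days, at ingest time) can be sketched as follows. This is an illustration of the general technique under stated assumptions, not how Segment implements it: the in-memory `seen` dict stands in for whatever store the real system uses, and `ingest` is a hypothetical function. Per the note above, this layer does not apply to Azure Data Lakes during the public beta.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(days=7)  # "any data synced within the last 7 days"


def ingest(batch, seen, now):
    """Return the events in `batch` to write, skipping duplicate messageIds.

    batch: list of event dicts, each with a "messageId" key.
    seen:  hypothetical dict of messageId -> datetime it was first written.
    now:   datetime of this ingestion run.
    """
    # Forget IDs that fall outside the 7-day look-back window.
    for message_id in [m for m, t in seen.items() if now - t > LOOKBACK]:
        del seen[message_id]

    kept = []
    for event in batch:
        message_id = event["messageId"]
        if message_id in seen:
            continue  # written within the window, or earlier in this batch
        seen[message_id] = now
        kept.append(event)
    return kept


if __name__ == "__main__":
    seen = {}
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    first = ingest([{"messageId": "a"}, {"messageId": "a"}, {"messageId": "b"}], seen, t0)
    print([e["messageId"] for e in first])  # ['a', 'b']
    retry = ingest([{"messageId": "a"}], seen, t0 + timedelta(days=3))
    print(retry)  # []
    late = ingest([{"messageId": "a"}], seen, t0 + timedelta(days=8))
    print([e["messageId"] for e in late])  # ['a']
```

Keying on `messageId` rather than event contents means a retried send of the same message is dropped, while two distinct messages with identical payloads are both kept.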