Commit cb7a0f4

clarifying that the 24 hour look-back window is approximate
1 parent c28836d commit cb7a0f4

1 file changed, with 2 additions and 2 deletions

src/guides/duplicate-data.md

Lines changed: 2 additions & 2 deletions
@@ -2,13 +2,13 @@
 title: Handling Duplicate Data
 ---
 
-Segment guarantees that 99% of your data won't have duplicates within a 24 hour or longer look-back window. Warehouses and Data Lakes also have their own secondary deduplication process to ensure you store clean data.
+Segment guarantees that 99% of your data won't have duplicates within an approximately 24 hour look-back window. Warehouses and Data Lakes also have their own secondary deduplication process to ensure you store clean data.
 
 ## 99% deduplication
 
 Segment has a special deduplication service that sits behind the `api.segment.com` endpoint and attempts to drop 99% of duplicate data. Segment stores at least 24 hours worth of event `messageId`s, allowing Segment to deduplicate any data that appears within a 24 hour rolling window.
 
-Segment deduplicates on the event's `messageId`, _not_ on the contents of the event payload. Segment doesn't have a built-in way to deduplicate data over periods longer than 24 hours or for events that don't generate `messageId`s.
+Segment deduplicates on the event's `messageId`, _not_ on the contents of the event payload. Segment doesn't have a built-in way to deduplicate data over periods longer than approximately 24 hours or for events that don't generate `messageId`s.
 
 > info ""
 > Keep in mind that Segment's libraries all generate `messageId`s for each event payload, with the exception of the Segment HTTP API, which assigns each event a unique `messageId` when the message is ingested. You can override these default generated IDs and manually assign a `messageId` if necessary.
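
The behavior the changed paragraphs describe, deduplicating on `messageId` within a rolling look-back window rather than on payload contents, can be illustrated with a small sketch. This is not Segment's implementation: the class `RollingDeduplicator`, its method names, and the exact 24 hour constant are assumptions made for illustration only, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import OrderedDict

# Assumed window length for illustration; the guide describes the real
# window as only approximately 24 hours.
WINDOW_SECONDS = 24 * 60 * 60


class RollingDeduplicator:
    """Hypothetical in-memory deduplicator keyed on messageId."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        # messageId -> time first seen, ordered oldest first.
        self._seen = OrderedDict()

    def _evict(self, now):
        # Drop IDs whose first sighting has fallen out of the window.
        while self._seen:
            first_seen = next(iter(self._seen.values()))
            if now - first_seen < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def is_duplicate(self, event, now):
        """Return True if this event's messageId was seen inside the window."""
        self._evict(now)
        message_id = event.get("messageId")
        if message_id is None:
            # Without a messageId there is nothing to deduplicate on.
            return False
        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        return False
```

In this sketch, two events with the same `messageId` but different payloads count as duplicates, while two identical payloads with different `messageId`s do not. Once the first sighting is older than the window, the same `messageId` is accepted again.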
