src/connections/destinations/catalog/amazon-kinesis-firehose/index.md
9 additions & 9 deletions
@@ -12,8 +12,8 @@ This document was last updated on February 05, 2020. If you notice any gaps, out
  1. Create at least one Kinesis Firehose delivery stream. You can follow these [instructions](http://docs.aws.amazon.com/firehose/latest/dev/basic-create.html) to create a new delivery stream.
  2. Create an IAM policy.
- - Sign in to the [Identity and Access Management (IAM) console](https://console.aws.amazon.com/iam/).
- - Follow [these instructions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html#access_policies_create-json-editor) to create an IAM policy on the JSON to allow Segment permission to write to your Kinesis Firehose Stream.
+ 1. Sign in to the [Identity and Access Management (IAM) console](https://console.aws.amazon.com/iam/).
+ 2. Follow [these instructions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html#access_policies_create-json-editor) to create an IAM policy on the JSON to allow Segment permission to write to your Kinesis Firehose Stream.
  - Use the following template policy in the **Policy Document** field. Be sure to change the `{region}`, `{account-id}` and `{stream-name}` with the applicable values.
@@ -36,16 +36,16 @@ This document was last updated on February 05, 2020. If you notice any gaps, out
  3. Create an IAM role.
- - Follow [these instructions](http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html#roles-creatingrole-user-console) to create an IAM role to allow Segment permission to write to your Kinesis Firehose Stream.
- - When prompted to enter an Account ID, enter `595280932656`.
- - Select the checkbox to enable **Require External ID**.
- - Enter your Segment Source ID as the **External ID**. This can be found in Segment by navigating to **Connections > Sources** and choosing the source you want to connect to your Kinesis Firehose destination. Click the **Settings** tab and choose **API Keys**.
+ 1. Follow [these instructions](http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html#roles-creatingrole-user-console) to create an IAM role to allow Segment permission to write to your Kinesis Firehose Stream.
+ 2. When prompted to enter an Account ID, enter `595280932656`.
+ 3. Select the checkbox to enable **Require External ID**.
+ 4. Enter your Segment Source ID as the **External ID**. This can be found in Segment by navigating to **Connections > Sources** and choosing the source you want to connect to your Kinesis Firehose destination. Click the **Settings** tab and choose **API Keys**.
  - **Note:** If you have multiple sources using Kinesis, enter one of their source IDs here for now and then follow the procedure outlined in the [Multiple Sources](#best-practices) section at the bottom of this doc once you’ve completed this step and saved your IAM role.
- - When adding permissions to your new role, find the policy you created in step 2 and attach it.
+ 5. When adding permissions to your new role, find the policy you created in step 2 and attach it.
  4. Create a new Kinesis Firehose Destination.
- - In the Segment source that you want to connect to your Kinesis Firehose destination, click **Add Destination**.
- - Search and select the **Amazon Kinesis Firehose** destination and enter details for [these settings options](#settings).
+ 1. In the Segment source that you want to connect to your Kinesis Firehose destination, click **Add Destination**.
+ 2. Search and select the **Amazon Kinesis Firehose** destination and enter details for [these settings options](#settings).
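
The IAM role steps above (account ID `595280932656`, **Require External ID**, Segment Source ID as the external ID) correspond to a cross-account trust policy on the role. The following is a minimal sketch of what such a trust policy usually looks like, not text taken from the Segment docs. The function name `build_trust_policy` and the placeholder `YOUR_SOURCE_ID` are hypothetical; the IAM console normally generates an equivalent document for you.

```python
import json

# Account ID given in the role-creation steps above.
SEGMENT_AWS_ACCOUNT_ID = "595280932656"


def build_trust_policy(source_id: str) -> dict:
    """Return a trust policy that lets the Segment account assume the role,
    but only when it presents the given Segment Source ID as the external ID.

    Hypothetical helper for illustration; the layout follows the standard
    IAM cross-account trust policy with an sts:ExternalId condition.
    """
    if not source_id:
        raise ValueError("source_id must be a non-empty Segment Source ID")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{SEGMENT_AWS_ACCOUNT_ID}:root"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": source_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # YOUR_SOURCE_ID is a placeholder; use the ID from Settings > API Keys.
    print(json.dumps(build_trust_policy("YOUR_SOURCE_ID"), indent=2))
```

Comparing this output with the role's **Trust relationships** tab in the IAM console is one way to check that the account ID and external ID were entered as intended.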
  ## Page
  Take a look to understand what the [Page method](https://segment.com/docs/connections/spec/page/) does. An example call would look like:
- > **New Amplitude destination available**: Segment's [Destination Actions](/docs/connections/destinations/actions/) allow you to explicitly set up your Amplitude mapping, and configure which events the mappings apply to. See [Amplitude Actions destination](/docs/connections/destinations/catalog/amplitude-actions/) for more information.
+ > **New Amplitude destination available**: Segment's [Destination Actions](/docs/connections/destinations/actions/) allow you to explicitly set up your Amplitude mapping, and configure which events the mappings apply to. See [Amplitude Actions destination](/docs/connections/destinations/catalog/actions-amplitude/) for more information.
src/guides/duplicate-data.md
3 additions & 3 deletions
@@ -2,10 +2,10 @@
  title: How does Segment handle duplicate data?
  ---
- Segment has a special de-duplication service that sits just behind the `api.segment.com` endpoint, and attempts to drop duplicate data. However, that de-duplication api has to hold the entire set of events in memory in order to know whether or not it has seen that event already. Segment stores 24 hours worth of event `message_id`s. This means Segment can de-duplicate any data that appears within a 24 hour rolling window.
+ Segment has a special de-duplication service that sits just behind the `api.segment.com` endpoint, and attempts to drop duplicate data. However, that de-duplication service has to hold the entire set of events in memory in order to know whether or not it has seen that event already. Segment stores 24 hours worth of event `message_id`s. This means Segment can de-duplicate any data that appears within a 24 hour rolling window.
- An important point remember is that Segment de-duplicates on the event's `message_id`, _not_ on the contents of an event payload. So if you aren't generating `message_id`s for each event, or are trying to deduplicate data over a longer period than 24 hours, Segment does not have a built-in way to de-duplicate data.
+ An important point to remember is that Segment de-duplicates on the event's `message_id`, _not_ on the contents of an event payload. So if you aren't generating `message_id`s for each event, or are trying to de-duplicate data over a longer period than 24 hours, Segment does not have a built-in way to de-duplicate data.
- Since the api layer is de-duping during this window, duplicate events that are further than 24 hours apart from one another must be de-duped in the Warehouse. Segment also dedupes messages going into a Warehouse based on the `message_id`, which is the `id` column in a Segment Warehouse. Note that in these cases you will see duplications in end tools as there is no additional layer prior to sending the event to downstream tools.
+ Since the API layer is de-duplicating during this window, duplicate events that are further than 24 hours apart from one another must be de-duplicated in the Warehouse. Segment also de-duplicates messages going into a Warehouse based on the `message_id`, which is the `id` column in a Segment Warehouse. Note that in these cases you will see duplications in end tools as there is no additional layer prior to sending the event to downstream tools.
  Keep in mind that Segment's libraries all generate `message_id`s for you for each event payload, with the exception of the Segment HTTP API, which assigns each event a unique `message_id` when the message is ingested. You can override these default generated IDs and manually assign a `message_id` if necessary.
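
To make the 24 hour rolling window concrete, here is a minimal sketch of an in-memory de-duplicator keyed on `message_id`. It is an illustration of the idea described above, not Segment's implementation. The class name `MessageIdDeduper` is hypothetical, and the sketch assumes that timestamps arrive in non-decreasing order and that the window is measured from the first time an ID was seen.

```python
from collections import OrderedDict

WINDOW_SECONDS = 24 * 60 * 60  # the 24 hour window described in the text


class MessageIdDeduper:
    """Illustrative rolling-window de-duplicator keyed on message_id only."""

    def __init__(self, window_seconds: float = WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        # message_id -> time first seen; insertion order equals time order
        # because callers are assumed to pass non-decreasing timestamps.
        self._seen: "OrderedDict[str, float]" = OrderedDict()

    def _evict_expired(self, now: float) -> None:
        # Drop IDs whose first sighting is at least one full window old.
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def is_duplicate(self, message_id: str, now: float) -> bool:
        """Return True if message_id was already seen inside the window."""
        self._evict_expired(now)
        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        return False


if __name__ == "__main__":
    deduper = MessageIdDeduper()
    print(deduper.is_duplicate("msg-1", now=0))               # first sighting
    print(deduper.is_duplicate("msg-1", now=3600))            # inside window
    print(deduper.is_duplicate("msg-1", now=WINDOW_SECONDS))  # window elapsed
```

Because only the ID is compared, two events with identical payloads but different `message_id`s both pass through, and a repeat that arrives after the window has elapsed is treated as new. That matches the two limitations described above.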