Commit c4d5530: Merge branch 'master' into niall/update_idfa_react
2 parents 4a65530 + 09c844f

27 files changed: +444 / -103 lines changed

src/_data/sidenav/main.yml

Lines changed: 0 additions & 2 deletions
@@ -35,8 +35,6 @@ sections:
       title: How does Segment handle duplicate data?
     - path: /guides/ignore-bots
       title: How can I ignore internet bots?
-    - path: /guides/intelligent-tracking-prevention
-      title: Is Segment impacted by Intelligent Tracking Prevention?
     - path: /guides/segment-vs-tag-managers
       title: What is the difference between Segment and tag managers?
     - path: /guides/what-is-replay

src/_data/sidenav/strat.yml

Lines changed: 2 additions & 0 deletions
@@ -62,6 +62,8 @@ sections:
     section:
       - path: /connections/sources/catalog/libraries/website/javascript
         title: Analytics.js (Javascript) Source
+      - path: /connections/sources/catalog/libraries/website/javascript/supported-browsers
+        title: Analytics.js Supported Browsers
      - path: /connections/sources/catalog/libraries/website/javascript/quickstart
        title: Quickstart Tutorial
      - path: /connections/sources/catalog/libraries/website/javascript/identity

src/connections/sources/catalog/libraries/website/javascript/index.md

Lines changed: 11 additions & 5 deletions
@@ -621,11 +621,6 @@ When enabled, Analytics.js automatically retries network and server errors. With
 
 Analytics.js stores events in `localStorage` and falls back to in-memory storage when `localStorage` is unavailable. It retries up to 10 times with an incrementally increasing backoff between each retry. Analytics.js queues up to 100 events at a time to avoid using too much of the device's local storage. See the [destination Retries documentation](/docs/connections/destinations/#retries) to learn more.
 
-
-## Proxying Analytics.js
-
-To use a proxy server with Analytics.js, first change the `cdn.segment.com` address in the snippet to use your own host. Next, [contact Segment Product Support](https://segment.com/help/contact/) and request to change the endpoint Segment sends your events to so that is uses your proxy instead. Make sure that your proxy behaves exactly like the Segment APIs. You can use [the Segment proxy server](https://github.com/segmentio/segment-proxy) as an example of a correctly-working proxy.
-
 ## Plugins
 
 Segment offers video player 'plugins' so you can quickly collect video events using Analytics.js. See the specific documentation below to learn more:
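
The retry behavior described in the context above (up to 10 attempts, incrementally increasing backoff, queue capped at 100 events) can be sketched as follows. This is an illustration only, assuming a doubling backoff; the function names and growth factor are not Analytics.js internals.

```javascript
// Illustrative sketch only: up to 10 delivery attempts with an
// incrementally increasing backoff, and an event queue capped at 100.
const MAX_ATTEMPTS = 10;
const MAX_QUEUE_SIZE = 100;
const BASE_BACKOFF_MS = 1000;

// Backoff grows with each retry; the doubling factor is an assumption.
function backoffDelay(attempt) {
  return BASE_BACKOFF_MS * Math.pow(2, attempt);
}

// Keep at most MAX_QUEUE_SIZE events, dropping the oldest first.
function enqueue(queue, event) {
  if (queue.length >= MAX_QUEUE_SIZE) queue.shift();
  queue.push(event);
  return queue;
}
```

The cap matters because `localStorage` quotas are small; dropping the oldest events bounds storage use at the cost of losing the least recent data.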
@@ -657,3 +652,14 @@ Segment's Analytics.js javascript snippet only increases the page size by about
 However, the snippet asynchronously requests and loads a customized javascript bundle (`analytics.min.js`), which contains the code and settings needed to load your [device-mode destinations](/docs/connections/destinations/#connection-modes). The size of this file changes depending on how many and which destinations you enable.
 
 Without any destinations enabled, the `analytics.min.js` file is about 62KB. Each time you enable a destination, the file's size may increase slightly.
+
+### localStorage keys used by Analytics.js
+
+Analytics.js uses a few `localStorage` keys when retries are enabled, to keep track of retry timing:
+- The `ack` key is a timer used to see if another tab should claim the retry queue.
+- The `reclaimStart` and `reclaimEnd` keys determine whether a tab takes over the queue from another tab.
+- The `inProgress` and `queue` keys track events in progress, and events that are queued to be retried.
+
+For more information, see the [Segment localstorage-retry library](https://github.com/segmentio/localstorage-retry).
+
+You can set the `debug` cookie to `analytics.js` to log debug messages from Analytics.js to the console.
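
The bookkeeping keys listed above can be read back for inspection along these lines. This is an illustrative sketch: the `ajs` key prefix is an assumption, since the library namespaces its real keys.

```javascript
// Illustrative sketch of reading the retry bookkeeping entries listed
// above. The key prefix is an assumption, not the library's real naming.
function readRetryState(storage, prefix) {
  const names = ["ack", "reclaimStart", "reclaimEnd", "inProgress", "queue"];
  const state = {};
  for (const name of names) {
    const raw = storage.getItem(prefix + "." + name);
    state[name] = raw === null ? null : JSON.parse(raw);
  }
  return state;
}
```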
Lines changed: 39 additions & 0 deletions

@@ -0,0 +1,39 @@
---
title: Analytics.js Browser Support
redirect_from: '/guides/intelligent-tracking-prevention'
strat: ajs
---

[The Segment JavaScript library, Analytics.js](https://segment.com/docs/connections/sources/catalog/libraries/website/javascript/), loads a snippet on your webpage that supports existing user consent APIs and native browser controls. Segment regularly tests Analytics.js against the following browsers on all major platforms, and updates the library accordingly:

- Internet Explorer
- Apple Safari
- Google Chrome
- Mozilla Firefox
- Microsoft Edge
- Brave

### Internet Explorer Support

Segment guarantees support for Internet Explorer 11 and later for Analytics.js. Remember that different bundled (device-mode) destinations might have different compatibility guarantees for their own products. Refer to the vendor's documentation to confirm browser compatibility.

## Tracking Protection (ITP, ETP)

Segment is a customer data platform (CDP) that helps companies harness first-party customer data. Recent browser changes fully align with Segment's privacy stance.

Browser manufacturers have enhanced their privacy features by adding third-party tracking protection mechanisms for end users. These browser changes target third-party trackers and their cookies, and each platform takes a different approach.

For example, [Firefox Enhanced Tracking Protection (ETP)](https://blog.mozilla.org/blog/2020/08/04/latest-firefox-rolls-out-enhanced-tracking-protection-2-0-blocking-redirect-trackers-by-default/) relies on a dynamic list of known trackers to decide what to block. Browsers that use [Apple's WebKit engine](https://webkit.org/blog/10218/full-third-party-cookie-blocking-and-more/), like Safari and Chrome for iOS, use Intelligent Tracking Prevention (ITP), which actively prevents the browser from loading cookies from a third-party domain.

> info ""
> **Note:** Segment cookies expire after seven days of user inactivity, like all other application cookies under the WebKit ITP policy.
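
One common mitigation for the seven-day expiry is to mirror an identifier in `localStorage`, which ITP does not cap the same way. The sketch below is illustrative, not the library's documented implementation; `ajs_anonymous_id` is the real Analytics.js identifier name, but the fallback logic here is an assumption.

```javascript
// Illustrative sketch: because ITP caps script-set cookies at seven days
// of inactivity, an identifier mirrored in localStorage can outlive the
// cookie. The fallback order shown is an assumption.
function resolveAnonymousId(cookies, storage) {
  const fromCookie = cookies["ajs_anonymous_id"];
  const fromStorage = storage.getItem("ajs_anonymous_id");
  return fromCookie || fromStorage || null;
}
```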
## Proxying Analytics.js

Because of regulatory, environmental, or security concerns, some customers prefer to set up proxy infrastructure for Analytics.js. You can use the `apihost` configuration option in the Analytics object to route traffic to a different API endpoint. Business Tier customers can use Segment's fully-managed proxy service, In-domain Instrumentation, which provides a similar function without the additional maintenance burden and cost of running your own proxy infrastructure.

> info ""
> Business Tier customers who want to use a fully-managed proxy service can contact their account teams for full details.
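
The effect of proxying on the snippet can be sketched as follows: the async snippet normally loads `analytics.min.js` from `cdn.segment.com`, and a proxy setup swaps in a host you control. The `analytics.example.com` hostname below is a placeholder, not a real endpoint.

```javascript
// Illustrative sketch: build the script URL the snippet loads, with the
// CDN host swapped for a proxy host. The proxy hostname is a placeholder.
function snippetSrc(host, writeKey) {
  return "https://" + host + "/analytics.js/v1/" + writeKey + "/analytics.min.js";
}

const defaultSrc = snippetSrc("cdn.segment.com", "YOUR_WRITE_KEY");
const proxiedSrc = snippetSrc("analytics.example.com", "YOUR_WRITE_KEY");
```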

src/connections/sources/catalog/libraries/website/javascript/troubleshooting.md

Lines changed: 11 additions & 11 deletions
@@ -51,15 +51,6 @@ In the below image, we use Google Analytics as an example. Our `page` call forms
 
 If this outbound request is not showing up in the network when you fire an `identify` call, then check the following:
 
-## Do you have any ad blockers enabled in your browser?
-
-Segment and many destination partners use cookies/local storage to store information about users in the browser. Ad blockers prevent cookies and other data these tools rely on to make valid analytics requests. Some portion of your users are probably using ad blockers, which prevent the Segment script from fully executing. Both desktop and mobile browsers are impacted.
-
-One particular issue is Safari private browsing mode which allows Analytics.js Identify calls to be made, but the traits object is stripped from the call. This results in identify calls missing email address and other traits.
-
-## Internet Explorer Support
-
-We guarantee support for Internet Explorer 9 and higher for Analytics.js. Keep in mind that different tools may have different compatibility guarantees for their own products. Refer to the vendor's documents to see what their browser compatibility looks like.
 
 ## Is your web site deployed under a domain on the Public Suffix List?
 
@@ -112,10 +103,19 @@ console.log(JSON.stringify({ x: undefined, y: 6 }));
 // expected output: "{"y":6}"
 ```
 
+## Why am I seeing a "SameSite" warning?
+
+If you see a warning like the following, it could have one of several causes:
+"A cookie associated with a cross-site resource at http://segment.com/ was set without the `SameSite` attribute [...]"
+
+Analytics.js correctly sets cookies with the `SameSite` attribute.
+
+If you see this warning, it is because you previously visited http://segment.com, and the warning is caused by unrelated cookies. To verify this, visit your page in Incognito Mode and confirm that the warning no longer occurs. Your users won't see this warning unless they _also_ visited http://segment.com.
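
The attribute the warning refers to can be illustrated with a small cookie-string builder. This is a hedged sketch, not how Analytics.js sets its cookies: the function and cookie names are examples, and the rule it encodes is the browser one that `SameSite=None` must be paired with `Secure`.

```javascript
// Illustrative sketch: build a Set-Cookie-style string with an explicit
// SameSite attribute. First-party cookies typically use SameSite=Lax;
// cross-site cookies need SameSite=None together with Secure.
function buildCookie(name, value, { sameSite, secure } = {}) {
  let cookie = name + "=" + encodeURIComponent(value);
  if (sameSite) cookie += "; SameSite=" + sameSite;
  if (secure) cookie += "; Secure";
  return cookie;
}
```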
 
 ### Can I overwrite the context fields?
 
-Yes. This can be useful if some of these fields contain information you don't want to collect.
+Yes. This can be useful if some of these fields contain information you don't want to collect.
 
 For example, imagine that your website allows users to view a receipt for purchases at the URL `https://mywebsite.com/store/purchases`. Your users click a link that redirects to that specific URL, your app sets a `receiptId` in the query string, and returns the appropriate receipt. You also send a Track call to Segment from this page.
 
@@ -125,7 +125,7 @@ Since this `receiptId` might contain sensitive information, you can prevent the
 analytics.track("Receipt Viewed", {}, {
   page: {
     url: null
-    }
+  }
 })
 ```
 
src/connections/storage/catalog/data-lakes/index.md

Lines changed: 3 additions & 3 deletions
@@ -20,7 +20,7 @@ Before you set up Segment Data Lakes, you need the following resources:
 
 You can use the [open source Terraform module](https://github.com/segmentio/terraform-aws-data-lake) to automate much of the set up work to get Data Lakes up and running. If you’re familiar with Terraform, you can modify the module to meet your organization’s needs, however Segment guarantees support only for the template as provided. The Data Lakes set up uses Terraform v0.11+. To support more versions of Terraform, the aws provider must use v2, which is included in our example main.tf.
 
-You can also use our [manual set up instructions](https://docs.google.com/document/d/1GlWzS5KO4QaiVZx9pwfpgF-N-Xy2e_QQcdYSX-nLMDU/view) to configure these AWS resources if you prefer.
+You can also use our [manual set up instructions](/docs/connections/storage/data-lakes/data-lakes-manual-setup) to configure these AWS resources if you prefer.
 
 The Terraform module and manual set up instructions both provide a base level of permissions to Segment (for example, the correct IAM role to allow Segment to create Glue databases on your behalf). If you want stricter permissions, or other custom configurations, you can customize these manually.
 
@@ -57,12 +57,12 @@ Once the Data Lakes destination is enabled, the first sync will begin approximat
 
 ## Step 3 - Verify Data is Synced to S3 and Glue
 
-You will see event data and [sync reports](https://segment.com/docs/connections/storage/data-lakes/sync-reports) populated in S3 and Glue after the first sync successfully completes. However if an [insufficient permission](https://segment.com/docs/connections/storage/data-lakes/sync-reports/#insufficient-permissions) or [invalid setting](https://segment.com/docs/connections/storage/data-lakes/sync-reports/#invalid-settings) is provided during set up, the first data lake sync will fail.
+You will see event data and [sync reports](/docs/connections/storage/data-lakes/sync-reports) populated in S3 and Glue after the first sync successfully completes. However if an [insufficient permission](/docs/connections/storage/data-lakes/sync-reports/#insufficient-permissions) or [invalid setting](/docs/connections/storage/data-lakes/sync-reports/#invalid-settings) is provided during set up, the first data lake sync will fail.
 
 To be alerted of sync failures via email, subscribe to the `Storage Destination Sync Failed` activity email notification within the App Settings > User Preferences > [Notification Settings](https://app.segment.com/goto-my-workspace/settings/notifications).
 ![](images/dl_activity_notifications2.png)
 
-`Sync Failed` emails are sent on the 1st, 5th and 20th sync failure. Learn more about the types of errors which can cause sync failures [here](https://segment.com/docs/connections/storage/data-lakes/sync-reports/#sync-errors).
+`Sync Failed` emails are sent on the 1st, 5th and 20th sync failure. Learn more about the types of errors which can cause sync failures [here](/docs/connections/storage/data-lakes/sync-reports/#sync-errors).
 
 ## (Optional) Step 4 - Replay Historical Data
Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
---
hidden: true
title: Configure the Data Lakes AWS Environment
---

The instructions below guide you through configuring the AWS environment required to begin loading data into your Segment Data Lake. For a more automated process, see [Step 1 - Configure AWS Resources](#step-1---configure-aws-resources) above.

## Step 1 - Create an S3 Bucket

In this step, you'll create the S3 bucket that stores both the intermediate and final data.

> info ""
> Take note of the S3 bucket name you set in this step, as the rest of the set up flow requires it. In these instructions, `segment-data-lake` is used.

During the set up process, create a Lifecycle rule and set it to expire staging data after **14 days**. For more information, see Amazon's documentation, [How do I create a lifecycle?](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html).

![Create a Lifecycle rule to expire staging data after 14 days](images/01_14-day-lifecycle.png)
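
The 14-day expiration described above corresponds to an S3 lifecycle configuration along these lines. This is a hedged sketch: the rule ID and the staging prefix are assumptions, since the prefix Segment writes staging data to depends on your setup.

```json
{
  "Rules": [
    {
      "ID": "expire-staging-data",
      "Status": "Enabled",
      "Filter": { "Prefix": "segment-stage/" },
      "Expiration": { "Days": 14 }
    }
  ]
}
```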
## Step 2 - Configure an EMR cluster

Segment requires access to an EMR cluster to perform necessary data processing. We recommend starting with a small cluster, with the option to add more compute as required.

### Configure the hardware and networking configuration

1. Locate and select EMR from the AWS console.
2. Click **Create Cluster**, and open the **Advanced Options**.
3. In the Advanced Options, on Step 1: Software and Steps, ensure the following options are selected, along with the defaults:
   - `Use for Hive table metadata`
   - `Use for Spark table metadata`
   ![Select to use for both Hive and Spark table metadata](images/02_hive-spark-table.png)
4. In the Networking setup section, select whether to create the cluster in a public or private subnet. Creating the cluster in a private subnet is more secure, but requires additional configuration. A cluster created in a public subnet is accessible from the internet, although you can configure strict security groups to prevent inbound access to the cluster. See Amazon's document, [Amazon VPC Options - Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-clusters-in-a-vpc.html), for more information. As a best practice, Segment recommends that you consult with your network and security teams before you configure your EMR cluster.
5. In the Hardware Configuration section, create a cluster with the nodes listed below. This configuration uses the default **On demand** purchasing option for the instances.
   - **1** master node
   - **2** core nodes
   - **2** task nodes
   ![Configure the number of nodes](images/03_hardware-node-instances.png)

For more information about configuring the cluster hardware and networking, see Amazon's document, [Configure Cluster Hardware and Networking](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances.html).

### Enable EMR managed scaling for the Core and Task nodes

On the **Cluster Scaling** settings, select **Use EMR-managed scaling**, and set the following number of task units:
- Minimum: **2**
- Maximum: **8**
- On-demand limit: **8**
- Maximum Core Node: **2**

![Configure the Cluster scaling options](images/04_cluster-scaling.png)

### Configure logging

On the General Options step, configure logging to use the same S3 bucket you configured as the destination for the final data (`segment-data-lake` in this case). Once configured, logs are written to a new prefix, separated from the final processed data.

Set the value of the **vendor** tag to `segment`.

![Configure logging](images/05_logging.png)

### Secure the cluster

On the Security step, ensure that the following steps have been completed:
1. Create or select an **EC2 key pair**.
2. Choose the appropriate roles in the **EC2 instance profile**.
3. Select the appropriate security groups for the Master and Core & Task types.

![Secure the cluster](images/06_secure-cluster.png)

## Step 3 - Create an Access Management role and policy

The following steps provide examples of the IAM Role and IAM Policy.

### IAM Role

Create a `segment-data-lake-role` role for Segment to assume. Attach the following trust relationship document to the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::294048959147:role/customer-datalakes-prod-admin",
          "arn:aws:iam::294048959147:role/datalakes-aws-worker",
          "arn:aws:iam::294048959147:role/datalakes-customer-service"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": [
            "SOURCE_1",
            "SOURCE_N"
          ]
        }
      }
    }
  ]
}
```

> note ""
> **NOTE:** Replace the `ExternalId` list with the Segment `SourceId` values that are synced to the Data Lake.

### IAM Policy

Add a policy to the role created above to give Segment access to the relevant Glue databases and tables, EMR cluster, and S3.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:TerminateJobFlows",
        "elasticmapreduce:RunJobFlow",
        "elasticmapreduce:DescribeStep",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:CancelSteps",
        "elasticmapreduce:AddJobFlowSteps"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/vendor": "segment"
        }
      }
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "glue:UpdateTable",
        "glue:UpdatePartition",
        "glue:GetTables",
        "glue:GetTableVersions",
        "glue:GetTableVersion",
        "glue:GetTable",
        "glue:GetPartitions",
        "glue:GetPartition",
        "glue:DeleteTableVersion",
        "glue:DeleteTable",
        "glue:DeletePartition",
        "glue:CreateTable",
        "glue:CreatePartition",
        "glue:CreateDatabase",
        "glue:BatchGetPartition",
        "glue:BatchDeleteTableVersion",
        "glue:BatchDeleteTable",
        "glue:BatchDeletePartition",
        "glue:BatchCreatePartition"
      ],
      "Resource": [
        "arn:aws:glue:$REGION:$YOUR_ACCOUNT:table/*",
        "arn:aws:glue:$REGION:$YOUR_ACCOUNT:database/default",
        "arn:aws:glue:$REGION:$YOUR_ACCOUNT:database/*",
        "arn:aws:glue:$REGION:$YOUR_ACCOUNT:catalog"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": [
        "arn:aws:s3:::$BUCKET_NAME/*",
        "arn:aws:s3:::$BUCKET_NAME"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "athena:*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

> note ""
> **NOTE:** The policy above grants full access to Athena, but the individual Glue and S3 policies decide which tables can be queried. Segment queries only for debugging purposes, and will notify you before running any queries.

## Debugging

Segment requires access to the data and schema to debug data quality issues. The modes available for debugging are:
- Access the individual objects stored in S3 and the associated schema to understand data discrepancies.
- Run an Athena query on the underlying data stored in S3.
  - Ensure Athena uses Glue as the data catalog. Older accounts may not have this configuration, and may require some additional steps to complete the upgrade. The Glue console typically displays a warning and provides a link to instructions on how to complete the upgrade.
  - An easier alternative is to create a new account that has Athena backed by Glue as the default.
3 image files changed (56.6 KB, 269 KB, 54.9 KB)
