hide-personas-partial: true
---
> success "Try the new AWS S3 beta destination"
> The AWS S3 destination supports IAM Roles, allowing for a more secure connection to your buckets. For more information, see the [AWS S3 destination documentation](/docs/connections/storage/catalog/aws-s3/).
## Getting Started
The Amazon S3 destination puts the raw logs of the data Segment receives into your S3 bucket, encrypted, no matter what region the bucket is in.
> info ""
> Segment copies data into your bucket every hour around the :40 minute mark. You may see multiple files over a period of time depending on the amount of data Segment copies.
Keep in mind that Amazon S3 works differently from most other destinations. Using a destination selector like the [integrations object](/docs/connections/spec/common/#integrations) does not affect events with Amazon S3.
The diagram below illustrates how the S3 destination works.
The Segment Tracking API processes data from your sources and collects the events in batches. When these batches reach 100 MB, or once per hour, Segment initiates a process that uploads them to a secure Segment S3 bucket, from which they are securely copied to your own S3 bucket.
> info ""
> The `Resource` property string **must** end with `/*`.
Specifically, this grants the Segment s3-copy user the ability to `s3:PutObject` to your bucket.
If you have server-side encryption enabled, see the [required configuration](#encryption).
You can edit your bucket policy in the [AWS management console](https://console.aws.amazon.com) by right-clicking the bucket and then selecting the "edit policy" option.
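
If you prefer the command line, the policy can also be applied with the AWS CLI. The sketch below is illustrative only: the bucket name and the Segment principal ARN are placeholders, so use the exact policy values from your Segment setup instructions.

```bash
# Sketch only: YOUR_BUCKET_NAME and the Principal ARN are placeholders;
# copy the exact policy from your Segment setup instructions.
aws s3api put-bucket-policy \
  --bucket YOUR_BUCKET_NAME \
  --policy '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "AllowSegmentPutObject",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::SEGMENT_ACCOUNT_ID:user/s3-copy" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }]
  }'
```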
Lastly, enable the Amazon S3 destination in your Segment destination catalog, and enter your bucket name in the destination settings. It takes about an hour to start receiving data.
## Data format
Segment stores logs as gzipped, newline-separated JSON containing the full call information. For a list of supported properties, see the [Segment Spec](/docs/connections/spec/) documentation.
Segment groups logs by day, and names them using the following format:
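
Per the Custom Path Prefix section below, logs live under `segment-logs/{source-id}/{received-day}/`. As a sketch, you can inspect that layout with the AWS CLI (the bucket name and source ID here are placeholders):

```bash
# Placeholders: my-segment-logs (bucket) and aBcDeF123 (source ID).
aws s3 ls s3://my-segment-logs/segment-logs/aBcDeF123/
```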
The received-day is the UTC-date Unix timestamp at which the API received the files, which makes it easy to find all calls received within a certain timeframe.
## Encryption
This section contains information for enabling encryption on your S3 bucket.
### Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
Segment supports optional S3-managed Server-Side Encryption, which you can enable or disable from the Destination Configuration UI. By default, the destination enables encryption automatically, and Segment recommends that you keep it enabled.
If you've had the S3 destination enabled since before October 2017, you might need to enable encryption manually on your bucket.
While most client libraries transparently decrypt the file when fetching it, you should make sure that any applications that consume data in the S3 bucket are ready to decrypt the data before you enable this feature. When you're ready, you can enable encryption from the setting in the destination configuration UI.

The target S3 bucket should have the "Default encryption" property enabled and set to use your KMS key.

#### Disable ServerSideEncryption in Segment S3 Destination settings
Disable the Server Side Encryption setting in the Segment destination configuration. This allows you to enable bucket-level encryption, so Amazon can encrypt objects using KMS managed keys.
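
As a sketch, you can enable bucket-level default encryption with the AWS CLI; the bucket name and KMS key alias below are placeholders:

```bash
# Placeholders: my-segment-logs (bucket) and alias/my-segment-key (KMS key).
aws s3api put-bucket-encryption \
  --bucket my-segment-logs \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "alias/my-segment-key"
      }
    }]
  }'
```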
To further secure your bucket by ensuring that all files are uploaded with the encryption flag present, you can add a statement to the bucket policy that strictly enforces encryption on every upload.
Segment recommends doing this as a best practice. The following policy strictly enforces upload encryption with Amazon S3-Managed keys.
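
The sketch below applies that policy with the AWS CLI. The two deny statements follow the standard AWS pattern for rejecting unencrypted uploads; the bucket name is a placeholder.

```bash
# Placeholder bucket name. The first statement rejects uploads whose
# x-amz-server-side-encryption header is not AES256; the second rejects
# uploads that omit the header entirely.
aws s3api put-bucket-policy \
  --bucket YOUR_BUCKET_NAME \
  --policy '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "DenyIncorrectEncryptionHeader",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
        "Condition": { "StringNotEquals": { "s3:x-amz-server-side-encryption": "AES256" } }
      },
      {
        "Sid": "DenyUnencryptedObjectUploads",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*",
        "Condition": { "Null": { "s3:x-amz-server-side-encryption": "true" } }
      }
    ]
  }'
```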
## Region
Segment infers the region of your bucket when data is copied to it, so you don't need to specify a bucket region in your configuration. If you're using VPC Endpoints for your S3 bucket, make sure you configure the endpoint in the same region as your bucket. For more information, see the [AWS documentation on VPC endpoints for S3](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints-s3.html).
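
For example, a gateway endpoint for S3 must name the service in the bucket's region. A sketch with placeholder IDs:

```bash
# Placeholders: vpc-0abc123 and rtb-0abc123. The region in --service-name
# must match your bucket's region.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --service-name com.amazonaws.us-west-2.s3 \
  --route-table-ids rtb-0abc123
```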
## Custom Path Prefix
To use a custom key prefix for the files in your bucket, append the path to the bucket name in the Segment S3 destination configuration UI. For example, a bucket string `mytestbucket/path/prefix` would result in data being copied to `/path/prefix/segment-logs/{source-id}/{received-day}/`.
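
For instance, with the `mytestbucket/path/prefix` example above, you could verify the resulting layout with a listing like this:

```bash
aws s3 ls s3://mytestbucket/path/prefix/segment-logs/
```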
### How can I download the data from my bucket?
Segment recommends using the [AWS CLI](http://aws.amazon.com/cli/) and writing a short script to download specific days, one at a time. The AWS CLI is faster than [s3cmd](http://s3tools.org/s3cmd) because it downloads files in parallel.
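
A minimal sketch of such a script; the bucket, source ID, and received-day folder names are placeholders (list the real folder names first, as shown below):

```bash
#!/usr/bin/env bash
# Placeholders throughout: substitute your bucket, source ID, and the
# received-day folder names returned by `aws s3 ls`.
BUCKET=my-segment-logs
SOURCE_ID=aBcDeF123
for day in 1519862400000 1519948800000; do
  aws s3 sync "s3://$BUCKET/segment-logs/$SOURCE_ID/$day" "./logs/$day"
done
```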
> info ""
> S3 transparently decompresses the files for most clients. To access the raw gzipped data, you can programmatically download the file using [the AWS SDK](http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html) and setting `ResponseContentEncoding: none`; this functionality isn't available in the AWS CLI. You can also manually remove the metadata on the file (`Content-Type: text/plain` and `Content-Encoding: gzip`) through the AWS interface, which allows you to download the file as gzipped.
To configure the AWS CLI, see [Amazon's documentation](http://docs.aws.amazon.com/cli/latest/userguide/installing.html). For Linux systems, run the following command:
```bash
$ sudo apt-get install awscli
```
Then configure the AWS CLI with your Access Key ID and Secret Access Key. You can create or find these keys in the [Amazon IAM user management console](https://console.aws.amazon.com/iam/home#users). Then run the following command, which prompts you for the access keys:
```bash
$ aws configure
```
To see a list of the most recent log folders:
```bash
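# A sketch; the bucket name and source ID are placeholders.
aws s3 ls s3://my-segment-logs/segment-logs/aBcDeF123/
```

To download the files for a specific day into the current directory (again a sketch with placeholder names):

```bash
aws s3 sync s3://my-segment-logs/segment-logs/aBcDeF123/1519862400000 .
```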

To put the files in a specific folder, replace the `.` at the end ("current directory") with the folder path.
> warning ""
> As mentioned above, the Amazon S3 destination works differently than other destinations in Segment. As a result, Segment sends **all** data from a Personas source to S3 during the sync process, not only the connected audiences and traits.
You can send computed traits and audiences generated using [Segment Personas](/docs/personas) to this destination as a **user property**.
For user-property destinations, Segment sends an [identify](/docs/connections/spec/identify/) call to the destination for each user added and removed. The property name is the snake_cased version of the audience name, with a true/false value to indicate membership. For example, when a user first completes an order in the last 30 days, Personas sends an Identify call with the property `order_completed_last_30days: true`. When the user no longer satisfies this condition (for example, it's been more than 30 days since their last order), Personas sets that value to `false`.
0 commit comments