Commit 89fa298

markzegarelli authored
Merge pull request #1833 from segmentio/DOC-266_s3-IAM
2 parents 726b753 + 4f162f5

File tree: 6 files changed, +199 −22 lines changed

src/connections/storage/catalog/amazon-s3/index.md

Lines changed: 30 additions & 22 deletions
@@ -6,17 +6,21 @@ redirect_from:
hide-personas-partial: true
---

+> success "Try the new AWS S3 beta destination"
+> The AWS S3 destination supports IAM Roles, allowing for a more secure connection to your buckets. For more information, see the [AWS S3 destination documentation](/docs/connections/storage/catalog/aws-s3/).
+
## Getting Started

-The Amazon S3 destination puts the raw logs of the data we're receiving into your S3 bucket, encrypted, no matter what region the bucket is in.
+The Amazon S3 destination puts the raw logs of the data Segment receives into your S3 bucket, encrypted, no matter what region the bucket is in.

-**Note:** The data is copied into your bucket every hour around the :40 minute mark. You may see multiple files over a period of time depending on how much data is copied.
+> info ""
+> Segment copies data into your bucket every hour around the :40 minute mark. You may see multiple files over a period of time depending on the amount of data Segment copies.

-Keep in mind that Amazon S3 works in a different way from most of our destinations. Using a destinations selector like the [integrations object](https://segment.com/docs/connections/spec/common/#integrations) does not affect events in the case of Amazon S3.
+Keep in mind that Amazon S3 works in a different way from most other destinations. Using a destinations selector like the [integrations object](/docs/connections/spec/common/#integrations) does not affect events with Amazon S3.

The diagram below illustrates how the S3 destination works.

-Data from your sources is processed by the Segment Tracking API, which collects the Events in batches. When the batches reach a 100 MB, or once an hour, a process in the Segment backend uploads them to a secure Segment S3 bucket, from which they can be securely copied to your own S3 bucket.
+The Segment Tracking API processes data from your sources and collects the events in batches. When these batches reach 100 MB, or once per hour, Segment initiates a process that uploads them to a secure Segment S3 bucket, from which they are securely copied to your own S3 bucket.

![](images/s3processdiagram.png)

@@ -44,30 +48,34 @@ Data from your sources is processed by the Segment Tracking API, which collects
}
```

-**Note:** the `Resource` property string **must** end with `/*`.
+> info ""
+> The `Resource` property string **must** end with `/*`.

Specifically, this adds the ability to `s3:PutObject` for the Segment s3-copy user for your bucket.
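For illustration only, a bucket-policy statement granting `s3:PutObject` might be shaped like the sketch below. This is not Segment's published policy; the principal ARN and bucket name are placeholders you would replace with the values from the documentation above.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSegmentPutObject",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<SEGMENT-ACCOUNT-ID>:user/s3-copy"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}
```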

-If you have server-side encryption enabled, see [additional required setup](https://segment.com/docs/connections/storage/catalog/amazon-s3/#encryption).
+If you have server-side encryption enabled, see the [required configuration](#encryption).

You can edit your bucket policy in the [AWS management console](https://console.aws.amazon.com) by right-clicking the bucket and then selecting the "edit policy" option.

Lastly, enable the Amazon S3 destination in your Segment destination catalog, and put in your bucket name in the destination settings. It will take about an hour to start receiving data.

## Data format

-Your logs will be stored as gzipped, newline-separated JSON containing the full call information. For a list of supported properties, you'll want to check out our [Spec](/docs/connections/spec/) docs.
+Segment stores logs as gzipped, newline-separated JSON containing the full call information. For a list of supported properties, see the [Segment Spec](/docs/connections/spec/) documentation.

-The logs themselves are binned by day, and named according to the following file format:
+Segment groups logs by day, and names them using the following format:

s3://{bucket}/segment-logs/{source-id}/{received-day}/filename.gz

-The received-day will refer to the UTC date unix timestamp, that the files were received by our API, which makes it easy to find all calls received within a certain timeframe.
+The received-day refers to the UTC date, as a Unix timestamp, on which the API received the files, which makes it easy to find all calls received within a certain timeframe.
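As a hedged illustration of this layout, the helper below builds the date back out of a log key. The source ID and filename are hypothetical, and treating received-day as a millisecond Unix timestamp is an assumption about the format, not something the text above guarantees.

```python
from datetime import datetime, timezone

def parse_segment_log_key(key):
    """Split a segment-logs object key into its path components.

    Expects keys shaped like:
        segment-logs/{source-id}/{received-day}/{filename}.gz
    where received-day is assumed to be a Unix timestamp in
    milliseconds for UTC midnight of the day the API received the data.
    """
    prefix, source_id, received_day, filename = key.split("/")
    if prefix != "segment-logs":
        raise ValueError("not a segment-logs key: " + key)
    day = datetime.fromtimestamp(int(received_day) / 1000, tz=timezone.utc)
    return source_id, day.date().isoformat(), filename

source, day, name = parse_segment_log_key(
    "segment-logs/AbC123/1609459200000/1609459200000.abc.gz"
)
print(source, day, name)  # AbC123 2021-01-01 1609459200000.abc.gz
```

A helper like this makes it easy to filter downloaded keys to the timeframe you care about.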

## Encryption

+This section contains information for enabling encryption on your S3 bucket.
+
### Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)

-Segment supports optional, S3-managed Server-Side Encryption, which you can disable or enable from the Destination Configuration UI. By default, the destination now automatically enables encryption, and we recommend that you continue to encrypt.
+Segment supports optional, S3-managed Server-Side Encryption, which you can disable or enable from the Destination Configuration UI. By default, the destination now automatically enables encryption, and Segment recommends that you continue to encrypt.
If you've had the S3 destination enabled since before October 2017, you might need to enable encryption manually on your bucket.

While most client libraries transparently decrypt the file when fetching it, you should make sure that any applications that consume data in the S3 bucket are ready to decrypt the data before you enable this feature. When you're ready, you can enable encryption from the setting in the destination configuration UI.
@@ -105,14 +113,14 @@ The target S3 bucket should have the "Default encryption" property enabled and s
![update default encryption property](images/bucket-property.png)

#### Disable ServerSideEncryption in Segment S3 Destination settings
-Disable the Server Side Encryption setting in the Segment destination configuration. This allows bucket-level encryption to be enabled, so Amazon can automatically encrypt objects using KMS managed keys.
+Disable the Server Side Encryption setting in the Segment destination configuration. This allows you to enable bucket-level encryption, so Amazon can encrypt objects using KMS managed keys.

![disable segment s3 destination property](images/disable-segment-sse.png)

### Enforcing encryption
-To further secure your bucket by ensuring that all files are uploaded with the encryption flag present, you can add to the bucket policy to strictly enforce that all uploads trigger encryption.
+To further secure your bucket by ensuring that all files upload with the encryption flag present, you can add to the bucket policy to strictly enforce that all uploads trigger encryption.

-We recommend doing this as a best practice. The following policy strictly enforces upload encryption with Amazon S3-Managed keys.
+Segment recommends doing this as a best practice. The following policy strictly enforces upload encryption with Amazon S3-Managed keys.

```json
{
@@ -158,32 +166,32 @@ We recommend doing this as a best practice. The following policy strictly enforc

## Region

-Segment automatically infers the region of your bucket when we copy data to it, so you do not need to specify a bucket region in your configuration. However, if you're using VPC Endpoints for your S3 bucket, make sure the endpoint is configured in the same region as your bucket. You can find more information on this in the AWS S3 docs [here](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints-s3.html).
+Segment infers the region of your bucket when data is copied to it, so you don't need to specify a bucket region in your configuration. If you're using VPC Endpoints for your S3 bucket, make sure you configure the endpoint in the same region as your bucket. You can find more information on this in the AWS S3 docs [here](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints-s3.html).

## Custom Path Prefix

-To use a custom key prefix for the files in your bucket, append the path to the bucket name in the Segment S3 destination configuration UI. For example, a bucket string `mytestbucket/path/prefix` would result in data being copied to `/path/prefix/segment-logs/{source-id}/{received-day}`/
+To use a custom key prefix for the files in your bucket, append the path to the bucket name in the Segment S3 destination configuration UI. For example, a bucket string `mytestbucket/path/prefix` would result in data copying to `/path/prefix/segment-logs/{source-id}/{received-day}/`.
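To illustrate the bucket-string convention above, here is a hypothetical helper (not part of Segment's tooling) that splits a configured bucket string into the bucket name and optional prefix, then builds the folder the logs would land in:

```python
def destination_paths(bucket_string, source_id, received_day):
    """Split a configured bucket string into bucket + optional prefix,
    and build the folder path logs would be copied into."""
    bucket, _, prefix = bucket_string.partition("/")
    parts = (prefix, "segment-logs", source_id, received_day)
    folder = "/".join(p for p in parts if p)  # drop the prefix when empty
    return bucket, folder

bucket, folder = destination_paths("mytestbucket/path/prefix", "AbC123", "1609459200000")
print(bucket)  # mytestbucket
print(folder)  # path/prefix/segment-logs/AbC123/1609459200000
```

With no prefix in the bucket string, the same helper yields the default `segment-logs/{source-id}/{received-day}` layout.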

### How can I download the data from my bucket?

-We've had the most luck using the [AWS CLI](http://aws.amazon.com/cli/) and writing a short script to download specific days, one at a time. We've found AWS CLI to be significantly faster than [s3cmd](http://s3tools.org/s3cmd) because it downloads files in parallel.
+Segment recommends using the [AWS CLI](http://aws.amazon.com/cli/) and writing a short script to download specific days, one at a time. The AWS CLI is faster than [s3cmd](http://s3tools.org/s3cmd) because it downloads files in parallel.

-**NOTE**: S3 transparently decompresses the files for most clients. However, to access the raw gzipped data you can programmatically download the file using [the AWS SDK](http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html) and setting `ResponseContentEncoding: none`. This functionality isn't available in the AWS CLI). You can also manually remove the metadata on the file (`Content-Type: text/plain` and `Content-Encoding: gzip`) through the AWS interface, which allows you to download the file as gzipped.
+> info ""
+> S3 transparently decompresses the files for most clients. To access the raw gzipped data, you can programmatically download the file using [the AWS SDK](http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html) and setting `ResponseContentEncoding: none`. (This functionality isn't available in the AWS CLI.) You can also manually remove the metadata on the file (`Content-Type: text/plain` and `Content-Encoding: gzip`) through the AWS interface, which allows you to download the file as gzipped.
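The raw-download idea in the note above can be sketched in Python with boto3 (the AWS SDK for Python). This is an assumption-laden sketch, not Segment tooling: the bucket and key are placeholders, and boto3 must be installed and configured with credentials.

```python
import gzip

def decode_log_lines(raw_bytes):
    """Decompress a gzipped Segment log blob into newline-separated JSON lines."""
    return gzip.decompress(raw_bytes).splitlines()

def download_raw_log(bucket, key):
    """Fetch a log object while keeping the raw gzipped bytes.

    Sketch only: assumes boto3 is installed and AWS credentials
    are configured; bucket and key are placeholders.
    """
    import boto3  # imported here so decode_log_lines works without the AWS SDK
    s3 = boto3.client("s3")
    obj = s3.get_object(
        Bucket=bucket,
        Key=key,
        ResponseContentEncoding="none",  # keep the gzipped payload as-is
    )
    return decode_log_lines(obj["Body"].read())
```

Each returned line is one JSON-encoded event, matching the data format described earlier.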
+
+To configure the AWS CLI, see Amazon's [documentation](http://docs.aws.amazon.com/cli/latest/userguide/installing.html). For Linux systems, run the following command:

-To set up AWS CLI, you'll need to first install it. There are [detailed instructions here](http://docs.aws.amazon.com/cli/latest/userguide/installing.html), or this will generally work for linux machines:

```bash
$ sudo apt-get install awscli
```

-Then you'll need to configure AWS CLI with your Access Key ID and Secret Access Key. You can create or find these keys in your [Amazon IAM user management console](https://console.aws.amazon.com/iam/home#users). Then run the following command which will prompt you for the access keys:
+Then configure the AWS CLI with your Access Key ID and Secret Access Key. You can create or find these keys in your [Amazon IAM user management console](https://console.aws.amazon.com/iam/home#users). Then run the following command, which prompts you for the access keys:

```bash
$ aws configure
```

-Now you're ready to download some logs!
-
To see a list of the most recent log folders:

```bash
@@ -210,7 +218,7 @@ To put the files in a specific folder replace the `.` at the end ("current direc
> warning ""
> As mentioned above, the Amazon S3 destination works differently than other destinations in Segment. As a result, Segment sends **all** data from a Personas source to S3 during the sync process, not only the connected audiences and traits.

-You can send computed traits and audiences generated using [Segment Personas](/docs/personas) to this destination as a **user property**. To learn more about Personas, contact us for a [demo](https://segment.com/contact/demo).
+You can send computed traits and audiences generated using [Segment Personas](/docs/personas) to this destination as a **user property**.

For user-property destinations, Segment sends an [identify](/docs/connections/spec/identify/) call to the destination for each user added and removed. The property name is the snake_cased version of the audience name, with a true/false value to indicate membership. For example, when a user first completes an order in the last 30 days, Personas sends an Identify call with the property `order_completed_last_30days: true`. When the user no longer satisfies this condition (for example, it's been more than 30 days since their last order), Personas sets that value to `false`.
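The snake_casing described above can be sketched as a small helper. The exact casing rules Personas applies are an approximation here; this simply reproduces the `order_completed_last_30days` example from the text.

```python
import re

def audience_property(audience_name, is_member):
    """Approximate the identify-call trait for an audience: the
    snake_cased audience name mapped to a membership boolean."""
    prop = re.sub(r"[^0-9a-zA-Z]+", "_", audience_name).strip("_").lower()
    return {prop: is_member}

print(audience_property("Order Completed Last 30days", True))
# {'order_completed_last_30days': True}
```

When the user later falls out of the audience, the same property is sent with a `False` value.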

