id: GNLT5OQ45P
---
{% include content/source-region-unsupported.md %}

This document outlines how to upload a CSV file containing data to [Amazon S3](https://aws.amazon.com/s3/){:target="_blank"}, which uses [Lambda](https://aws.amazon.com/lambda/){:target="_blank"} to automatically parse, format, and upload the data to Segment.

You might have sources of data where you can't instrument Segment's SDKs, including other SaaS tools for which a Segment integration is not yet available. In many of these cases, you can extract data from these sources in CSV format, and then use Segment's server-side SDKs or HTTP tracking API to push the data to Segment.
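
As a reference for what that push looks like, here is a minimal sketch of sending a single event through the HTTP tracking API with `curl`; the write key, user ID, and event name are placeholders:

```bash
# Send one track call to Segment's HTTP tracking API.
# <YOUR_WRITE_KEY> is a placeholder; note the trailing colon (empty password).
curl -X POST https://api.segment.io/v1/track \
  -u '<YOUR_WRITE_KEY>:' \
  -H 'Content-Type: application/json' \
  -d '{
    "userId": "user-123",
    "event": "Row Uploaded",
    "properties": { "source": "csv" }
  }'
```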

The goal of this walkthrough is to make this process easier by providing an automated process that ingests this data. Once you complete this walkthrough, you will have the following Segment, Amazon S3, Lambda, and [IAM](https://aws.amazon.com/iam/){:target="_blank"} resources deployed:

- a Segment S3 source
- an AWS Lambda function
- an access policy for the Lambda function that grants Amazon S3 permission to invoke it
- an AWS IAM execution role that grants the permissions your Lambda function needs through the permissions policy associated with this role
- an AWS S3 source bucket with a notification configuration that invokes the Lambda function


## Prerequisites

This tutorial assumes that you have some basic understanding of S3, Lambda, and the `aws cli` tool. If you haven't already, follow the instructions in [Getting Started with AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html){:target="_blank"} to create your first Lambda function. If you're unfamiliar with `aws cli`, follow the instructions in [Setting up the AWS Command Line Interface](https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html){:target="_blank"} before you proceed.
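
If you haven't configured CLI credentials yet, a typical first step is the interactive setup, which assumes you already have an AWS access key pair:

```bash
aws configure   # prompts for your access key ID, secret key, default region, and output format
```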

This tutorial uses a command line terminal or shell to run commands. Commands appear preceded by a prompt symbol (`$`) and the name of the current directory, when appropriate.

On Linux and macOS, use your preferred shell and package manager. On macOS, you can use the Terminal application. On Windows 10, you can [install the Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10){:target="_blank"} to get a Windows-integrated version of Ubuntu and Bash.

[Install NPM](https://www.npmjs.com/get-npm){:target="_blank"} to manage the function's dependencies.


## Getting started
### 1. Create an S3 source in Segment

Remember the write key for this source; you'll need it in a later step.

### 2. Create the execution role

Create the [execution role](https://docs.aws.amazon.com/lambda/latest/dg/lambda-intro-execution-role.html){:target="_blank"} that gives your function permission to access AWS resources.

**To create an execution role**

1. Open the [roles page](https://console.aws.amazon.com/iam/home#/roles){:target="_blank"} in the IAM console.
2. Choose **Create role**.
3. Create a role with the following properties:
- Set the **Trusted entity** to **AWS Lambda**.
   - Set **Permissions** to **AWSLambdaExecute**.
   - Set the **Role name** to `lambda-s3-role`.

The **AWSLambdaExecute** policy has the permissions that the function needs to manage objects in Amazon S3, and write logs to CloudWatch Logs.
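
If you prefer to stay in the terminal, a rough `aws cli` equivalent of the console steps looks like this sketch; the role name `lambda-s3-role` matches the steps above, and the trust policy is what lets Lambda assume the role:

```bash
# trust.json lets the Lambda service assume this role
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "lambda.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role --role-name lambda-s3-role \
  --assume-role-policy-document file://trust.json
aws iam attach-role-policy --role-name lambda-s3-role \
  --policy-arn arn:aws:iam::aws:policy/AWSLambdaExecute
```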

### 3. Create local files, an S3 bucket, and upload a sample object

Follow these steps to create your local files and S3 bucket, and to upload a sample object.

1. Create a working folder named `S3-Lambda-Segment` and change into it.
2. Inside that folder, create a sample CSV file named `track_1.csv` (see [CSV formats](#csv-formats) below for the expected columns).
3. Create your bucket. **Record your bucket name** - you'll need it later!
4. In the source bucket, upload `track_1.csv`.
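
For example, from inside the `S3-Lambda-Segment` folder; the CSV columns and bucket name here are illustrative, and bucket names must be globally unique:

```bash
# Create a minimal sample track CSV - adjust the columns to your data
cat > track_1.csv <<'EOF'
userId,event,properties.plan,timestamp
user-123,Account Created,premium,1622547800
EOF

# Create the source bucket and upload the sample file
aws s3 mb s3://my-segment-csv-source
aws s3 cp track_1.csv s3://my-segment-csv-source/
```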

### 4. Create the function

Next, create the Lambda function, install dependencies, and zip everything up so it can be deployed to AWS.
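
Condensed, the packaging and deployment commands look roughly like this sketch; the runtime, handler name, account ID, and function name are placeholders to adapt:

```bash
S3-Lambda-Segment$ npm install            # install the function's dependencies locally
S3-Lambda-Segment$ zip -r function.zip .  # bundle index.js plus node_modules
S3-Lambda-Segment$ aws lambda create-function \
    --function-name <!Your Lambda Name!> \
    --runtime nodejs18.x \
    --handler index.handler \
    --role arn:aws:iam::<account-id>:role/lambda-s3-role \
    --timeout 90 \
    --zip-file fileb://function.zip
```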

The command above sets a 90-second timeout value as the function configuration. To increase it further, for example to 180 seconds, run:

```bash
S3-Lambda-Segment$ aws lambda update-function-configuration --function-name <!Your Lambda Name!> --timeout 180
```

### 5. Test the Lambda function

In this step, you invoke the Lambda function manually using sample Amazon S3 event data.

**To test the Lambda function**

1. Create an empty file named `output.txt` in the `S3-Lambda-Segment` folder - the `aws cli` complains if it's not there.
   ```bash
   S3-Lambda-Segment$ touch output.txt
   ```
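2. Invoke the function with sample Amazon S3 event data. This sketch assumes `input.json` contains a sample S3 notification event that references your bucket and the uploaded CSV key; on AWS CLI v2 you may also need `--cli-binary-format raw-in-base64-out`.
   ```bash
   S3-Lambda-Segment$ aws lambda invoke --function-name <!Your Lambda Name!> --payload file://input.json output.txt
   ```
3. Check `output.txt` and your CloudWatch logs for errors.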

**Note**: Calls to Segment's Object API don't show up in the Segment debugger.

### Configure Amazon S3 to publish events

In this step, you add the remaining configuration so that Amazon S3 can publish object-created events to AWS Lambda and invoke your Lambda function.
You'll grant Amazon S3 permission to invoke the Lambda function, then add a notification configuration to the source bucket so that object-created events trigger it.
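
A sketch of both pieces using the `aws cli`; the bucket name, region, and account ID are placeholders:

```bash
# Allow S3 to invoke the function
aws lambda add-permission \
  --function-name <!Your Lambda Name!> \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::<!Your Bucket Name!>

# Tell the bucket to publish object-created events to the function
aws s3api put-bucket-notification-configuration \
  --bucket <!Your Bucket Name!> \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:<region>:<account-id>:function:<!Your Lambda Name!>",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'
```
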
Last, test your system to make sure it's working as expected: upload a new CSV file to the source bucket and confirm that the data arrives in Segment.
### Timestamps
This script automatically transforms all CSV timestamp columns named `createdAt` or `timestamp` to timestamp objects, regardless of nesting, in preparation for Segment ingestion. If your timestamps have a different name, search the example `index.js` code for the `colParser` function, and add your column names there for automatic transformation. If you make this modification, re-zip the package (using `zip -r function.zip .`) and upload the new zip to Lambda.
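
For example, re-using the function name placeholder from earlier:

```bash
S3-Lambda-Segment$ zip -r function.zip .
S3-Lambda-Segment$ aws lambda update-function-code --function-name <!Your Lambda Name!> --zip-file fileb://function.zip
```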

## CSV formats

Define your CSV file structure based on the method you want to execute.

> warning "CSV support recommendation"
>
> Implementing a production-grade solution with this tutorial can be complex. Segment recommends that you submit feature requests for CSV support in Segment reverse ETL.

#### Identify structure

An `identify_XXXXX.csv` file uses the following field names:

1. `userId` - Required
2. `anonymousId` - Optional
3. `traits.<trait>` - Optional
4. `context.<context field>` - Optional
5. `timestamp` (Unix time) - Optional
6. `integrations.<integration>` - Optional
In the above structure, `userId` is required, but all other items are optional. Start all traits with `traits.` and then the trait name, for example `traits.account_type`. Similarly, start context fields with `context.` followed by the canonical structure. The same structure applies to `integrations.` too.
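
For instance, an identify CSV using these conventions might look like this sketch; the trait and context columns are examples:

```bash
cat > identify_1.csv <<'EOF'
userId,traits.account_type,context.ip,timestamp
user-123,business,203.0.113.10,1622547800
EOF
```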


#### Page/Screen structure

A `screen_XXXXX` or `page_YYYY` file has the following field names:

1. `userId` - Required
2. `anonymousId` - Optional
3. `name` - Optional
4. `category` - Optional
5. `properties.<property>` - Optional
6. `context.<context field>` - Optional
7. `timestamp` (Unix time) - Optional
8. `integrations.<integration>` - Optional

#### Track structure

A `track_XXXXX` file has the following field names:

1. `userId` - Required
2. `anonymousId` - Optional
3. `event` - Required
4. `properties.<property>` - Optional
5. `context.<context field>` - Optional
6. `timestamp` (Unix time) - Optional
7. `integrations.<integration>` - Optional

For any of these methods, you might need to pass nested JSON to the tracking or Object API.

The example `index.js` sample code above does not support ingestion of arrays. If you need this functionality, you can modify the sample code as needed.

#### Object structure

There are cases when Segment's tracking API is not suitable for datasets that you might want to move to a warehouse. This could be e-commerce product data, media content metadata, campaign performance, and so on.
