src/connections/destinations/catalog/amazon-personalize/index.md: 26 additions & 38 deletions
@@ -17,16 +17,16 @@ Developing the machine-learning capabilities necessary to produce these recommen

These are the prerequisites you need before getting started:

-1. Segment data flowing into an S3 destination OR a warehouse
+1. Segment data flowing into an S3 destination, a Snowflake warehouse, or an Amazon Redshift warehouse.
2. You have the ability to create AWS Glue jobs (only required if using S3 to [train your model](#train-your-model))
3. You have the ability to deploy Lambda functions in Amazon Web Services
4. You have access to AWS Personalize

-If you don't have S3, Redshift warehouse, or Snowflake warehouse configured, you can read more about setting up [S3](/docs/connections/storage/catalog/amazon-s3/), [Redshift](/docs/connections/storage/catalog/redshift/), and [Snowflake](/docs/connections/storage/catalog/snowflake/).
+If you don't have S3, a Redshift warehouse, or a Snowflake warehouse configured, you can read more about setting up [S3](/docs/connections/storage/catalog/aws-s3/), [Redshift](/docs/connections/storage/catalog/redshift/), and [Snowflake](/docs/connections/storage/catalog/snowflake/).

***If you're a Segment business tier customer, contact your Success contact to initiate a replay to S3 or your Warehouse.***

-There are 3 main parts to using Amazon Personalize with Segment:
+There are three main parts to using Amazon Personalize with Segment:

1. [Train your model](/docs/connections/destinations/catalog/amazon-personalize/#train-your-model) on historical data in S3 or a Warehouse.
2. [Create a Personalize Dataset Group](/docs/connections/destinations/catalog/amazon-personalize/#create-personalize-dataset-group-solution-and-campaign) and Campaign
@@ -135,7 +135,7 @@ DELIMITER AS ','
PARALLEL OFF;
```

-Note: Use `date_part(epoch,"timestamp") as TIMESTAMP` because Personalize requires timestamps to be specified in UNIX/epoch time.
+**Note:** Use `date_part(epoch,"timestamp") as TIMESTAMP` because Personalize requires timestamps to be specified in UNIX/epoch time.

**Verify the Output file**
Browse to the S3 service page in the AWS console and navigate to the bucket path specified in the `unload` command. You should see the output file.
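
If you'd rather verify from code than from the console, a minimal boto3 sketch like the following lists the exported objects; the bucket name and prefix are placeholders, so substitute the values from your own `unload` command:

```python
import boto3

# Placeholder values -- use the bucket and prefix from your `unload` command.
BUCKET = "your-personalize-export-bucket"
PREFIX = "redshift-export/"

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)

# Print each exported object and its size so you can confirm the file exists.
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```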
@@ -201,15 +201,7 @@ The following examples show how to configure an AWS Glue job to convert Segment
**Create AWS Glue ETL Job**
To create an AWS Glue ETL Job:

-1. Navigate to the Glue service in your AWS console.
-2. Click **Get started** and then click **Jobs** in the left navigation on the Glue console page.
-
-
-
-3. Click **Add job**.
-4. Enter a job name such as "SegmentEventsJsonToCsv".
-5. For IAM role, create a role and execution policies that gives your Glue job the ability to write to your S3 bucket. For example:
+1. Create a new AWS service IAM role using the following execution policies. These policies give your Glue job the ability to write to your S3 bucket:
* Policy 1:
```json
@@ -326,13 +318,10 @@ To create an AWS Glue ETL Job:
]
}
```
-
-6. Leave Type as **Spark**.
-7. For **This job runs**, select **A new script to be authored by you**.
-8. Leave everything else the same and click **Next** at the bottom of the form.
-9. On the **Connections** step, click **Save job and edit script** since you won't access data in a database for this job.
-
-10. The source code for a generic Glue job is below. Modify this code to reflect the names of the events you wish to extract from the Segment logs (see line #25). Copy the code example to your clipboard and paste it into the Glue editor window.
+1. Navigate to the Glue service in your AWS console.
+2. Click **Get started** and then select **Jobs** from the left navigation on the Glue console page.
+3. Select **Spark script editor** and click **Create**.
+4. The following code sample is the source code for a generic Glue job. Copy the code example to your clipboard and paste it into the Glue editor window, modifying as necessary to reflect the names of the events you wish to extract from the Segment logs (see line #25).

```python
import sys
@@ -354,7 +343,7 @@ To create an AWS Glue ETL Job:
print("Input file total record count: ", datasource0.count())
@@ -399,13 +388,12 @@ To create an AWS Glue ETL Job:
job.commit()
```
+5. Select the **Job details** tab.
+6. Enter a name for your Glue job.
+7. Leave Type as **Spark**.
+8. Make any optional changes on the Job details page, and click **Save** to save the job script.

-11. Click **Save** to save the job script.
-
-
-
-To review key parts of the script in more detail:
+To review key parts of the Python script in more detail:

1. The script is initialized with a few job parameters. You'll see how to specify these parameter values when the job below runs. For now, see that Segment is passing in the location of the raw JSON files using `S3_JSON_INPUT_PATH` and the location where the output CSV should be written through `S3_CSV_OUTPUT_PATH`.
```python
@@ -425,7 +413,7 @@ To review key parts of the script in more detail:
3. The first step in Segment's Job is to load the raw JSON file as a Glue DynamicFrame.
4. Since not all events that are written to S3 by Segment are relevant to training a Personalize model, Segment uses Glue's `Filter` transformation to keep the records needed.
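
As a rough illustration of these steps (not a substitute for the full script above), a Glue job typically reads its parameters, loads the raw JSON as a DynamicFrame, and filters down to the relevant events roughly like this; the event names below are placeholders:

```python
import sys

from awsglue.context import GlueContext
from awsglue.transforms import Filter
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Read the job parameters supplied when the job runs.
args = getResolvedOptions(sys.argv, ["S3_JSON_INPUT_PATH", "S3_CSV_OUTPUT_PATH"])

glueContext = GlueContext(SparkContext.getOrCreate())

# Load the raw Segment JSON files from S3 as a Glue DynamicFrame.
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [args["S3_JSON_INPUT_PATH"]], "recurse": True},
    format="json",
)

# Keep only the events relevant to training the model (placeholder event names).
interactions = Filter.apply(
    frame=datasource0,
    f=lambda record: record["event"] in ("Product Added", "Order Completed"),
)
```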
@@ -489,7 +477,7 @@ With Segment's ETL Job script created and saved, it's time to run the job to cre
4. Scroll down to the **Job parameters** section. This is where Segment will specify the job parameters that Segment's script expects for the path to the input data and the path to the output file.
-5. Create 2 job parameters with the following key and value.
+5. Create two job parameters with the following keys and values.
* Be sure to prefix each key with `--` as shown. Substitute your account ID for `[ACCOUNT_ID]` in the values below. You can copy the bucket name to your clipboard from the S3 service page in the tab/window you opened above. The order in which they are specified does not matter.

| **Key** | **Value** |
@@ -584,7 +572,7 @@ To create a personalize dataset group:
10. Click **Next** to save the schema and move to the next step.
-11. The **Import user-item interaction data** step is displayed next. To complete this form Segment needs to get 2 pieces of information from IAM and S3. Give your import job a name and set the automatic import to **Off**.
+11. The **Import user-item interaction data** step is displayed next. To complete this form, Segment needs to get two pieces of information from IAM and S3. Give your import job a name and set the automatic import to **Off**.
12. For the **IAM service role**, select **Create a new role** from the dropdown.
13. In the next pop-up, Segment recommends listing your bucket name in the **Specific S3 buckets** option, but you're free to choose the option that best suits your needs.
@@ -601,9 +589,9 @@ Be patient as this process can take a long time to complete.
### Create Personalize Solution

-Once Segment's event CSV is finished importing into a user-item interaction dataset, Segment can create a Personalize Solution. To do thi:
+Once Segment's event CSV is finished importing into a user-item interaction dataset, Segment can create a Personalize Solution. To do this:

-1. From the Dashboard page for the dataset group we created above, click **Start** in the **Create solutions** column.
+1. From the Dashboard page for the dataset group created above, click **Start** in the **Create solutions** column.


@@ -621,7 +609,7 @@ Once Segment's event CSV is finished importing into a user-item interaction data
A deployed solution is known as a campaign, and is able to make recommendations for your users. To deploy a solution, you create a campaign in the console or by calling the CreateCampaign API. You can choose which version of the solution to use. By default, a campaign uses the latest version of a solution.
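
For reference, the same deployment can be done programmatically. A minimal boto3 sketch of the CreateCampaign call might look like the following; the solution version ARN and campaign name are placeholders, not values from this guide:

```python
import boto3

personalize = boto3.client("personalize")

# Placeholder ARN -- use the solution version produced by your own solution.
solution_version_arn = (
    "arn:aws:personalize:us-east-1:123456789012:solution/your-solution/your-version"
)

response = personalize.create_campaign(
    name="segment-recommendations-campaign",  # placeholder name
    solutionVersionArn=solution_version_arn,
    minProvisionedTPS=1,
)
print(response["campaignArn"])
```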

-To create a Personlize campaign:
+To create a Personalize campaign:

1. From the Dataset Group Dashboard, click **Create new campaign**.
@@ -694,7 +682,7 @@ To create an IAM role:
> note ""
> **NOTE:** Your Source ID can be found by navigating to **Settings > API Keys** from your Segment source homepage.
>
-> For security purposes, Segment will set your Workspace ID as your External ID. If you are currently using an External ID different from your Workspace ID, reach out to our support team so they can change it and make your account more secure.
+> For security purposes, Segment will set your Workspace ID as your External ID. If you are currently using an External ID different from your Workspace ID, reach out to Segment support so they can change it and make your account more secure.

```json
{
@@ -811,7 +799,7 @@ To install Segment's Layer:
**Update your IAM role for your Lambda to call Personalize**

-You need to modify the IAM Role & Policy originally created with this Lambda to allow it to send and recieve data from Personalize. To do this:
+You need to modify the IAM Role & Policy originally created with this Lambda to allow it to send and receive data from Personalize. To do this:

1. From the **Execution role** section of your Lambda function, click the **View the <your-role-name>** link.
@@ -972,7 +960,7 @@ You need to create a Personalize Event Tracker for the Dataset Group you created


-12. Return to our Lambda function and scroll down to the **Environment variables** panel.
+12. Return to your Lambda function and scroll down to the **Environment variables** panel.
13. Add an environment variable with the key `personalize_campaign_arn` and value of the Campaign ARN in your clipboard.
14. Scroll to the top of the page and click **Save** to save your changes.
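
To illustrate how that variable is typically consumed, here is a hedged sketch of a Lambda handler that reads `personalize_campaign_arn` from the environment and requests recommendations; the handler shape and the user ID field are illustrative assumptions, not the exact code of the Segment-provided function:

```python
import os

import boto3

personalize_runtime = boto3.client("personalize-runtime")


def lambda_handler(event, context):
    # The campaign ARN comes from the environment variable configured above.
    campaign_arn = os.environ["personalize_campaign_arn"]

    # Illustrative only: fetch recommendations for the user in the incoming event.
    response = personalize_runtime.get_recommendations(
        campaignArn=campaign_arn,
        userId=event.get("userId", "anonymous"),
        numResults=10,
    )
    return [item["itemId"] for item in response["itemList"]]
```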
@@ -1008,7 +996,7 @@ Segment allows you to send each call type to a different Lambda. If you leave th
There are two settings relevant for track calls:
-1. Lambda for track calls - the Lambda where we should route track calls.
+1. Lambda for track calls - the Lambda where the Segment app should route track calls.
2. Events - a list of specific events to send. You may send *all* track events (see setting details for instructions on how), but use caution with this option, as it may significantly increase your Lambda costs.
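
For context on what the routed Lambda receives, a sketch of a handler that forwards a Segment track call to a Personalize Event Tracker could look roughly like this; the `personalize_tracking_id` environment variable name and the payload handling are assumptions for illustration, not the exact Segment-provided code:

```python
import json
import os
import time

import boto3

personalize_events = boto3.client("personalize-events")


def lambda_handler(event, context):
    # Segment track calls carry an event name, user IDs, and properties.
    personalize_events.put_events(
        trackingId=os.environ["personalize_tracking_id"],  # assumed variable name
        userId=event.get("userId", "anonymous"),
        sessionId=event.get("anonymousId", "unknown-session"),
        eventList=[
            {
                "eventType": event["event"],
                "sentAt": int(time.time()),
                "properties": json.dumps(event.get("properties", {})),
            }
        ],
    )
    return {"statusCode": 200}
```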
@@ -1020,4 +1008,4 @@ This setting controls the [Log Type](https://docs.aws.amazon.com/lambda/latest/d
**My Lambda <> Segment connection is timing out, what do I do?**

-Due to how our event delivery system, [Centrifuge](https://segment.com/blog/introducing-centrifuge/), works, your Lambda can't take more than 5 seconds to run per message. If you're consistently running into timeout issues, you should consult the [AWS Lambda docs](https://docs.aws.amazon.com/lambda/index.html#lang/en_us), as well as docs for your language of choice, for tips on optimizing performance.
+Due to how Segment's event delivery system, [Centrifuge](https://segment.com/blog/introducing-centrifuge/), works, your Lambda can't take more than five seconds to run per message. If you're consistently running into timeout issues, you should consult the [AWS Lambda docs](https://docs.aws.amazon.com/lambda/index.html#lang/en_us), as well as docs for your language of choice, for tips on optimizing performance.