This repository was archived by the owner on Oct 10, 2025. It is now read-only.

Commit 187dccb (merge pull request #5 from aws-solutions/develop, parents 5dc5f0f + fcac514): Update README.md

README.md: 228 additions, 2 deletions

You can launch this solution with one click from [AWS Solutions Implementations](https://aws.amazon.com/solutions/implementations/maintaining-personalized-experiences-with-ml).

To customize the solution, or to contribute to the solution, see [Creating a custom build](#creating-a-custom-build).

## Configuration

This solution uses **parameter files**. The parameter file contains all the necessary information to create and maintain
your resources in Amazon Personalize.

The file can contain the following sections:

- `datasetGroup`
- `datasets`
- `solutions` (can contain `campaigns` and `batchInferenceJobs`)
- `eventTracker`
- `filters`

<details>
<summary>See a sample of the parameter file</summary>

```json
{
  "datasetGroup": {
    "serviceConfig": {
      "name": "dataset-group-name"
    },
    "workflowConfig": {
      "schedules": {
        "import": "cron(0 */6 * * ? *)"
      }
    }
  },
  "datasets": {
    "users": {
      "dataset": {
        "serviceConfig": {
          "name": "users-data"
        }
      },
      "schema": {
        "serviceConfig": {
          "name": "users-schema",
          "schema": {
            "type": "record",
            "name": "users",
            "namespace": "com.amazonaws.personalize.schema",
            "fields": [
              {
                "name": "USER_ID",
                "type": "string"
              },
              {
                "name": "AGE",
                "type": "int"
              },
              {
                "name": "GENDER",
                "type": "string",
                "categorical": true
              }
            ]
          }
        }
      }
    },
    "interactions": {
      "dataset": {
        "serviceConfig": {
          "name": "interactions-data"
        }
      },
      "schema": {
        "serviceConfig": {
          "name": "interactions-schema",
          "schema": {
            "type": "record",
            "name": "interactions",
            "namespace": "com.amazonaws.personalize.schema",
            "fields": [
              {
                "name": "ITEM_ID",
                "type": "string"
              },
              {
                "name": "USER_ID",
                "type": "string"
              },
              {
                "name": "TIMESTAMP",
                "type": "long"
              },
              {
                "name": "EVENT_TYPE",
                "type": "string"
              },
              {
                "name": "EVENT_VALUE",
                "type": "float"
              }
            ]
          }
        }
      }
    }
  },
  "solutions": [
    {
      "serviceConfig": {
        "name": "sims-solution",
        "recipeArn": "arn:aws:personalize:::recipe/aws-sims"
      },
      "workflowConfig": {
        "schedules": {
          "full": "cron(0 0 ? * 1 *)"
        }
      }
    },
    {
      "serviceConfig": {
        "name": "popularity-count-solution",
        "recipeArn": "arn:aws:personalize:::recipe/aws-popularity-count"
      },
      "workflowConfig": {
        "schedules": {
          "full": "cron(0 1 ? * 1 *)"
        }
      }
    },
    {
      "serviceConfig": {
        "name": "user-personalization-solution",
        "recipeArn": "arn:aws:personalize:::recipe/aws-user-personalization"
      },
      "workflowConfig": {
        "schedules": {
          "full": "cron(0 2 ? * 1 *)"
        }
      },
      "campaigns": [
        {
          "serviceConfig": {
            "name": "user-personalization-campaign",
            "minProvisionedTPS": 1
          }
        }
      ],
      "batchInferenceJobs": [
        {
          "serviceConfig": {},
          "workflowConfig": {
            "schedule": "cron(0 3 * * ? *)"
          }
        }
      ]
    }
  ],
  "eventTracker": {
    "serviceConfig": {
      "name": "dataset-group-name-event-tracker"
    }
  },
  "filters": [
    {
      "serviceConfig": {
        "name": "clicked-or-streamed",
        "filterExpression": "INCLUDE ItemID WHERE Interactions.EVENT_TYPE in (\"click\", \"stream\")"
      }
    },
    {
      "serviceConfig": {
        "name": "interacted",
        "filterExpression": "INCLUDE ItemID WHERE Interactions.EVENT_TYPE in (\"*\")"
      }
    }
  ]
}
```

</details>
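A malformed parameter file is easiest to catch before it is uploaded. The sketch below is a hypothetical pre-upload check (not part of the solution): it verifies that only the sections listed above appear at the top level, and that every `schedules`/`schedule` entry uses the 6-field AWS cron form (`minutes hours day-of-month month day-of-week year`) seen in the sample.

```python
import re

# AWS schedule expressions wrap six space-separated fields:
# minutes hours day-of-month month day-of-week year
CRON_RE = re.compile(r"^cron\(\S+ \S+ \S+ \S+ \S+ \S+\)$")

KNOWN_SECTIONS = {"datasetGroup", "datasets", "solutions", "eventTracker", "filters"}


def validate_config(config: dict) -> list:
    """Return a list of problems found in a parameter-file dict (empty if none)."""
    problems = [f"unknown section: {key}" for key in config if key not in KNOWN_SECTIONS]

    def check_schedules(obj):
        # Walk the nested structure looking for workflowConfig schedule entries.
        if isinstance(obj, dict):
            for key, value in obj.items():
                if key in ("schedules", "schedule"):
                    exprs = value.values() if isinstance(value, dict) else [value]
                    for expr in exprs:
                        if not CRON_RE.match(expr):
                            problems.append(f"bad cron expression: {expr}")
                else:
                    check_schedules(value)
        elif isinstance(obj, list):
            for item in obj:
                check_schedules(item)

    check_schedules(config)
    return problems
```

For example, `validate_config(json.load(open("config.json")))` returns `[]` for the sample file above, and flags a 5-field Unix-style cron that is missing the year field.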

This solution allows you to manage multiple dataset groups through the use of multiple parameter files. Any `.json` file
discovered under the `train/` prefix will trigger the workflow; however, the following structure is recommended:

```
train/
├── <dataset_group_1>/ (option 1 - single csv files for data import)
│   ├── config.json
│   ├── interactions.csv
│   ├── items.csv (optional)
│   └── users.csv (optional)

└── <dataset_group_2>/ (option 2 - multiple csv files for data import)
    ├── config.json
    ├── interactions/
    │   ├── <interactions_part_1>.csv
    │   ├── <interactions_part_2>.csv
    │   └── <interactions_part_n>.csv
    ├── users/ (optional)
    │   ├── <users_part_1>.csv
    │   ├── <users_part_2>.csv
    │   └── <users_part_n>.csv
    └── items/ (optional)
        ├── <items_part_1>.csv
        ├── <items_part_2>.csv
        └── <items_part_n>.csv
```
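Under this layout the object keys for a dataset group are predictable, which is handy when scripting uploads. A small sketch (the helper name and its `parts` argument are ours, purely illustrative; the solution itself only requires that the files land under `train/`):

```python
def train_keys(dataset_group: str, parts: dict) -> list:
    """Build the recommended train/ S3 keys for one dataset group.

    `parts` maps a dataset type ("interactions", "users", "items") to its
    local CSV part files: a single part uses option 1 (one flat CSV named
    after the type), multiple parts use option 2 (a subfolder per type).
    """
    keys = [f"train/{dataset_group}/config.json"]
    for dataset_type, files in parts.items():
        if len(files) == 1:
            keys.append(f"train/{dataset_group}/{dataset_type}.csv")
        else:
            keys.extend(f"train/{dataset_group}/{dataset_type}/{name}" for name in files)
    return keys
```

Each returned key can then be passed to an S3 upload of your choice (for example boto3's `upload_file`) against the solution's training bucket.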

If batch inference jobs are required, [batch inference job configuration files](https://docs.aws.amazon.com/personalize/latest/dg/recommendations-batch.html#batch-data-upload)
must also be uploaded to the following location:

```
batch/

└── <dataset_group_name>/
    └── <solution_name>/
        └── job_config.json
```

Batch inference output will be produced at the following location:

```
batch/

└── <dataset_group_name>/
    └── <solution_name>/
        └── <solution_name_YYYY_MM_DD_HH_MM_SS>/
            ├── _CHECK
            └── job_config.json.out
```

## Creating a custom build
To customize the solution, follow the steps below:

…

S3 bucket where the name is `<DIST_BUCKET_PREFIX>-<REGION_NAME>`. The solution's CloudFormation template will expect the
source code to be located in the bucket matching that name.
- `$SOLUTION_NAME` - The name of this solution (example: personalize-solution-customization)
- `$VERSION` - The version number to use (example: v1.0.1)
- `$REGION_NAME` - The region name to use (example: us-east-1)

This will result in all global assets being pushed to the `DIST_BUCKET_PREFIX`, and all regional assets being pushed to
