Commit 9b8df1e

Merge pull request #104710 from diberry/diberry/personalizer-concepts
[Cogsvcs] Personalizer - concepts and how-tos
2 parents e2343b0 + 21a8989 commit 9b8df1e

20 files changed

+416
-332
lines changed

.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -831,6 +831,11 @@
831831
"redirect_url": "/azure/cognitive-services//QnAMaker/Quickstarts/get-answer-from-knowledge-base-using-url-tool",
832832
"redirect_document_id": false
833833
},
834+
{
835+
"source_path": "articles/cognitive-services/personalizer/how-to-learning-policy.md",
836+
"redirect_url": "/azure/cognitive-services/personalizer/how-to-manage-model",
837+
"redirect_document_id": false
838+
},
834839
{
835840
"source_path": "articles/cognitive-services/LUIS/luis-tutorial-bot-csharp-appinsights.md",
836841
"redirect_url": "/azure/cognitive-services/LUIS/luis-csharp-tutorial-bf-v4",
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
---
2+
title: Active and inactive events - Personalizer
3+
description: This article discusses the use of active and inactive events within the Personalizer service.
4+
ms.topic: conceptual
5+
ms.date: 02/20/2020
6+
---
7+
8+
# Active and inactive events
9+
10+
An **active** event is any call to Rank where you know you are going to show the result to the customer and determine the reward score. This is the default behavior.
11+
12+
An **inactive** event is a call to Rank where, due to business logic, you are not sure whether the user will ever see the recommended action. Marking the event inactive lets you discard it so Personalizer isn't trained with the default reward. Don't call the Reward API for inactive events.
13+
14+
It is important that the learning loop knows the actual type of event. An inactive event will not have a Reward call. An active event should have a Reward call, but if the API call is never made, the default reward score is applied. Change the status of an event from inactive to active as soon as you know it will influence the user experience.
15+
16+
## Typical active events scenario
17+
18+
When your application calls the Rank API, you receive the action, which the application should show in the **rewardActionId** field. From that moment, Personalizer expects a Reward call with a reward score that has the same eventId. The reward score is used to train the model for future Rank calls. If no Reward call is received for the eventId, a default reward is applied. [Default rewards](how-to-settings.md#configure-rewards-for-the-feedback-loop) are set on your Personalizer resource in the Azure portal.
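The Rank-then-Reward pairing can be sketched as follows. This is a minimal illustration only: the payload shapes, field names, and URL path are assumptions for the sake of the example, not the exact API contract.

```python
# Hypothetical sketch of pairing a Rank call with a later Reward call.
# Field names and the URL path are illustrative assumptions.
import uuid

def build_rank_request(actions, context_features, event_id=None):
    # The eventId is what ties a later Reward call back to this Rank call.
    return {
        "eventId": event_id or str(uuid.uuid4()),
        "contextFeatures": context_features,
        "actions": actions,
    }

def reward_url(event_id):
    # The Reward call must reuse the same eventId as the Rank call;
    # otherwise the score can't be matched to the event.
    return f"/personalizer/v1.0/events/{event_id}/reward"

rank_body = build_rank_request(
    actions=[{"id": "article-1", "features": [{"topic": "sports"}]}],
    context_features=[{"timeOfDay": "evening"}],
    event_id="evt-123",
)
```

If no Reward call with that eventId arrives before the Reward Wait Time expires, the default reward configured on the resource is applied instead.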
19+
20+
## Other event type scenarios
21+
22+
In some scenarios, the application might need to call Rank before it even knows if the result will be used or displayed to the user. This might happen in situations where, for example, the page rendering of promoted content is overwritten by a marketing campaign. If the result of the Rank call was never used and the user never saw it, don't send a corresponding Reward call.
23+
24+
Typically, these scenarios happen when:
25+
26+
* You're prerendering UI that the user might or might not get to see.
27+
* Your application is doing predictive personalization in which Rank calls are made with little real-time context and the application might or might not use the output.
28+
29+
In these cases, use Personalizer to call Rank, requesting the event to be _inactive_. Personalizer won't expect a reward for this event, and it won't apply a default reward.
30+
31+
Later in your business logic, if the application uses the information from the Rank call, just _activate_ the event. As soon as the event is active, Personalizer expects an event reward. If no explicit call is made to the Reward API, Personalizer applies a default reward.
32+
33+
## Inactive events
34+
35+
To disable training for an event, call Rank by using `learningEnabled = False`.
36+
37+
For an inactive event, learning is implicitly activated if you send a reward for the eventId or call the `activate` API for that eventId.
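The inactive-event flow can be sketched like this. The flag name follows the article's wording (`learningEnabled`), and the activate URL path is an assumption for illustration:

```python
# Sketch of requesting an inactive event and activating it later.
# The flag name follows the article; the activate path is an assumption.
def build_rank_request(actions, context_features, event_id, learning_enabled=True):
    # learning_enabled=False marks the event inactive, so Personalizer
    # applies no default reward if the result is never shown.
    return {
        "eventId": event_id,
        "contextFeatures": context_features,
        "actions": actions,
        "learningEnabled": learning_enabled,
    }

def activate_url(event_id):
    # Call this (or send a reward) once the user actually sees the result;
    # either one implicitly activates the event.
    return f"/personalizer/v1.0/events/{event_id}/activate"

inactive_body = build_rank_request(
    [{"id": "promo-a"}], [{"device": "mobile"}], "evt-42",
    learning_enabled=False,
)
```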
38+
39+
## Next steps
40+
41+
* Learn [how to determine reward score and what data to consider](concept-rewards.md).
Lines changed: 16 additions & 35 deletions
@@ -1,46 +1,23 @@
11
---
2-
title: Active and inactive events - Personalizer
3-
titleSuffix: Azure Cognitive Services
4-
description: This article discusses the use of active and inactive events, learning settings, and learning policies within the Personalizer service.
5-
services: cognitive-services
6-
author: diberry
7-
manager: nitinme
8-
ms.service: cognitive-services
9-
ms.subservice: personalizer
2+
title: Learning policy - Personalizer
3+
description: Learning settings determine the *hyperparameters* of the model training. Two models trained on the same data but with different learning settings will end up as different models.
104
ms.topic: conceptual
11-
ms.date: 01/09/2019
12-
ms.author: diberry
5+
ms.date: 02/20/2020
136
---
147

15-
# Active and inactive events
16-
17-
When your application calls the Rank API, you receive the action the application should show in the **rewardActionId** field. From that moment, Personalizer expects a Reward call that has the same eventId. The reward score will be used to train the model for future Rank calls. If no Reward call is received for the eventId, a default reward is applied. Default rewards are set in the Azure portal.
18-
19-
In some scenarios, the application might need to call Rank before it even knows if the result will be used or displayed to the user. This might happen in situations where, for example, the page rendering of promoted content is overwritten by a marketing campaign. If the result of the Rank call was never used and the user never saw it, don't send a corresponding Reward call.
20-
21-
Typically, these scenarios happen when:
22-
23-
* You're prerendering UI that the user might or might not get to see.
24-
* Your application is doing predictive personalization in which Rank calls are made with little real-time context and the application might or might not use the output.
25-
26-
In these cases, use Personalizer to call Rank, requesting the event to be _inactive_. Personalizer won't expect a reward for this event, and it won't apply a default reward.
27-
Later in your business logic, if the application uses the information from the Rank call, just _activate_ the event. As soon as the event is active, Personalizer expects an event reward. If no explicit call is made to the Reward API, Personalizer applies a default reward.
28-
29-
## Inactive events
30-
31-
To disable training for an event, call Rank by using `learningEnabled = False`. For an inactive event, learning is implicitly activated if you send a reward for the eventId or call the `activate` API for that eventId.
32-
33-
## Learning settings
8+
# Learning policy and settings
349

3510
Learning settings determine the *hyperparameters* of the model training. Two models trained on the same data but with different learning settings will end up as different models.
3611

37-
### Import and export learning policies
12+
[Learning policy and settings](how-to-settings.md#configure-rewards-for-the-feedback-loop) are set on your Personalizer resource in the Azure portal.
13+
14+
## Import and export learning policies
3815

3916
You can import and export learning-policy files from the Azure portal. Use this method to save existing policies, test them, replace them, and archive them in your source code control as artifacts for future reference and audit.
4017

41-
Learn [how to](how-to-learning-policy.md) import and export a learning policy.
18+
Learn [how to](how-to-manage-model.md#import-a-new-learning-policy) import and export a learning policy in the Azure portal for your Personalizer resource.
4219

43-
### Understand learning policy settings
20+
## Understand learning policy settings
4421

4522
The settings in the learning policy aren't intended to be changed. Change settings only if you understand how they affect Personalizer. Without this knowledge, you could cause problems, including invalidating Personalizer models.
4623

@@ -55,14 +32,18 @@ The following `.json` is an example of a learning policy.
5532
}
5633
```
5734

58-
### Compare learning policies
35+
## Compare learning policies
5936

6037
You can compare how different learning policies perform against past data in Personalizer logs by doing [offline evaluations](concepts-offline-evaluation.md).
6138

62-
[Upload your own learning policies](how-to-learning-policy.md) to compare them with the current learning policy.
39+
[Upload your own learning policies](how-to-manage-model.md) to compare them with the current learning policy.
6340

64-
### Optimize learning policies
41+
## Optimize learning policies
6542

6643
Personalizer can create an optimized learning policy in an [offline evaluation](how-to-offline-evaluation.md). An optimized learning policy that has better rewards in an offline evaluation will yield better results when it's used online in Personalizer.
6744

6845
After you optimize a learning policy, you can apply it directly to Personalizer so it immediately replaces the current policy. Or you can save the optimized policy for further evaluation and later decide whether to discard, save, or apply it.
46+
47+
## Next steps
48+
49+
* Learn [active and inactive events](concept-active-inactive-events.md).

articles/cognitive-services/personalizer/concept-rewards.md

Lines changed: 12 additions & 17 deletions
@@ -1,26 +1,21 @@
11
---
22
title: Reward score - Personalizer
3-
titleSuffix: Azure Cognitive Services
43
description: The reward score indicates how well the personalization choice, RewardActionID, resulted for the user. The value of the reward score is determined by your business logic, based on observations of user behavior. Personalizer trains its machine learning models by evaluating the rewards.
5-
services: cognitive-services
6-
author: diberry
7-
manager: nitinme
8-
ms.service: cognitive-services
9-
ms.subservice: personalizer
4+
ms.date: 02/20/2020
105
ms.topic: conceptual
11-
ms.date: 10/24/2019
12-
ms.author: diberry
136
---
147

158
# Reward scores indicate success of personalization
169

1710
The reward score indicates how well the personalization choice, [RewardActionID](https://docs.microsoft.com/rest/api/cognitiveservices/personalizer/rank/rank#response), resulted for the user. The value of the reward score is determined by your business logic, based on observations of user behavior.
1811

19-
Personalizer trains its machine learning models by evaluating the rewards.
12+
Personalizer trains its machine learning models by evaluating the rewards.
13+
14+
Learn [how to](how-to-settings.md#configure-rewards-for-the-feedback-loop) configure the default reward score in the Azure portal for your Personalizer resource.
2015

2116
## Use Reward API to send reward score to Personalizer
2217

23-
Rewards are sent to Personalizer by the [Reward API](https://docs.microsoft.com/rest/api/cognitiveservices/personalizer/events/reward). Typically, a reward is a number from 0 and 1. A negative reward, with the value of -1, is possible in certain scenarios and should only be used if you are experienced with reinforcement learning (RL). Personalizer trains the model to achieve the highest possible sum of rewards over time.
18+
Rewards are sent to Personalizer by the [Reward API](https://docs.microsoft.com/rest/api/cognitiveservices/personalizer/events/reward). Typically, a reward is a number from 0 to 1. A negative reward, with the value of -1, is possible in certain scenarios and should only be used if you are experienced with reinforcement learning (RL). Personalizer trains the model to achieve the highest possible sum of rewards over time.
2419

2520
Rewards are sent after the user behavior has happened, which could be days later. The maximum amount of time Personalizer waits before considering an event to have no reward (or a default reward) is configured with the [Reward Wait Time](#reward-wait-time) in the Azure portal.
2621

@@ -42,16 +37,16 @@ Consider these signals and behaviors for the context of the reward score:
4237

4338
A Reward score must be computed in your business logic. The score can be represented as:
4439

45-
* A single number sent once
40+
* A single number sent once
4641
* A score sent immediately (such as 0.8) and an additional score sent later (typically 0.2).
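The split-score case can be sketched as two Reward calls against the same eventId. The path and payload shape here are illustrative assumptions:

```python
# Sketch: sending a reward in two parts for the same eventId.
# The URL path and body shape are illustrative assumptions.
def reward_call(event_id, value):
    # Each partial score is its own Reward call for the same event.
    return {"path": f"/personalizer/v1.0/events/{event_id}/reward",
            "body": {"value": value}}

immediate = reward_call("evt-9", 0.8)   # e.g. sent when the user clicks
follow_up = reward_call("evt-9", 0.2)   # e.g. sent later, after a purchase
total = immediate["body"]["value"] + follow_up["body"]["value"]
```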
4742

4843
## Default Rewards
4944

5045
If no reward is received within the [Reward Wait Time](#reward-wait-time), measured from the Rank call, Personalizer implicitly applies the **Default Reward** to that Rank event.
5146

52-
## Building up rewards with multiple factors
47+
## Building up rewards with multiple factors
5348

54-
For effective personalization, you can build up the reward score based on multiple factors.
49+
For effective personalization, you can build up the reward score based on multiple factors.
5550

5651
For example, you could apply these rules for personalizing a list of video content:
5752

@@ -88,8 +83,8 @@ By adding up reward scores, your final reward may be outside the expected score
8883
* **Consider unintended consequences**: Create reward functions that lead to responsible outcomes with [ethics and responsible use](ethics-responsible-use.md).
8984

9085
* **Use Incremental Rewards**: Adding partial rewards for smaller user behaviors helps Personalizer achieve better rewards. This incremental reward lets the algorithm know it's getting closer to engaging the user in the final desired behavior.
91-
* If you are showing a list of movies, if the user hovers over the first one for a while to see more information, you can determine that some user-engagement happened. The behavior can count with a reward score of 0.1.
92-
* If the user opened the page and then exited, the reward score can be 0.2.
86+
* If you are showing a list of movies and the user hovers over the first one for a while to see more information, you can determine that some user engagement happened. That behavior can count for a reward score of 0.1.
87+
* If the user opened the page and then exited, the reward score can be 0.2.
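The incremental signals above can be combined into a single reward function. This is a minimal sketch: the 0.1 and 0.2 weights come from the text, while the final-behavior weight and the signal names are assumptions.

```python
# Illustrative reward function built from incremental signals.
# Weights beyond those named in the article are assumptions.
def reward_score(hovered, opened_page, watched_to_end):
    score = 0.0
    if hovered:
        score += 0.1   # user hovered to see more information
    if opened_page:
        score += 0.2   # user opened the page, even if they then exited
    if watched_to_end:
        score += 0.7   # assumed weight for the final desired behavior
    return min(score, 1.0)  # keep the total within the expected 0-1 range
```

Capping the sum keeps the final reward inside the expected 0-1 range even when several partial signals fire for the same event.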
9388

9489
## Reward wait time
9590

@@ -101,12 +96,12 @@ If the **Reward Wait Time** expires, and there has been no reward information, a
10196

10297
Follow these recommendations for better results.
10398

104-
* Make the Reward Wait Time as short as you can, while leaving enough time to get user feedback.
99+
* Make the Reward Wait Time as short as you can, while leaving enough time to get user feedback.
105100

106101
* Don't choose a duration that is shorter than the time needed to get feedback. For example, if some of your rewards come in after a user has watched 1 minute of a video, the experiment length should be at least double that.
107102

108103
## Next steps
109104

110-
* [Reinforcement learning](concepts-reinforcement-learning.md)
105+
* [Reinforcement learning](concepts-reinforcement-learning.md)
111106
* [Try the Rank API](https://westus2.dev.cognitive.microsoft.com/docs/services/personalizer-api/operations/Rank/console)
112107
* [Try the Reward API](https://westus2.dev.cognitive.microsoft.com/docs/services/personalizer-api/operations/Reward)

articles/cognitive-services/personalizer/concepts-offline-evaluation.md

Lines changed: 12 additions & 2 deletions
@@ -8,7 +8,7 @@ manager: nitinme
88
ms.service: cognitive-services
99
ms.subservice: personalizer
1010
ms.topic: conceptual
11-
ms.date: 05/07/2019
11+
ms.date: 02/20/2020
1212
ms.author: diberry
1313
---
1414

@@ -45,6 +45,16 @@ Personalizer can use the offline evaluation process to discover a more optimal l
4545

4646
After performing the offline evaluation, you can compare the effectiveness of Personalizer under the new policy to its effectiveness under the current online policy. You can then apply that learning policy to make it effective immediately in Personalizer, by downloading it and uploading it in the Models and Policy panel. You can also download it for future analysis or use.
4747

48+
Current policies included in the evaluation:
49+
50+
| Learning settings | Purpose|
51+
|--|--|
52+
|**Online Policy**| The current Learning Policy used in Personalizer |
53+
|**Baseline**|The application's default (as determined by the first Action sent in Rank calls)|
54+
|**Random Policy**|A hypothetical Rank behavior that always returns a random choice of Actions from those supplied.|
55+
|**Custom Policies**|Additional Learning Policies uploaded when starting the evaluation.|
56+
|**Optimized Policy**|If the evaluation was started with the option to discover an optimized policy, it will also be compared, and you will be able to download it or make it the online learning policy, replacing the current one.|
57+
4858
## Understanding the relevance of offline evaluation results
4959

5060
When you run an offline evaluation, it is very important to analyze _confidence bounds_ of the results. If they are wide, it means your application hasn’t received enough data for the reward estimates to be precise or significant. As the system accumulates more data, and you run offline evaluations over longer periods, the confidence intervals become narrower.
@@ -87,7 +97,7 @@ We recommend looking at feature evaluations and asking:
8797

8898
* What other, additional, features could your application or system provide along the lines of those that are more effective?
8999
* What features can be removed due to low effectiveness? Low effectiveness features add _noise_ into the machine learning.
90-
* Are there any features that are accidentally included? Examples of these are: personally identifiable information (PII), duplicate IDs, etc.
100+
* Are there any features that are accidentally included? Examples of these are: user identifiable information, duplicate IDs, etc.
91101
* Are there any undesirable features that shouldn't be used to personalize due to regulatory or responsible use considerations? Are there features that could proxy (that is, closely mirror or correlate with) undesirable features?
92102

93103
