description: This article discusses the use of active and inactive events within the Personalizer service.
ms.topic: conceptual
ms.date: 02/20/2020
---

# Active and inactive events

An **active** event is any call to Rank where you know you are going to show the result to the customer and determine the reward score. This is the default behavior.

An **inactive** event is a call to Rank where you are not sure if the user will ever see the recommended action, due to business logic. This allows you to discard the event so Personalizer isn't trained with the default reward. Inactive events should not call the Reward API.

It is important that the learning loop knows the actual type of each event. An inactive event will not have a Reward call. An active event should have a Reward call, but if the API call is never made, the default reward score is applied. Change the status of an event from inactive to active as soon as you know it will influence the user experience.

## Typical active events scenario

When your application calls the Rank API, you receive the action that the application should show in the **rewardActionId** field. From that moment, Personalizer expects a Reward call with a reward score that has the same eventId. The reward score is used to train the model for future Rank calls. If no Reward call is received for the eventId, a default reward is applied. [Default rewards](how-to-settings.md#configure-rewards-for-the-feedback-loop) are set on your Personalizer resource in the Azure portal.
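The flow above can be sketched with plain REST calls. This is a hedged illustration rather than an official sample: the endpoint, key, and feature names are assumptions, while the `/rank` and `/events/{eventId}/reward` routes follow the Personalizer v1.0 REST API.

```python
# Sketch of an active event: Rank, show rewardActionId, then Reward with the
# same eventId. Endpoint, key, and feature values below are assumptions.
import json
import uuid
from urllib import request as urlrequest

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # assumption
API_KEY = "<your-personalizer-key>"  # assumption


def build_rank_request(event_id, context_features, actions):
    """Assemble the Rank body for an active (default) event."""
    return {
        "eventId": event_id,
        "contextFeatures": context_features,
        "actions": actions,
    }


def post_json(url, body):
    """POST a JSON body and return the parsed JSON response (if any)."""
    req = urlrequest.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urlrequest.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def rank_show_and_reward(context_features, actions, reward_score):
    event_id = str(uuid.uuid4())
    rank = post_json(f"{ENDPOINT}/personalizer/v1.0/rank",
                     build_rank_request(event_id, context_features, actions))
    chosen = rank["rewardActionId"]  # show this action to the user
    # Later, after observing user behavior, reward the SAME eventId;
    # otherwise the default reward configured on the resource is applied.
    post_json(f"{ENDPOINT}/personalizer/v1.0/events/{event_id}/reward",
              {"value": reward_score})
    return chosen
```

In a real application the Reward call usually happens in a separate request handler, once the user's behavior has been observed.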

## Other event type scenarios

In some scenarios, the application might need to call Rank before it even knows if the result will be used or displayed to the user. This might happen in situations where, for example, the page rendering of promoted content is overwritten by a marketing campaign. If the result of the Rank call was never used and the user never saw it, don't send a corresponding Reward call.

Typically, these scenarios happen when:

* You're prerendering UI that the user might or might not get to see.
* Your application is doing predictive personalization in which Rank calls are made with little real-time context and the application might or might not use the output.

In these cases, use Personalizer to call Rank, requesting the event to be _inactive_. Personalizer won't expect a reward for this event, and it won't apply a default reward.

Later in your business logic, if the application uses the information from the Rank call, just _activate_ the event. As soon as the event is active, Personalizer expects an event reward. If no explicit call is made to the Reward API, Personalizer applies a default reward.

## Inactive events

To disable training for an event, call Rank by using `learningEnabled = False`.

For an inactive event, learning is implicitly activated if you send a reward for the eventId or call the `activate` API for that eventId.
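A minimal sketch of the inactive-event lifecycle follows. The `deferActivation` field and the `/activate` route are from the Personalizer v1.0 REST API; treating the article's `learningEnabled = False` as the SDK-level name for this same flag is an assumption here, and all IDs and endpoints are placeholders.

```python
# Sketch of an inactive event: defer activation at Rank time, then activate
# the event only if the user actually sees the result. Mapping the article's
# `learningEnabled = False` to deferActivation is an assumption.

def build_inactive_rank_request(event_id, context_features, actions):
    """Rank body for an event that starts inactive (no reward expected)."""
    return {
        "eventId": event_id,
        "contextFeatures": context_features,
        "actions": actions,
        "deferActivation": True,
    }


def activate_url(endpoint, event_id):
    """POST to this URL once the event influences the user experience."""
    return f"{endpoint}/personalizer/v1.0/events/{event_id}/activate"
```

Once activated (or once a reward is sent for the eventId), Personalizer again expects a Reward call and otherwise applies the default reward.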

## Next steps

* Learn [how to determine reward score and what data to consider](concept-rewards.md).

---
title: Learning policy - Personalizer
description: Learning settings determine the *hyperparameters* of the model training. Two models of the same data that are trained on different learning settings will end up different.
ms.topic: conceptual
ms.date: 02/20/2020
---

# Learning policy and settings

Learning settings determine the *hyperparameters* of the model training. Two models of the same data that are trained on different learning settings will end up different.

[Learning policy and settings](how-to-settings.md#configure-rewards-for-the-feedback-loop) are set on your Personalizer resource in the Azure portal.

## Import and export learning policies

You can import and export learning-policy files from the Azure portal. Use this method to save existing policies, test them, replace them, and archive them in your source code control as artifacts for future reference and audit.

Learn [how to](how-to-manage-model.md#import-a-new-learning-policy) import and export a learning policy in the Azure portal for your Personalizer resource.

## Understand learning policy settings

The settings in the learning policy aren't intended to be changed. Change settings only if you understand how they affect Personalizer. Without this knowledge, you could cause problems, including invalidating Personalizer models.

The following `.json` is an example of a learning policy.

```json
…
}
```
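The example above is truncated in this excerpt. Purely as an illustration (the argument string is an assumption, not a recommended configuration), a learning policy file pairs a name with the Vowpal Wabbit training arguments that define the hyperparameters:

```json
{
  "name": "example learning settings",
  "arguments": "--cb_explore_adf --epsilon 0.2 --power_t 0 -l 0.001 --cb_type mtr -q ::"
}
```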

## Compare learning policies

You can compare how different learning policies perform against past data in Personalizer logs by doing [offline evaluations](concepts-offline-evaluation.md).

[Upload your own learning policies](how-to-manage-model.md) to compare them with the current learning policy.

## Optimize learning policies

Personalizer can create an optimized learning policy in an [offline evaluation](how-to-offline-evaluation.md). An optimized learning policy that has better rewards in an offline evaluation will yield better results when it's used online in Personalizer.

After you optimize a learning policy, you can apply it directly to Personalizer so it immediately replaces the current policy. Or you can save the optimized policy for further evaluation and later decide whether to discard, save, or apply it.

## Next steps

* Learn [active and inactive events](concept-active-inactive-events.md).

`articles/cognitive-services/personalizer/concept-rewards.md`

---
title: Reward score - Personalizer
description: The reward score indicates how well the personalization choice, RewardActionID, resulted for the user. The value of the reward score is determined by your business logic, based on observations of user behavior. Personalizer trains its machine learning models by evaluating the rewards.
ms.date: 02/20/2020
ms.topic: conceptual
---

# Reward scores indicate success of personalization

The reward score indicates how well the personalization choice, [RewardActionID](https://docs.microsoft.com/rest/api/cognitiveservices/personalizer/rank/rank#response), resulted for the user. The value of the reward score is determined by your business logic, based on observations of user behavior.

Personalizer trains its machine learning models by evaluating the rewards.

Learn [how to](how-to-settings.md#configure-rewards-for-the-feedback-loop) configure the default reward score in the Azure portal for your Personalizer resource.

## Use Reward API to send reward score to Personalizer

Rewards are sent to Personalizer by the [Reward API](https://docs.microsoft.com/rest/api/cognitiveservices/personalizer/events/reward). Typically, a reward is a number from 0 to 1. A negative reward, with the value of -1, is possible in certain scenarios and should only be used if you are experienced with reinforcement learning (RL). Personalizer trains the model to achieve the highest possible sum of rewards over time.

Rewards are sent after the user behavior has happened, which could be days later. The maximum amount of time Personalizer waits before treating an event as having no reward (and applying the default reward) is configured with the [Reward Wait Time](#reward-wait-time) in the Azure portal.

A Reward score must be computed in your business logic. The score can be represented as:

* A single number sent once
* A score sent immediately (such as 0.8) and an additional score sent later (typically 0.2).
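Both shapes above can be sketched as calls to the same Reward route. This is a hedged illustration: the `/reward` route follows the Personalizer v1.0 REST API, while how multiple scores for one eventId are combined depends on your resource's reward-aggregation setting.

```python
# Sketch of the two reporting shapes listed above. The /reward route follows
# the Personalizer v1.0 REST API; endpoint and event IDs are placeholders.

def reward_url(endpoint, event_id):
    """URL that receives the reward score for a given eventId."""
    return f"{endpoint}/personalizer/v1.0/events/{event_id}/reward"


def reward_body(score):
    """Build a Reward payload; scores are typically 0 to 1 (-1 is advanced)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("reward score outside the expected range")
    return {"value": score}


# A single number sent once:
single = reward_body(1.0)

# A score sent immediately plus an additional score later, same eventId:
immediate, later = reward_body(0.8), reward_body(0.2)
```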

## Default Rewards

If no reward is received within the [Reward Wait Time](#reward-wait-time), measured from the time of the Rank call, Personalizer implicitly applies the **Default Reward** to that Rank event.

## Building up rewards with multiple factors

For effective personalization, you can build up the reward score based on multiple factors.

For example, you could apply these rules for personalizing a list of video content:

By adding up reward scores, your final reward may be outside the expected score range.

* **Consider unintended consequences**: Create reward functions that lead to responsible outcomes with [ethics and responsible use](ethics-responsible-use.md).
* **Use Incremental Rewards**: Adding partial rewards for smaller user behaviors helps Personalizer achieve better rewards. This incremental reward allows the algorithm to know it's getting closer to engaging the user in the final desired behavior.
  * If you are showing a list of movies and the user hovers over the first one for a while to see more information, you can determine that some user engagement happened. This behavior can count for a reward score of 0.1.
  * If the user opened the page and then exited, the reward score can be 0.2.
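The incremental scores above can be combined in your business logic before a single Reward call. A minimal sketch, assuming illustrative factor values and a final clip to [0, 1] (the "watched_video" behavior and its 0.7 weight are invented for this example, not part of the article):

```python
# Build one reward from incremental behaviors, in the spirit of the bullets
# above. The factor values and the clip to [0, 1] are assumptions, not
# Personalizer requirements.

INCREMENTAL_SCORES = {
    "hovered_item": 0.1,   # hovered to see more information
    "opened_page": 0.2,    # opened the page and then exited
    "watched_video": 0.7,  # assumed final desired behavior (invented)
}


def build_reward(observed_behaviors):
    """Sum the partial scores for observed behaviors, clipped to [0, 1]."""
    total = sum(INCREMENTAL_SCORES.get(b, 0.0) for b in observed_behaviors)
    return min(1.0, max(0.0, total))
```

Clipping matters because, as noted above, summed partial scores can otherwise drift outside the expected range.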

## Reward wait time


Follow these recommendations for better results.

* Make the Reward Wait Time as short as you can, while leaving enough time to get user feedback.

* Don't choose a duration that is shorter than the time needed to get feedback. For example, if some of your rewards come in after a user has watched 1 minute of a video, the experiment length should be at least double that.

`articles/cognitive-services/personalizer/concepts-offline-evaluation.md`

ms.service: cognitive-services
ms.subservice: personalizer
ms.topic: conceptual
ms.date: 02/20/2020
ms.author: diberry
---

After performing the offline evaluation, you can see the comparative effectiveness of Personalizer with that new policy compared to the current online policy. You can then apply that learning policy to make it effective immediately in Personalizer, by downloading it and uploading it in the Models and Policy panel. You can also download it for future analysis or use.

Current policies included in the evaluation:

| Learning settings | Purpose |
|--|--|
| **Online Policy** | The current learning policy used in Personalizer. |
| **Baseline** | The application's default (as determined by the first Action sent in Rank calls). |
| **Random Policy** | An imaginary Rank behavior that always returns a random choice of Actions from the supplied ones. |
| **Custom Policies** | Additional learning policies uploaded when starting the evaluation. |
| **Optimized Policy** | If the evaluation was started with the option to discover an optimized policy, it will also be compared, and you will be able to download it or make it the online learning policy, replacing the current one. |

## Understanding the relevance of offline evaluation results

When you run an offline evaluation, it is very important to analyze _confidence bounds_ of the results. If they are wide, it means your application hasn’t received enough data for the reward estimates to be precise or significant. As the system accumulates more data, and you run offline evaluations over longer periods, the confidence intervals become narrower.
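Personalizer reports these confidence bounds for you; nothing needs to be computed by hand. Purely to illustrate why more data narrows them, here is a standard normal-approximation interval around an estimated average reward (all numbers are invented):

```python
# Illustration only: Personalizer computes confidence bounds itself. This
# normal-approximation sketch just shows why more logged events make the
# interval around an estimated average reward narrower.
import math


def confidence_bounds(mean_reward, std_dev, n, z=1.96):
    """Approximate 95% confidence interval for an average reward over n events."""
    half_width = z * std_dev / math.sqrt(n)
    return (mean_reward - half_width, mean_reward + half_width)


small_sample = confidence_bounds(0.5, 0.2, n=100)
large_sample = confidence_bounds(0.5, 0.2, n=10_000)
# With 100x the data, the interval is 10x narrower (sqrt(n) scaling).
```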

We recommend looking at feature evaluations and asking:

* What other, additional features could your application or system provide along the lines of those that are more effective?
* What features can be removed due to low effectiveness? Low effectiveness features add _noise_ into the machine learning.
* Are there any features that are accidentally included? Examples of these are: user identifiable information, duplicate IDs, etc.
* Are there any undesirable features that shouldn't be used to personalize due to regulatory or responsible use considerations? Are there features that could proxy (that is, closely mirror or correlate with) undesirable features?
0 commit comments