Commit 9b8df1e

Merge pull request #104710 from diberry/diberry/personalizer-concepts
[Cogsvcs] Personalizer - concepts and how-tos
2 parents e2343b0 + 21a8989 commit 9b8df1e

20 files changed

+416
-332
lines changed

.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -831,6 +831,11 @@
831831
"redirect_url": "/azure/cognitive-services//QnAMaker/Quickstarts/get-answer-from-knowledge-base-using-url-tool",
832832
"redirect_document_id": false
833833
},
834+
{
835+
"source_path": "articles/cognitive-services/personalizer/how-to-learning-policy.md",
836+
"redirect_url": "/azure/cognitive-services/personalizer/how-to-manage-model",
837+
"redirect_document_id": false
838+
},
834839
{
835840
"source_path": "articles/cognitive-services/LUIS/luis-tutorial-bot-csharp-appinsights.md",
836841
"redirect_url": "/azure/cognitive-services/LUIS/luis-csharp-tutorial-bf-v4",
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
---
2+
title: Active and inactive events - Personalizer
3+
description: This article discusses the use of active and inactive events within the Personalizer service.
4+
ms.topic: conceptual
5+
ms.date: 02/20/2020
6+
---
7+
8+
# Active and inactive events
9+
10+
An **active** event is any call to Rank where you know you are going to show the result to the customer and determine the reward score. This is the default behavior.
11+
12+
An **inactive** event is a call to Rank where, due to business logic, you are not sure whether the user will ever see the recommended action. Marking the event inactive lets you discard it so Personalizer isn't trained with the default reward. Don't call the Reward API for inactive events.
13+
14+
It is important that the learning loop knows the actual type of event. An inactive event will not have a Reward call. An active event should have a Reward call, but if the API call is never made, the default reward score is applied. Change the status of an event from inactive to active as soon as you know it will influence the user experience.
15+
16+
## Typical active events scenario
17+
18+
When your application calls the Rank API, you receive the action, which the application should show in the **rewardActionId** field. From that moment, Personalizer expects a Reward call with a reward score that has the same eventId. The reward score is used to train the model for future Rank calls. If no Reward call is received for the eventId, a default reward is applied. [Default rewards](how-to-settings.md#configure-rewards-for-the-feedback-loop) are set on your Personalizer resource in the Azure portal.
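The Rank-then-Reward pairing can be sketched as follows. This is a minimal illustration only: the payload shapes, field names, and URL path are assumptions for the sake of the example, not the exact API contract.

```python
# Hypothetical sketch of pairing a Rank call with a later Reward call.
# Field names and the URL path are illustrative assumptions.
import uuid

def build_rank_request(actions, context_features, event_id=None):
    # The eventId is what ties a later Reward call back to this Rank call.
    return {
        "eventId": event_id or str(uuid.uuid4()),
        "contextFeatures": context_features,
        "actions": actions,
    }

def reward_url(event_id):
    # The Reward call must reuse the same eventId as the Rank call;
    # otherwise the score can't be matched to the event.
    return f"/personalizer/v1.0/events/{event_id}/reward"

rank_body = build_rank_request(
    actions=[{"id": "article-1", "features": [{"topic": "sports"}]}],
    context_features=[{"timeOfDay": "evening"}],
    event_id="evt-123",
)
```

If no Reward call with that eventId arrives before the Reward Wait Time expires, the default reward configured on the resource is applied instead.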
19+
20+
## Other event type scenarios
21+
22+
In some scenarios, the application might need to call Rank before it even knows if the result will be used or displayed to the user. This might happen in situations where, for example, the page rendering of promoted content is overwritten by a marketing campaign. If the result of the Rank call was never used and the user never saw it, don't send a corresponding Reward call.
23+
24+
Typically, these scenarios happen when:
25+
26+
* You're prerendering UI that the user might or might not get to see.
27+
* Your application is doing predictive personalization in which Rank calls are made with little real-time context and the application might or might not use the output.
28+
29+
In these cases, use Personalizer to call Rank, requesting the event to be _inactive_. Personalizer won't expect a reward for this event, and it won't apply a default reward.
30+
31+
Later in your business logic, if the application uses the information from the Rank call, just _activate_ the event. As soon as the event is active, Personalizer expects an event reward. If no explicit call is made to the Reward API, Personalizer applies a default reward.
32+
33+
## Inactive events
34+
35+
To disable training for an event, call Rank by using `learningEnabled = False`.
36+
37+
For an inactive event, learning is implicitly activated if you send a reward for the eventId or call the `activate` API for that eventId.
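The inactive-event flow can be sketched like this. The flag name follows the article's wording (`learningEnabled`), and the activate URL path is an assumption for illustration:

```python
# Sketch of requesting an inactive event and activating it later.
# The flag name follows the article; the activate path is an assumption.
def build_rank_request(actions, context_features, event_id, learning_enabled=True):
    # learning_enabled=False marks the event inactive, so Personalizer
    # applies no default reward if the result is never shown.
    return {
        "eventId": event_id,
        "contextFeatures": context_features,
        "actions": actions,
        "learningEnabled": learning_enabled,
    }

def activate_url(event_id):
    # Call this (or send a reward) once the user actually sees the result;
    # either one implicitly activates the event.
    return f"/personalizer/v1.0/events/{event_id}/activate"

inactive_body = build_rank_request(
    [{"id": "promo-a"}], [{"device": "mobile"}], "evt-42",
    learning_enabled=False,
)
```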
38+
39+
## Next steps
40+
41+
* Learn [how to determine reward score and what data to consider](concept-rewards.md).
Lines changed: 16 additions & 35 deletions
@@ -1,46 +1,23 @@
11
---
2-
title: Active and inactive events - Personalizer
3-
titleSuffix: Azure Cognitive Services
4-
description: This article discusses the use of active and inactive events, learning settings, and learning policies within the Personalizer service.
5-
services: cognitive-services
6-
author: diberry
7-
manager: nitinme
8-
ms.service: cognitive-services
9-
ms.subservice: personalizer
2+
title: Learning policy - Personalizer
3+
description: Learning settings determine the *hyperparameters* of the model training. Two models trained on the same data but with different learning settings will end up as different models.
104
ms.topic: conceptual
11-
ms.date: 01/09/2019
12-
ms.author: diberry
5+
ms.date: 02/20/2020
136
---
147

15-
# Active and inactive events
16-
17-
When your application calls the Rank API, you receive the action the application should show in the **rewardActionId** field. From that moment, Personalizer expects a Reward call that has the same eventId. The reward score will be used to train the model for future Rank calls. If no Reward call is received for the eventId, a default reward is applied. Default rewards are set in the Azure portal.
18-
19-
In some scenarios, the application might need to call Rank before it even knows if the result will be used or displayed to the user. This might happen in situations where, for example, the page rendering of promoted content is overwritten by a marketing campaign. If the result of the Rank call was never used and the user never saw it, don't send a corresponding Reward call.
20-
21-
Typically, these scenarios happen when:
22-
23-
* You're prerendering UI that the user might or might not get to see.
24-
* Your application is doing predictive personalization in which Rank calls are made with little real-time context and the application might or might not use the output.
25-
26-
In these cases, use Personalizer to call Rank, requesting the event to be _inactive_. Personalizer won't expect a reward for this event, and it won't apply a default reward.
27-
Later in your business logic, if the application uses the information from the Rank call, just _activate_ the event. As soon as the event is active, Personalizer expects an event reward. If no explicit call is made to the Reward API, Personalizer applies a default reward.
28-
29-
## Inactive events
30-
31-
To disable training for an event, call Rank by using `learningEnabled = False`. For an inactive event, learning is implicitly activated if you send a reward for the eventId or call the `activate` API for that eventId.
32-
33-
## Learning settings
8+
# Learning policy and settings
349

3510
Learning settings determine the *hyperparameters* of the model training. Two models trained on the same data but with different learning settings will end up as different models.
3611

37-
### Import and export learning policies
12+
[Learning policy and settings](how-to-settings.md#configure-rewards-for-the-feedback-loop) are set on your Personalizer resource in the Azure portal.
13+
14+
## Import and export learning policies
3815

3916
You can import and export learning-policy files from the Azure portal. Use this method to save existing policies, test them, replace them, and archive them in your source code control as artifacts for future reference and audit.
4017

41-
Learn [how to](how-to-learning-policy.md) import and export a learning policy.
18+
Learn [how to](how-to-manage-model.md#import-a-new-learning-policy) import and export a learning policy in the Azure portal for your Personalizer resource.
4219

43-
### Understand learning policy settings
20+
## Understand learning policy settings
4421

4522
The settings in the learning policy aren't intended to be changed. Change settings only if you understand how they affect Personalizer. Without this knowledge, you could cause problems, including invalidating Personalizer models.
4623

@@ -55,14 +32,18 @@ The following `.json` is an example of a learning policy.
5532
}
5633
```
5734

58-
### Compare learning policies
35+
## Compare learning policies
5936

6037
You can compare how different learning policies perform against past data in Personalizer logs by doing [offline evaluations](concepts-offline-evaluation.md).
6138

62-
[Upload your own learning policies](how-to-learning-policy.md) to compare them with the current learning policy.
39+
[Upload your own learning policies](how-to-manage-model.md) to compare them with the current learning policy.
6340

64-
### Optimize learning policies
41+
## Optimize learning policies
6542

6643
Personalizer can create an optimized learning policy in an [offline evaluation](how-to-offline-evaluation.md). An optimized learning policy that has better rewards in an offline evaluation will yield better results when it's used online in Personalizer.
6744

6845
After you optimize a learning policy, you can apply it directly to Personalizer so it immediately replaces the current policy. Or you can save the optimized policy for further evaluation and later decide whether to discard, save, or apply it.
46+
47+
## Next steps
48+
49+
* Learn [active and inactive events](concept-active-inactive-events.md).

articles/cognitive-services/personalizer/concept-rewards.md

Lines changed: 12 additions & 17 deletions
@@ -1,26 +1,21 @@
11
---
22
title: Reward score - Personalizer
3-
titleSuffix: Azure Cognitive Services
43
description: The reward score indicates how well the personalization choice, RewardActionID, resulted for the user. The value of the reward score is determined by your business logic, based on observations of user behavior. Personalizer trains its machine learning models by evaluating the rewards.
5-
services: cognitive-services
6-
author: diberry
7-
manager: nitinme
8-
ms.service: cognitive-services
9-
ms.subservice: personalizer
4+
ms.date: 02/20/2020
105
ms.topic: conceptual
11-
ms.date: 10/24/2019
12-
ms.author: diberry
136
---
147

158
# Reward scores indicate success of personalization
169

1710
The reward score indicates how well the personalization choice, [RewardActionID](https://docs.microsoft.com/rest/api/cognitiveservices/personalizer/rank/rank#response), resulted for the user. The value of the reward score is determined by your business logic, based on observations of user behavior.
1811

19-
Personalizer trains its machine learning models by evaluating the rewards.
12+
Personalizer trains its machine learning models by evaluating the rewards.
13+
14+
Learn [how to](how-to-settings.md#configure-rewards-for-the-feedback-loop) configure the default reward score in the Azure portal for your Personalizer resource.
2015

2116
## Use Reward API to send reward score to Personalizer
2217

23-
Rewards are sent to Personalizer by the [Reward API](https://docs.microsoft.com/rest/api/cognitiveservices/personalizer/events/reward). Typically, a reward is a number from 0 and 1. A negative reward, with the value of -1, is possible in certain scenarios and should only be used if you are experienced with reinforcement learning (RL). Personalizer trains the model to achieve the highest possible sum of rewards over time.
18+
Rewards are sent to Personalizer by the [Reward API](https://docs.microsoft.com/rest/api/cognitiveservices/personalizer/events/reward). Typically, a reward is a number from 0 to 1. A negative reward, with the value of -1, is possible in certain scenarios and should only be used if you are experienced with reinforcement learning (RL). Personalizer trains the model to achieve the highest possible sum of rewards over time.
2419

2520
Rewards are sent after the user behavior has happened, which could be days later. The maximum amount of time Personalizer waits before considering an event to have no reward (or a default reward) is configured with the [Reward Wait Time](#reward-wait-time) in the Azure portal.
2621

@@ -42,16 +37,16 @@ Consider these signals and behaviors for the context of the reward score:
4237

4338
A Reward score must be computed in your business logic. The score can be represented as:
4439

45-
* A single number sent once
40+
* A single number sent once
4641
* A score sent immediately (such as 0.8) and an additional score sent later (typically 0.2).
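The split-score case can be sketched as two Reward calls against the same eventId. The path and payload shape here are illustrative assumptions:

```python
# Sketch: sending a reward in two parts for the same eventId.
# The URL path and body shape are illustrative assumptions.
def reward_call(event_id, value):
    # Each partial score is its own Reward call for the same event.
    return {"path": f"/personalizer/v1.0/events/{event_id}/reward",
            "body": {"value": value}}

immediate = reward_call("evt-9", 0.8)   # e.g. sent when the user clicks
follow_up = reward_call("evt-9", 0.2)   # e.g. sent later, after a purchase
total = immediate["body"]["value"] + follow_up["body"]["value"]
```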
4742

4843
## Default Rewards
4944

5045
If no reward is received within the [Reward Wait Time](#reward-wait-time), measured from the Rank call, Personalizer implicitly applies the **Default Reward** to that Rank event.
5146

52-
## Building up rewards with multiple factors
47+
## Building up rewards with multiple factors
5348

54-
For effective personalization, you can build up the reward score based on multiple factors.
49+
For effective personalization, you can build up the reward score based on multiple factors.
5550

5651
For example, you could apply these rules for personalizing a list of video content:
5752

@@ -88,8 +83,8 @@ By adding up reward scores, your final reward may be outside the expected score
8883
* **Consider unintended consequences**: Create reward functions that lead to responsible outcomes with [ethics and responsible use](ethics-responsible-use.md).
8984

9085
* **Use Incremental Rewards**: Adding partial rewards for smaller user behaviors helps Personalizer achieve better rewards. This incremental reward lets the algorithm know it's getting closer to engaging the user in the final desired behavior.
91-
* If you are showing a list of movies, if the user hovers over the first one for a while to see more information, you can determine that some user-engagement happened. The behavior can count with a reward score of 0.1.
92-
* If the user opened the page and then exited, the reward score can be 0.2.
86+
* If you are showing a list of movies and the user hovers over the first one for a while to see more information, you can determine that some user engagement happened. That behavior can count for a reward score of 0.1.
87+
* If the user opened the page and then exited, the reward score can be 0.2.
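The incremental signals above can be combined into a single reward function. This is a minimal sketch: the 0.1 and 0.2 weights come from the text, while the final-behavior weight and the signal names are assumptions.

```python
# Illustrative reward function built from incremental signals.
# Weights beyond those named in the article are assumptions.
def reward_score(hovered, opened_page, watched_to_end):
    score = 0.0
    if hovered:
        score += 0.1   # user hovered to see more information
    if opened_page:
        score += 0.2   # user opened the page, even if they then exited
    if watched_to_end:
        score += 0.7   # assumed weight for the final desired behavior
    return min(score, 1.0)  # keep the total within the expected 0-1 range
```

Capping the sum keeps the final reward inside the expected 0-1 range even when several partial signals fire for the same event.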
9388

9489
## Reward wait time
9590

@@ -101,12 +96,12 @@ If the **Reward Wait Time** expires, and there has been no reward information, a
10196

10297
Follow these recommendations for better results.
10398

104-
* Make the Reward Wait Time as short as you can, while leaving enough time to get user feedback.
99+
* Make the Reward Wait Time as short as you can, while leaving enough time to get user feedback.
105100

106101
* Don't choose a duration that is shorter than the time needed to get feedback. For example, if some of your rewards come in after a user has watched 1 minute of a video, the experiment length should be at least double that.
107102

108103
## Next steps
109104

110-
* [Reinforcement learning](concepts-reinforcement-learning.md)
105+
* [Reinforcement learning](concepts-reinforcement-learning.md)
111106
* [Try the Rank API](https://westus2.dev.cognitive.microsoft.com/docs/services/personalizer-api/operations/Rank/console)
112107
* [Try the Reward API](https://westus2.dev.cognitive.microsoft.com/docs/services/personalizer-api/operations/Reward)

articles/cognitive-services/personalizer/concepts-offline-evaluation.md

Lines changed: 12 additions & 2 deletions
@@ -8,7 +8,7 @@ manager: nitinme
88
ms.service: cognitive-services
99
ms.subservice: personalizer
1010
ms.topic: conceptual
11-
ms.date: 05/07/2019
11+
ms.date: 02/20/2020
1212
ms.author: diberry
1313
---
1414

@@ -45,6 +45,16 @@ Personalizer can use the offline evaluation process to discover a more optimal l
4545

4646
After performing the offline evaluation, you can compare the effectiveness of Personalizer under the new policy to its effectiveness under the current online policy. You can then apply that learning policy to make it effective immediately in Personalizer, by downloading it and uploading it in the Models and Policy panel. You can also download it for future analysis or use.
4747

48+
Current policies included in the evaluation:
49+
50+
| Learning settings | Purpose|
51+
|--|--|
52+
|**Online Policy**| The current Learning Policy used in Personalizer |
53+
|**Baseline**|The application's default (as determined by the first Action sent in Rank calls)|
54+
|**Random Policy**|A hypothetical Rank behavior that always returns a random choice of Actions from those supplied.|
55+
|**Custom Policies**|Additional Learning Policies uploaded when starting the evaluation.|
56+
|**Optimized Policy**|If the evaluation was started with the option to discover an optimized policy, it will also be compared, and you will be able to download it or make it the online learning policy, replacing the current one.|
57+
4858
## Understanding the relevance of offline evaluation results
4959

5060
When you run an offline evaluation, it is very important to analyze _confidence bounds_ of the results. If they are wide, it means your application hasn’t received enough data for the reward estimates to be precise or significant. As the system accumulates more data, and you run offline evaluations over longer periods, the confidence intervals become narrower.
@@ -87,7 +97,7 @@ We recommend looking at feature evaluations and asking:
8797

8898
* What other, additional, features could your application or system provide along the lines of those that are more effective?
8999
* What features can be removed due to low effectiveness? Low effectiveness features add _noise_ into the machine learning.
90-
* Are there any features that are accidentally included? Examples of these are: personally identifiable information (PII), duplicate IDs, etc.
100+
* Are there any features that are accidentally included? Examples of these are: user identifiable information, duplicate IDs, etc.
91101
* Are there any undesirable features that shouldn't be used to personalize due to regulatory or responsible use considerations? Are there features that could proxy (that is, closely mirror or correlate with) undesirable features?
92102

93103
