
Commit 91b33c9

Merge pull request #115263 from MicrosoftDocs/release-build-cogserv-personalizer
Release build cogserv personalizer
2 parents 4e4ea55 + e656c9e commit 91b33c9

12 files changed, +274 -82 lines changed
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
---
title: Apprentice mode - Personalizer
description: Learn how to use Apprentice mode to train Personalizer without affecting your existing application.
ms.topic: conceptual
ms.date: 05/01/2020
---

# Use Apprentice mode to train Personalizer without affecting your existing application
Due to the nature of **real-world** reinforcement learning, a Personalizer model can only be trained in a production environment. When deploying a new use case, the Personalizer model does not perform efficiently at first, because it takes time for the model to be sufficiently trained. **Apprentice mode** is a learning behavior that eases this situation and allows you to gain confidence in the model – without the developer changing any code.

[!INCLUDE [Important Blue Box - Apprentice mode pricing tier](./includes/important-apprentice-mode.md)]

## What is Apprentice mode?

Similar to how an apprentice learns from a master and gets better with experience, Apprentice mode is a _behavior_ that lets Personalizer learn by observing the results obtained from your existing application logic.

Personalizer trains by mimicking the same output as the application. As more events flow, Personalizer can _catch up_ to the existing application without impacting the existing logic and outcomes. Metrics, available from the Azure portal and the API, help you understand the performance as the model learns.

Once Personalizer has learned and attained a certain level of understanding, the developer can change the behavior from Apprentice mode to Online mode. At that point, Personalizer starts influencing the actions returned by the Rank API.
## Purpose of Apprentice mode

Apprentice mode gives you trust in the Personalizer service and its machine learning capabilities, and provides reassurance that the service is sent information it can learn from – without risking online traffic.

The two main reasons to use Apprentice mode are:

* Mitigating **Cold Starts**: Apprentice mode helps manage and assess the cost of a "new" model's learning time, while it is not yet returning the best action and has not yet achieved a satisfactory level of effectiveness of around 75-85%.
* **Validating Action and Context Features**: Features sent in actions and context may be inadequate or inaccurate – too few, too many, incorrect, or too specific to train Personalizer to attain the ideal effectiveness rate. Use [feature evaluations](concept-feature-evaluation.md) to find and fix issues with features.
## When should you use Apprentice mode?

Use Apprentice mode to train Personalizer to improve its effectiveness in the following scenarios, while leaving the experience of your users unaffected by Personalizer:

* You are implementing Personalizer in a new use case.
* You have significantly changed the features you send in Context or Actions.
* You have significantly changed when and how you calculate rewards.

Apprentice mode is not an effective way of measuring the impact Personalizer is having on reward scores. To measure how effective Personalizer is at choosing the best possible action for each Rank call, use [Offline evaluations](concepts-offline-evaluation.md).
## Who should use Apprentice mode?

Apprentice mode is useful for developers, data scientists, and business decision makers:

* **Developers** can use Apprentice mode to make sure the Rank and Reward APIs are being used correctly in the application, and that the features being sent to Personalizer from the application contain no bugs or non-relevant features, such as a timestamp or UserID element.

* **Data scientists** can use Apprentice mode to validate that the features are effective for training the Personalizer models, and that the reward wait times aren't too long or too short.

* **Business decision makers** can use Apprentice mode to assess the potential of Personalizer to improve results (that is, rewards) compared to existing business logic. This lets them make an informed decision about impacting the user experience, where real revenue and user satisfaction are at stake.
## Comparing behaviors - Apprentice mode and Online mode

Learning in Apprentice mode differs from Online mode in the following ways.

|Area|Apprentice mode|Online mode|
|--|--|--|
|Impact on user experience|You can use existing user behavior to train Personalizer by letting it observe (not affect) what your **default action** would have been and the reward it obtained. This means your users' experience and the business results from them won't be impacted.|Displays the top action returned from the Rank call, which affects user behavior.|
|Learning speed|Personalizer learns more slowly in Apprentice mode than in Online mode. Apprentice mode can only learn by observing the rewards obtained by your **default action**, which limits the speed of learning, as no exploration can be performed.|Learns faster because it can both exploit the current model and explore new trends.|
|Learning effectiveness "ceiling"|Personalizer can approximate, very rarely match, and never exceed the performance of your base business logic (the reward total achieved by the **default action** of each Rank call).|Personalizer should exceed the application's baseline. Over time, where it plateaus, you should conduct an offline evaluation and a feature evaluation to keep improving the model.|
|Rank API value for rewardActionId|The users' experience isn't impacted, as _rewardActionId_ is always the first action you send in the Rank request. In other words, the Rank API does nothing visible for your application during Apprentice mode. Your application should not change how it uses the Reward API between one mode and the other.|The users' experience is changed by the _rewardActionId_ that Personalizer chooses for your application.|
|Evaluations|Personalizer keeps a comparison of the reward totals that your default business logic is getting, and the reward totals Personalizer would be getting if it were in Online mode at that point. This comparison is available in the Azure portal for that resource.|Evaluate Personalizer's effectiveness by running [Offline evaluations](concepts-offline-evaluation.md), which let you compare the total rewards Personalizer has achieved against the potential rewards of the application's baseline.|

A note about Apprentice mode's effectiveness:

* Personalizer's effectiveness in Apprentice mode will rarely get near 100% of the application's baseline, and it will never exceed it.
* As a best practice, don't aim for 100% attainment; instead, target a range of 75-85%, depending on the use case.
## Using Apprentice mode to train with historical data

If you have a significant amount of historical data that you'd like to use to train Personalizer, you can use Apprentice mode to replay the data through Personalizer.

Set up Personalizer in Apprentice mode and create a script that calls Rank with the actions and context features from the historical data. Call the Reward API based on your calculations of the records in this data. You will need approximately 50,000 historical events to see some results, but 500,000 is recommended for higher confidence in the results.
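The following is a minimal sketch of such a replay script, written in Python against the Personalizer REST endpoints. The endpoint, key, and the shape of the `historical_events` records are assumptions for illustration only; verify the exact request and response fields against the Rank and Reward API reference for your resource.

```python
# Sketch only: replay historical records through Rank and Reward while the
# loop is in Apprentice mode. Endpoint, key, and record layout are placeholders.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
HEADERS = {
    "Ocp-Apim-Subscription-Key": "<your-key>",  # placeholder
    "Content-Type": "application/json",
}

# Each record: the context, the candidate actions, the action your existing
# logic actually chose, and the reward your business logic assigns to the outcome.
historical_events = [
    {
        "context": [{"timeOfDay": "morning"}, {"device": "mobile"}],
        "actions": [
            {"id": "article-a", "features": [{"topic": "sports"}]},
            {"id": "article-b", "features": [{"topic": "finance"}]},
        ],
        "chosen_action_id": "article-a",
        "reward": 1.0,
    },
]

for event in historical_events:
    # The action that was actually shown must be first in the actions list,
    # because Apprentice mode learns from the default (first) action.
    actions = sorted(
        event["actions"], key=lambda a: a["id"] != event["chosen_action_id"]
    )
    rank_response = requests.post(
        f"{ENDPOINT}/personalizer/v1.0/rank",
        headers=HEADERS,
        json={"contextFeatures": event["context"], "actions": actions},
    ).json()

    # Send the reward calculated from the historical record for this event.
    requests.post(
        f"{ENDPOINT}/personalizer/v1.0/events/{rank_response['eventId']}/reward",
        headers=HEADERS,
        json={"value": event["reward"]},
    )
```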
When training from historical data, it is recommended that the data sent in (features for context and actions, their layout in the JSON used for Rank requests, and the calculation of reward in this training data set) matches the data (features and calculation of reward) available from the existing application.

Offline and post-facto data tends to be more incomplete and noisier, and can differ in format from live data. While training from historical data is possible, the results from doing so may be inconclusive and not a good predictor of how well Personalizer will learn, especially if the features vary between the past data and the existing application.

Typically for Personalizer, compared to training with historical data, changing the behavior to Apprentice mode and learning from an existing application is a more effective path to an effective model, with less labor, data engineering, and cleanup work.
## Using Apprentice mode versus A/B tests

It is only useful to do A/B tests of Personalizer treatments once Personalizer has been validated and is learning in Online mode. In Apprentice mode, only the **default action** is used, which means all users would effectively see the control experience.

Even if Personalizer is just the _treatment_, the same challenge of validating that the data is good for training Personalizer is present. Apprentice mode could be used instead, with 100% of traffic, and with all users getting the control (unaffected) experience.

Once you have a use case using Personalizer and learning online, A/B experiments let you create controlled cohorts and do a scientific comparison of results that may be more complex than the signals used for rewards. An example question an A/B test could answer is: "In a retail website, Personalizer optimizes a layout and gets more users to check out earlier, but does this reduce total revenue per transaction?"

## Next steps

* Learn about [active and inactive events](concept-active-inactive-events.md)
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
---
title: Configure learning behavior
description: Apprentice mode gives you confidence in the Personalizer service and its machine learning capabilities, and provides metrics showing that the service is sent information it can learn from – without risking online traffic.
ms.topic: how-to
ms.date: 05/01/2020
---

# Configure the Personalizer learning behavior
[Apprentice mode](concept-apprentice-mode.md) gives you trust and confidence in the Personalizer service and its machine learning capabilities, and provides assurance that the service is sent information it can learn from – without risking online traffic.

[!INCLUDE [Important Blue Box - Apprentice mode pricing tier](./includes/important-apprentice-mode.md)]

## Configure Apprentice mode

1. Sign in to the [Azure portal](https://portal.azure.com) for your Personalizer resource.

1. On the **Configuration** page, on the **Learning behavior** tab, select **Return baseline action, learn as an apprentice**, and then select **Save**.

> [!div class="mx-imgBorder"]
> ![Screenshot of configuring Apprentice mode learning behavior in the Azure portal](media/settings/configure-learning-behavior-azure-portal.png)
## Changes to the existing application

Your existing application shouldn't change how it currently selects actions to display, or how the application determines the value, or **reward**, of that action. The only change to the application might be the order of the actions sent to Personalizer's Rank API. The action your application currently displays is sent as the _first action_ in the action list. The [Rank API](https://westus2.dev.cognitive.microsoft.com/docs/services/personalizer-api/operations/Rank) uses this first action to train your Personalizer model.

### Configure your application to call the Rank API

In order to add Personalizer to your application, you need to call the Rank and Reward APIs.

1. Add the [Rank API](https://westus2.dev.cognitive.microsoft.com/docs/services/personalizer-api/operations/Rank) call after the point in your existing application logic where you determine the list of actions and their features. The first action in the actions list needs to be the action selected by your existing logic.

1. Configure your code to display the action associated with the Rank API response's **Reward Action ID**, as shown in the sketch after this list.
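The following is a minimal sketch of the Rank call, written in Python against the Personalizer REST endpoint. The endpoint, key, and feature values are placeholders, and the payload fields should be verified against the Rank API reference linked above.

```python
# Sketch only: call Rank with the action chosen by your existing logic placed
# first in the actions list. Endpoint, key, and features are placeholders.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
HEADERS = {
    "Ocp-Apim-Subscription-Key": "<your-key>",  # placeholder
    "Content-Type": "application/json",
}

default_action = {"id": "article-a", "features": [{"topic": "sports"}]}  # chosen by existing logic
other_actions = [{"id": "article-b", "features": [{"topic": "finance"}]}]

rank_request = {
    "contextFeatures": [{"timeOfDay": "morning"}, {"device": "mobile"}],
    # The action your application currently displays goes first.
    "actions": [default_action] + other_actions,
}

rank_response = requests.post(
    f"{ENDPOINT}/personalizer/v1.0/rank", headers=HEADERS, json=rank_request
).json()

# Display the action identified by rewardActionId. In Apprentice mode this is
# always the first (default) action, so the user experience doesn't change.
action_to_display = rank_response["rewardActionId"]
event_id = rank_response["eventId"]  # keep this to send the reward later
```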
### Configure your application to call the Reward API

1. Use your existing business logic to calculate the **reward** of the displayed action. The value needs to be in the range from 0 to 1. Send this reward to Personalizer using the [Reward API](https://westus2.dev.cognitive.microsoft.com/docs/services/personalizer-api/operations/Reward). The reward value is not expected immediately and can be delayed over a time period, depending on your business logic. A minimal sketch of the call follows this list.

1. If you don't return the reward within the configured **Reward wait time**, the default reward is used instead.
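Continuing the Rank sketch above, this is a hedged example of the Reward call; `event_id`, `ENDPOINT`, and `HEADERS` come from that sketch, and the reward calculation is a placeholder for your own business logic.

```python
# Sketch only, continuing from the Rank call above: send the reward your
# business logic calculated for the displayed action. The value must be in the
# range 0 to 1 and can be sent later, within the configured Reward wait time.
user_clicked = True  # placeholder: replace with your real business signal

reward_value = 1.0 if user_clicked else 0.0

requests.post(
    f"{ENDPOINT}/personalizer/v1.0/events/{event_id}/reward",
    headers=HEADERS,
    json={"value": reward_value},
)
```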
## Evaluate Apprentice mode

In the Azure portal, on the **Evaluations** page for your Personalizer resource, review the **Current learning behavior performance**.

> [!div class="mx-imgBorder"]
> ![Screenshot of reviewing the evaluation of Apprentice mode learning behavior in the Azure portal](media/settings/evaluate-apprentice-mode.png)

Apprentice mode provides the following **evaluation metrics**:

* **Baseline – average reward**: Average reward of the application's default (baseline) action.
* **Personalizer – average reward**: Average of the total reward Personalizer would potentially have reached.
* **Reward achievement ratio over most recent 1000 events**: Ratio of Baseline and Personalizer reward, normalized over the most recent 1000 events.
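As an illustration with assumed numbers (and assuming the ratio expresses Personalizer's reward relative to the baseline's): if, over the most recent 1000 events, the baseline's average reward is 0.20 and Personalizer's potential average reward is 0.17, the reward achievement ratio is 0.17 / 0.20 = 85%, within the 75-85% range suggested for switching to Online mode.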
## Evaluate Apprentice mode features

Evaluate the features using an [offline evaluation](how-to-offline-evaluation.md).

## Switch behavior to Online mode

When the reward achievement ratio reaches a rolling average of 75-85%, Personalizer is sufficiently trained and the model is ready to switch to Online mode.

In the Azure portal for your Personalizer resource, on the **Configuration** page, on the **Learning behavior** tab, select **Return the best action**, and then select **Save**.

You do not need to make any changes to the Rank and Reward API calls.

## Next steps

* [Manage model and learning settings](how-to-manage-model.md)

articles/cognitive-services/personalizer/how-to-settings.md

Lines changed: 18 additions & 10 deletions
@@ -2,7 +2,7 @@
 title: Configure Personalizer
 description: Service configuration includes how the service treats rewards, how often the service explores, how often the model is retrained, and how much data is stored.
 ms.topic: conceptual
-ms.date: 02/19/2020
+ms.date: 04/29/2020
 ---
 
 # Configure Personalizer learning loop
@@ -14,6 +14,23 @@ Configure the learning loop on the **Configuration** page, in the Azure portal f
 <a name="configure-service-settings-in-the-azure-portal"></a>
 <a name="configure-reward-settings-for-the-feedback-loop-based-on-use-case"></a>
 
+## Planning configuration changes
+
+Because some configuration changes [reset your model](#settings-that-include-resetting-the-model), you should plan your configuration changes.
+
+If you plan to use [Apprentice mode](concept-apprentice-mode.md), make sure to review your Personalizer configuration before switching to Apprentice mode.
+
+<a name="clear-data-for-your-learning-loop"></a>
+
+## Settings that include resetting the model
+
+The following actions trigger a retraining of the model using the data available from the last 2 days.
+
+* Reward
+* Exploration
+
+To [clear](how-to-manage-model.md) all your data, use the **Model and learning settings** page.
+
 ## Configure rewards for the feedback loop
 
 Configure the service for your learning loop's use of rewards. Changes to the following values will reset the current Personalizer model and retrain it with the last 2 days of data.
@@ -61,16 +78,7 @@ After changing this value, make sure to select **Save**.
 
 After changing this value, make sure to select **Save**.
 
-<a name="clear-data-for-your-learning-loop"></a>
-
-## Settings that include resetting the model
-
-The following actions include an immediate retraining of the model with the last 2 days of data.
-
-* Reward
-* Exploration
 
-To [clear](how-to-manage-model.md) all your data, use the **Model and learning settings ** page.
 
 ## Next steps
 
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
---
title: include file
description: include file
services: cognitive-services
author: diberry
manager: nitinme
ms.service: cognitive-services
ms.subservice: personalizer
ms.topic: include
ms.custom: include file
ms.date: 04/29/2020
ms.author: diberry
---

> [!Important]
> **Apprentice mode** (in Public Preview) is only available on the E0 pricing tier. Please see pricing for details. You can select the E0 tier at resource creation, or upgrade to E0 from the Subscriptions tab in the Azure portal. If you are on another tier and upgrade to E0, your existing Personalizer resources will automatically be migrated to the E0 tier.

articles/cognitive-services/personalizer/index.yml

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,7 @@ summary: Learn how to use Personalizer to allow your application to choose the b
 
 metadata:
   ms.topic: landing-page
-  ms.date: 10/03/2019
+  ms.date: 05/01/2020
   ms.author: nitinme
   author: nitinme
   ms.service: cognitive-services
@@ -99,6 +99,8 @@ landingContent:
     linkLists:
       - linkListType: concept
         links:
+          - text: Use Apprentice mode
+            url: how-to-learning-behavior.md
           - text: Improve loop with offline evaluations
             url: concepts-offline-evaluation.md
           - text: Active and inactive events
Two new screenshot images added (64 KB and 63.2 KB).
