| Document state | DRAFT, version 0.6. Current state: <ul><li>PM ART</li><li>A few open questions</li><li>Missing coordinating code samples</li></ul>

Voice assistants developed on Windows 10 must follow the user experience guidelines below to provide the best possible voice activation experiences. This document guides developers through the key work needed to integrate a voice assistant with the Windows 10 Shell.
## Contents
- [Summary of voice activation views supported in Windows 10](#summary-of-voice-activation-views-supported-in-windows-10)
- [Requirements summary](#requirements-summary)
- [Best practices for good listening experiences](#best-practices-for-good-listening-experiences)
- [Design guidance for in-app voice activation](#design-guidance-for-in-app-voice-activation)
- [Design guidance for voice activation above lock](#design-guidance-for-voice-activation-above-lock)
- [Design guidance for voice activation preview](#design-guidance-for-voice-activation-preview)
## Summary of voice activation views supported in Windows 10
Windows 10 infers which voice activation experience to show from the device context. The following table gives a high-level overview of the views available when the screen is on.

| View (availability) | Device context | Customer goal when using voice activation | Appears when | Design needs |
| --- | --- | --- | --- | --- |
| **In-app (19H1)** | Below lock, assistant has focus | Interact with the assistant app | Assistant processes the request in-app | Main in-app view listening experience |
| **Above lock (19H2)** | Above lock, unauthenticated | Interact with the assistant, but from a distance | System is locked and assistant requests activation | Full-screen visuals for far-field UI. Implement dismissal policies so the assistant doesn't block unlock. |
| **Voice activation preview (20H1)** | Below lock, assistant does not have focus | Interact with the assistant, but in a less intrusive way | System is below lock and assistant requests background activation | Minimal canvas. Resize or hand off to the main app view as needed. |
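
The summary table can be read as a decision over two pieces of device context: whether the system is locked, and whether the assistant has focus. The following C# sketch models that mapping. `ActivationView` and `ViewSelector.Select` are hypothetical names used only for illustration, not Windows APIs. In practice Windows selects the view, and the assistant only detects lock state to decide which kind of activation to request.

```csharp
// Hypothetical types for illustration only; these are not Windows APIs.
public enum ActivationView
{
    InApp,                  // Below lock, assistant has focus (19H1)
    AboveLock,              // Locked, customer unauthenticated (19H2)
    VoiceActivationPreview  // Below lock, assistant does not have focus (20H1)
}

public static class ViewSelector
{
    // Maps the device context from the summary table to the expected view.
    public static ActivationView Select(bool isLocked, bool assistantHasFocus)
    {
        // Above lock, focus does not matter: the assistant requests activation
        // and shows full-screen, far-field UI.
        if (isLocked)
        {
            return ActivationView.AboveLock;
        }

        // Below lock, focus decides between the main app view and the preview.
        return assistantHasFocus
            ? ActivationView.InApp
            : ActivationView.VoiceActivationPreview;
    }
}
```

For example, `ViewSelector.Select(isLocked: false, assistantHasFocus: false)` yields `VoiceActivationPreview`, which matches the third row of the table.
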
## Requirements summary
Accessing the different experiences requires minimal effort. However, assistants do need to implement the right design guidance for each view. The following table is a checklist of the requirements that must be followed.

| View | Requirements |
| --- | --- |
| **In-app** | <ul><li>Process the request in-app</li><li>Provide UI indicators for listening states</li><li>Adapt the UI as window sizes change</li></ul> |
| **Above lock** | <ul><li>Detect lock state and request activation</li><li>Do not provide persistent UX that would block access to the Windows lock screen</li><li>Provide full-screen visuals and a voice-first experience</li><li>Honor the dismissal guidance below</li><li>Follow the privacy and security considerations below</li></ul> |
| **Voice activation preview** | <ul><li>Detect unlock state and request background activation</li><li>Draw minimal listening UX in the preview pane</li><li>Draw a close X in the top-right corner; when it is pressed, self-dismiss and stop streaming audio</li><li>Resize or hand off to the main assistant app view as needed to provide answers</li></ul> |
## Best practices for good listening experiences
Assistants should build a listening experience that provides critical feedback, so the customer can understand the assistant's state. Below are some states to consider when building an assistant experience. These are suggestions only, not mandatory guidance.

- Assistant is available for speech input
- Assistant is in the process of activating (either a keyword or a mic button press)
- Assistant is actively streaming audio to the assistant cloud
- Assistant is ready for the customer to start speaking
- Assistant is hearing that words are being said
- Assistant understands that the customer is done speaking
- Assistant is processing and preparing a response
- Assistant is responding

Even if states change rapidly, consider providing UX for each state, because state durations vary across the Windows ecosystem. Visual feedback, as well as brief audio chimes or chirps (also called "earcons"), can be part of the solution. Likewise, visual cards coupled with audio descriptions make good response options.
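
One way to make sure each listening state gets feedback is to map every state to a visual cue and, optionally, an earcon. The following C# sketch shows that idea. `ListeningState`, `Feedback`, `ListeningFeedback`, and asset names such as `"start-earcon"` are hypothetical placeholders, not Windows APIs, and the choice of which states play an earcon is a design decision, not a requirement.

```csharp
using System;

// Hypothetical model of the suggested listening states; not a Windows API.
public enum ListeningState
{
    Available,       // Available for speech input
    Activating,      // Keyword detected or mic button pressed
    Streaming,       // Streaming audio to the assistant cloud
    ReadyForSpeech,  // Ready for the customer to start speaking
    HearingSpeech,   // Hearing that words are being said
    DoneSpeaking,    // Customer is done speaking
    Processing,      // Processing and preparing a response
    Responding       // Responding
}

// A visual cue plus an optional earcon; the asset names are placeholders.
public readonly record struct Feedback(string Visual, string? Earcon);

public static class ListeningFeedback
{
    // Every state maps to a visual cue, even if it is shown only briefly,
    // because state durations vary across devices.
    public static Feedback For(ListeningState state) => state switch
    {
        ListeningState.Available      => new Feedback("idle-icon", null),
        ListeningState.Activating     => new Feedback("activating-pulse", "start-earcon"),
        ListeningState.Streaming      => new Feedback("streaming-indicator", null),
        ListeningState.ReadyForSpeech => new Feedback("listening-ring", null),
        ListeningState.HearingSpeech  => new Feedback("voice-level-meter", null),
        ListeningState.DoneSpeaking   => new Feedback("listening-ring-closed", "end-earcon"),
        ListeningState.Processing     => new Feedback("thinking-spinner", null),
        ListeningState.Responding     => new Feedback("response-card", null),
        _ => throw new ArgumentOutOfRangeException(nameof(state))
    };
}
```

An assistant would call `ListeningFeedback.For` on every state transition, so even a state that lasts only a moment on a fast device still has a defined cue.
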
## Design guidance for in-app voice activation
When the assistant app has focus, the customer clearly intends to interact with the app, so the main app view should handle all voice activation experiences. The customer may resize this view. To help explain assistant Shell interactions, the rest of this document describes the detailed requirements using the concrete example of a financial services assistant named Contoso. In this and subsequent diagrams, what the customer says appears in cartoon speech bubbles on the left, with assistant responses in cartoon bubbles on the right.

**In-app view. Initial state when voice activation begins:**

**In-app view. After successful voice activation, listening experience begins:**

**In-app view. All responses remain in the app experience.**
## Design guidance for voice activation above lock
Starting with 19H2, assistants built on the Windows voice activation platform can answer above lock.
### Customer opt-in
Voice activation above lock is always disabled by default. Customers opt in through Windows Settings > Privacy > Voice activation. For details on monitoring and prompting for this setting, see the [above lock implementation guide](windows-voice-assistants-implement-above-lock#Detect-user-preference).
### Not a lock-screen replacement
While notifications and other standard app lock-screen integration points remain available to the assistant, the Windows lock screen always defines the initial customer experience until a voice activation occurs. After a voice activation is detected, the assistant app temporarily appears above the lock screen. To avoid confusing the customer, the assistant app must never present UI that asks for credentials or a log-in while it is active above lock.

### Above lock experience following voice activation
When the screen is on, the assistant app appears full screen above the lock screen, with no title bar. Larger visuals, clear voice descriptions, and a strong voice user interface (VUI) support customers who are too far away to read the UI or whose hands are busy with another (non-PC) task.

When the screen remains off, the assistant app could play an earcon to indicate that the assistant is activating, and then provide a voice-only experience.

### Dismissal policies
The assistant must implement the dismissal guidance in this section so that customers can easily log in the next time they want to use their Windows PC. The specific requirements are:

- **All assistant canvases that show above lock must contain an X** in the top right that dismisses the assistant.
- **Pressing any key must also dismiss the assistant app.** Keyboard input is a traditional lock app signal that the customer wants to log in, so keyboard or text input should not be directed to the app. Instead, the app should self-dismiss when it detects keyboard input, so the customer can easily log in to their device.
- **If the screen goes off, the app must self-dismiss.** This ensures that the next time the customer uses their PC, the log-in screen is ready and waiting for them.
- If the app is "in use," it may continue above lock. "In use" means any ongoing input or output. For example, the app may continue above lock while streaming music or video. "Follow-on" turns and other multi-turn dialog steps are also permitted to keep the app above lock.
- **Implementation details on dismissing the application** can be found [in the above lock implementation guide](windows-voice-assistants-implement-above-lock#Closing-the-application).
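
The dismissal rules above can be condensed into a single decision function. The following C# sketch is one possible reading of them. `AboveLockEvent`, `DismissalPolicy`, and the `Idle` event are hypothetical, and the assumption that "in use" does not override the X, key-press, and screen-off rules is this sketch's interpretation, not stated guidance. The actual calls for closing the app are described in the linked implementation guide.

```csharp
using System;

// Hypothetical events an assistant might observe above lock; not a Windows API.
public enum AboveLockEvent
{
    CloseButtonPressed, // The X in the top-right corner was pressed
    KeyPressed,         // Any keyboard input: the customer likely wants to log in
    ScreenOff,          // The screen turned off
    Idle                // Hypothetical: no input or output is in progress
}

public static class DismissalPolicy
{
    // Returns true when the assistant should self-dismiss above lock.
    // isInUse is true while input or output is ongoing, for example while
    // streaming music or during a follow-on dialog turn.
    public static bool ShouldDismiss(AboveLockEvent e, bool isInUse) => e switch
    {
        // The three explicit rules always dismiss.
        AboveLockEvent.CloseButtonPressed => true,
        AboveLockEvent.KeyPressed => true,
        AboveLockEvent.ScreenOff => true,
        // Assumption: "in use" only lets an otherwise idle app stay above lock;
        // it does not override the three rules above.
        AboveLockEvent.Idle => !isInUse,
        _ => throw new ArgumentOutOfRangeException(nameof(e))
    };
}
```

Keeping the policy separate from the code that actually closes the window makes the rules easy to review against the checklist above.
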


### Privacy & security considerations above lock
Many PCs are portable but not always within the customer's reach. They may be briefly left in hotel rooms, on airplane seats, or in workspaces where other people have physical access. Assistants enabled above lock that are not prepared for this can be exploited by the class of so-called "[evil maid](https://en.wikipedia.org/wiki/Evil_maid_attack)" attacks.

Therefore, assistants should follow the guidance in this section to help keep the experience secure. Interaction above lock occurs while the Windows user is unauthenticated, so in general **input to the assistant should also be treated as unauthenticated**.

- Assistants should **implement a skill whitelist to identify skills that are confirmed secure and safe** to access above lock.
- Speaker ID technologies can help alleviate some risks, but Speaker ID is not a suitable replacement for Windows authentication.
- The skill whitelist should consider three classes of actions or skills:

| **Action class** | **Description** | **Examples (not a complete list)** |
| --- | --- | --- |
| Safe without authentication | General-purpose information or basic app command and control | "what time is it?", "play the next track" |
| Safe with Speaker ID | Impersonation risk; may reveal personal information | "what's my next appointment?", "review shopping list", "answer the call" |
| Safe only after Windows authentication | High-risk actions that an attacker could use to harm the customer | "buy more groceries", "delete my (important) appointment", "send a (mean) text message", "launch a (nefarious) webpage" |

For Contoso, general public stock information is safe without authentication. Customer-specific information, such as the number of shares owned, is likely safe with Speaker ID. However, buying or selling stocks should never be allowed without Windows authentication.
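
The Contoso example can be expressed as a deny-by-default lookup: each whitelisted skill is tagged with one of the three action classes, and anything not on the list is refused above lock. In the following C# sketch, the skill names and the `ActionClass`, `ContosoSkillWhitelist`, and `IsAllowedAboveLock` names are hypothetical, and `speakerVerified` stands in for whatever Speaker ID result the assistant's own service provides.

```csharp
using System.Collections.Generic;

// Hypothetical action classes from the table above; not a Windows API.
public enum ActionClass
{
    SafeWithoutAuthentication,
    SafeWithSpeakerId,
    RequiresWindowsAuthentication
}

public static class ContosoSkillWhitelist
{
    // Hypothetical Contoso skill names mapped to their action class.
    private static readonly Dictionary<string, ActionClass> Skills = new()
    {
        ["GetStockQuote"] = ActionClass.SafeWithoutAuthentication,
        ["GetSharesOwned"] = ActionClass.SafeWithSpeakerId,
        ["BuyStock"] = ActionClass.RequiresWindowsAuthentication,
        ["SellStock"] = ActionClass.RequiresWindowsAuthentication,
    };

    // Decides whether a skill may run above lock, where input is unauthenticated.
    public static bool IsAllowedAboveLock(string skill, bool speakerVerified)
    {
        // Skills that are not on the whitelist are denied by default.
        if (!Skills.TryGetValue(skill, out ActionClass actionClass))
        {
            return false;
        }

        return actionClass switch
        {
            ActionClass.SafeWithoutAuthentication => true,
            ActionClass.SafeWithSpeakerId => speakerVerified,
            // Speaker ID is not a replacement for Windows authentication, so
            // these skills must wait until the customer signs in.
            _ => false
        };
    }
}
```

Denying unknown skills by default means a newly added skill stays unavailable above lock until someone deliberately classifies it.
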

To further secure the experience, **Windows always blocks weblinks and other app-to-app launches until the customer signs in.** Finally, as a last-resort mitigation, Microsoft reserves the right to remove an offending assistant from the whitelist of possible assistants if a serious security issue is not addressed in a timely manner.
## Design guidance for voice activation preview
Below lock, when the assistant app does _not_ have focus, Windows provides a less intrusive voice activation UI to help keep the customer in flow. This is especially important for false activations, which would be highly disruptive if they launched the full app. The core idea is that each assistant has another home in the Shell: the assistant taskbar icon. When the assistant requests background activation, a small view appears above its taskbar icon. Assistants should provide a small listening experience in this canvas. After processing the request, assistants can choose to resize this view to show an in-context answer, or hand off to their main app view to show larger, more detailed visuals.

- To stay minimal, the preview has no title bar, so **the assistant must draw an X in the top right to allow customers to dismiss the view.** Refer to [Closing the Application](windows-voice-assistants-implement-above-lock#Closing-the-application) for the specific APIs to call when the dismiss button is pressed.
- To support voice activation previews, assistants may invite customers to pin the assistant to the taskbar during first run.

**Voice activation preview: Initial state**

The Contoso assistant has a home on the taskbar: its swirling, circular icon.

**As activation progresses**, the assistant requests background activation and is given a small preview pane (default width: 408, height: 248). If server-side voice activation determines that there is no result, the assistant can dismiss this view to minimize interruption.

**When final activation is confirmed**, the assistant presents its listening UX. The assistant must always draw a dismiss X in the top right of the voice activation preview.

**Quick answers** may be shown in the voice activation preview. Assistants can call `ApplicationView.TryResizeView` to request different sizes.

**Hand-off.** At any point, the assistant may hand off to its main app view to provide more information, answers, or dialog that requires more screen real estate. For implementation details, see the FAQ entry [My app is showing in a small window when I activate it by voice](windows-voice-assistants-faq#My-app-is-showing-in-a-small-window-when-I-activate-it-by-voice.-How-can-I-transition-from-the-compact-view-to-a-full-application-window).
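
The preview steps above amount to a three-way choice after the assistant processes a request: dismiss, answer in the preview, or hand off. The following C# sketch models only that decision. `PreviewAction`, `PreviewFlow`, and both parameters are hypothetical, and judging whether an answer "needs more space" is left to the assistant.

```csharp
// Hypothetical model of the preview flow; not a Windows API.
public enum PreviewAction
{
    Dismiss,        // No result: close the preview to minimize interruption
    ShowInPreview,  // Quick answer: show it in the preview, resizing if needed
    HandOff         // Needs more space: hand off to the main app view
}

public static class PreviewFlow
{
    // activationConfirmed: server-side voice activation found a result.
    // needsMoreSpace: the answer or dialog needs more room than the preview offers.
    public static PreviewAction Decide(bool activationConfirmed, bool needsMoreSpace)
    {
        if (!activationConfirmed)
        {
            return PreviewAction.Dismiss;
        }

        return needsMoreSpace ? PreviewAction.HandOff : PreviewAction.ShowInPreview;
    }
}
```

On Windows, a `ShowInPreview` result that needs a different size would map to `ApplicationView.TryResizeView`, and `HandOff` to `ApplicationView.TryEnterViewModeAsync(ApplicationViewMode.Default)`. Because `TryResizeView` returns `false` when a resize is not granted, falling back to hand-off in that case is one reasonable design choice.
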

articles/cognitive-services/Speech-Service/windows-voice-assistants-faq.md
## Implementation
### My app is showing in a small window when I activate it by voice. How can I transition from the compact view to a full application window?
When your application is first activated by voice, it starts in a compact view. Read the [Design guidance for voice activation preview](windows-voice-assistants-best-practices#Design-guidance-for-voice-activation-preview) for guidance on the different views for voice assistants on Windows and the transitions between them.

To transition from the compact view to the full app view, call the `ApplicationView` method `TryEnterViewModeAsync`:
```csharp
var appView = ApplicationView.GetForCurrentView();
await appView.TryEnterViewModeAsync(ApplicationViewMode.Default);
```

### Do I have to make my voice assistant a UWP application?
### Do I have to use Direct Line Speech for my Windows Conversational Agent?
The UWP Sample Application was developed using Direct Line Speech and the Speech Services SDK as a demonstration of how to use a dialog service with the Windows Conversational Agent capability. However, you can use any service for local and cloud keyword verification, speech-to-text conversion, bot dialog, and text-to-speech conversion. See how in the [UWP Sample Application docs]().