-Conversation Analysis (CA) examines the structure and organization of human interactions, focusing on how participants manage conversations in real time. We mimic this natural behavior in our API.
-
-Key concepts include:
-
-<AccordionGroup>
-
-<Accordion title="Turn-Taking Organization">
-Conversations are structured into turns, where typically one person speaks at a time. Speakers use Turn Construction Units (TCUs), such as words, phrases, or clauses, that listeners recognize, allowing them to anticipate when a turn will end and when it's appropriate to speak. Transition Relevance Places (TRPs) are points where a change of speaker can occur. Turn allocation follows specific rules:
-
-- **Current speaker selects next**: The current speaker designates who speaks next.
-- **Self-selection**: If not selected, another participant may self-select to speak.
-- **Continuation**: If no one else speaks, the current speaker may continue.
-
-Silences are categorized as pauses (within a turn), gaps (between turns), or lapses (when no one speaks).
-</Accordion>
-<Accordion title="Sequence Organization">
-Conversations often involve sequences like adjacency pairs, where an initial utterance (e.g., a question) prompts a related response (e.g., an answer). These pairs can be expanded with pre-sequences (preparing for the main action), insert expansions (occurring between the initial and responsive actions), and post-expansions (following the main action).
-</Accordion>
-<Accordion title="Preference Organization">
-Certain responses are socially preferred. For example, agreements or acceptances are typically delivered promptly and directly, while disagreements or refusals may be delayed or mitigated to maintain social harmony.
-</Accordion>
-<Accordion title="Repair Mechanisms">
-Participants address problems in speaking, hearing, or understanding through repair strategies. Self-repair (the speaker corrects themselves) is generally preferred over other-repair (another person corrects the speaker), helping to maintain conversational flow and mutual understanding.
-</Accordion>
-<Accordion title="Action Formation">
-Speakers perform actions (e.g., questioning, requesting, asserting) through their utterances. Understanding how these actions are constructed and interpreted is central to CA, as it reveals how participants achieve social objectives through conversation.
-</Accordion>
-<Accordion title="Adjacency Pair">
-An adjacency pair is a fundamental unit of conversation consisting of two related utterances. The first part (e.g., a question) typically elicits a specific response (e.g., an answer). These pairs are essential for structuring conversations and ensuring coherence.
-</Accordion>
-</AccordionGroup>
-
-These foundational structures illustrate how individuals collaboratively produce and interpret talk in interaction, ensuring coherent and meaningful communication.
-
 ### Speech Configuration in VAPI
 
 Speech configuration is a crucial aspect of designing a voice assistant that delivers a seamless and engaging user experience. By customizing the assistant's speech settings, you can optimize its responsiveness, naturalness, and timing during interactions with users.
@@ -50,6 +16,8 @@ These plans ensure that the assistant does not interrupt the customer and also p
 
 Adjusting these parameters helps tailor the assistant's responsiveness to different conversational dynamics.
 
+For more information on the anatomy of conversation and how it relates to speech recognition, see the [Conversational Analysis](/customization/conversational-analysis) guide.
+
 <CardGroup cols={2}>
 
 <Card
@@ -58,10 +26,9 @@ Adjusting these parameters helps tailor the assistant's responsiveness to differ
 href='#transcriber-settings'
 >
 Specify the provider, language, and model for speech transcription.
Here are some best practices for configuring speech settings to enhance the conversational experience and optimize user engagement.
+
+
+</Card>
+<Card
+title='Custom Endpoints'
+icon='solid wand-magic-sparkles'
+href='#custom-endpoints'>
+
+The custom endpointing rules in Vapi's speech configuration are particularly useful in several scenarios, such as non-standard speech environments.
+
+
 </Card>
 </CardGroup>
 
@@ -215,15 +201,14 @@ Use transcription-based endpointing, with specific timeouts after punctuation, n
 **Example**: In insurance claims, enabling `smartEndpointingEnabled` helps avoid interruptions while customers think through and formulate responses.
 
 
-### STOP SPEAKING PLAN
+### Stop Speaking Plan
 
 - **Words to Stop Speaking**: Specify the number of words a user must say before the assistant stops talking, preventing interruptions from brief interjections.
 
 - Voice Activity Detection: Set the duration of user speech required to trigger the assistant to stop speaking, minimizing overlaps.
 
 - Pause Before Resuming: Control the delay before the assistant resumes speaking after being interrupted, ensuring a natural conversational flow.
 
-
 The `stopSpeakingPlan` allows you to configure how the assistant stops speaking, preventing interruptions and ensuring a smooth conversation. Here's an example:
 
 ```json
@@ -250,15 +235,18 @@ This enhanced explanation provides concrete examples and clear descriptions of t
 - **Silence Timeout**: Define the duration the assistant waits during user silence before responding or prompting, balancing responsiveness with user comfort.
 - **Max Duration**: Set limits on interaction lengths to manage session times effectively. This parameter helps prevent overly long interactions that may lead to user fatigue or disengagement.
 
-## BEST PRACTICES
-Best Practices
-
-Adapt to User Style: Configure settings based on conversational dynamics, such as enabling smart endpointing for mid-thought pauses.
-
-Minimize Noise Interference: Adjust parameters to handle noisy environments effectively.
-
-Optimize Conversational Flow: Balance responsiveness and non-intrusiveness by testing different configurations.
-
-Tailor for Use Cases: Customize settings for specific scenarios, such as tech support or healthcare applications.
-
-Iterate and Improve: Continuously test configurations with real users and refine based on feedback.
+### Custom Endpoints
+- **Complex Conversations**: In situations where users might pause mid-thought or have varying speech patterns, the `BothCustomEndpointingRule` can help create a more natural flow. This is especially valuable in customer service or healthcare applications where conversations can be nuanced and unpredictable.
+- **Technical Discussions**: For calls involving technical details or numbers, the `TranscriptionEndpointingPlan`'s `onNumberSeconds` parameter can be adjusted to allow more time after number sequences. This is useful in financial services, tech support, or any scenario where numerical information is frequently exchanged.
+- **Multilingual Support**: The `AssistantCustomEndpointingRule` can be tailored to account for different speech patterns and pauses typical in various languages, improving the assistant's responsiveness in multilingual environments.
+- **Emotional or Sensitive Conversations**: In counseling or mental health applications, the `CustomerCustomEndpointingRule` can be fine-tuned to allow for longer pauses, giving users more time to process and respond without interruption.
+- **High-Noise Environments**: For calls from locations with significant background noise, like factories or busy streets, these rules can be adjusted to better distinguish between speech and ambient sounds, improving the overall conversation quality.
+- **Elderly or Speech-Impaired Users**: The endpointing rules can be customized to accommodate slower speech patterns or frequent pauses, ensuring the assistant doesn't interrupt prematurely.
+
+
+### Best Practices
+- **Adapt to User Style**: Configure settings based on conversational dynamics, such as enabling smart endpointing for mid-thought pauses.
+- **Minimize Noise Interference**: Adjust parameters to handle noisy environments effectively.
+- **Optimize Conversational Flow**: Balance responsiveness and non-intrusiveness by testing different configurations.
+- **Tailor for Use Cases**: Customize settings for specific scenarios, such as tech support or healthcare applications.
+- **Iterate and Improve**: Continuously test configurations with real users and refine based on feedback.
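
For reviewers who want to reason about the settings touched in this change, the endpointing and stop-speaking behavior described above can be sketched in TypeScript. This is only an illustration of the idea, not Vapi's implementation or schema: `onNumberSeconds` is named in the guide, while `onPunctuationSeconds`, `onNoPunctuationSeconds`, `numWords`, `voiceSeconds`, `backoffSeconds`, both helper functions, and every numeric value are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only. `onNumberSeconds` appears in the
// guide; the other field names are assumptions, not a documented schema.
interface TranscriptionEndpointingPlan {
  onPunctuationSeconds: number;   // assumed: wait after terminal punctuation
  onNoPunctuationSeconds: number; // assumed: wait when the text trails off
  onNumberSeconds: number;        // wait after the transcript ends in a digit
}

interface StopSpeakingPlan {
  numWords: number;       // assumed: user words required before the assistant yields
  voiceSeconds: number;   // assumed: seconds of detected speech required
  backoffSeconds: number; // assumed: pause before the assistant resumes
}

// Hypothetical helper: choose how long to wait after the latest transcript
// before treating the user's turn as finished.
function endpointingDelaySeconds(
  transcript: string,
  plan: TranscriptionEndpointingPlan
): number {
  const text = transcript.trim();
  if (text.length === 0) return plan.onNoPunctuationSeconds;
  if (/\d$/.test(text)) return plan.onNumberSeconds;         // more digits may follow
  if (/[.!?]$/.test(text)) return plan.onPunctuationSeconds; // likely a complete thought
  return plan.onNoPunctuationSeconds;
}

// Hypothetical helper: decide whether user speech should interrupt the assistant.
// When numWords is 0, fall back to the voice-activity duration.
function shouldStopSpeaking(
  wordsHeard: number,
  voicedSeconds: number,
  plan: StopSpeakingPlan
): boolean {
  if (plan.numWords > 0) return wordsHeard >= plan.numWords;
  return voicedSeconds >= plan.voiceSeconds;
}

// Illustrative values only; tune them against real calls.
const examplePlan: TranscriptionEndpointingPlan = {
  onPunctuationSeconds: 0.1,
  onNoPunctuationSeconds: 1.5,
  onNumberSeconds: 0.5,
};
const exampleStopPlan: StopSpeakingPlan = {
  numWords: 2,
  voiceSeconds: 0.2,
  backoffSeconds: 1,
};

console.log(endpointingDelaySeconds("My card ends in 4521", examplePlan));
console.log(shouldStopSpeaking(1, 0.4, exampleStopPlan));
```

Keeping the decision logic in small pure functions like these makes it easy to try different timeout values against recorded transcripts before changing a live assistant.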