Commit e8ad253

committed
formatting
1 parent 479ce53 commit e8ad253

1 file changed

+41
-53
lines changed

fern/customization/speech-configuration.mdx

Lines changed: 41 additions & 53 deletions
@@ -6,40 +6,6 @@ slug: customization/speech-configuration
66

77
### Introduction
88

9-
Conversation Analysis (CA) examines the structure and organization of human interactions, focusing on how participants manage conversations in real-time. We mimic this natural behavior in our API.
10-
11-
Key concepts include:
12-
13-
<AccordionGroup>
14-
15-
<Accordion title="Turn-Taking Organization">
16-
Conversations are structured into turns, where typically one person speaks at a time. Speakers use Turn Construction Units (TCUs)—such as words, phrases, or clauses—that listeners recognize, allowing them to anticipate when a turn will end and when it's appropriate to speak. Transition Relevance Places (TRPs) are points where a change of speaker can occur. Turn allocation follows specific rules:
17-
18-
- **Current speaker selects next**: The current speaker designates who speaks next.
19-
- **Self-selection**: If not selected, another participant may self-select to speak.
20-
- **Continuation**: If no one else speaks, the current speaker may continue.
21-
22-
Silences are categorized as pauses (within a turn), gaps (between turns), or lapses (when no one speaks).
23-
</Accordion>
24-
<Accordion title="Sequence Organization">
25-
Conversations often involve sequences like adjacency pairs, where an initial utterance (e.g., a question) prompts a related response (e.g., an answer). These pairs can be expanded with pre-sequences (preparing for the main action), insert expansions (occurring between the initial and responsive actions), and post-expansions (following the main action).
26-
</Accordion>
27-
<Accordion title="Preference Organization">
28-
Certain responses are socially preferred. For example, agreements or acceptances are typically delivered promptly and directly, while disagreements or refusals may be delayed or mitigated to maintain social harmony.
29-
</Accordion>
30-
<Accordion title="Repair Mechanisms">
31-
Participants address problems in speaking, hearing, or understanding through repair strategies. Self-repair (the speaker corrects themselves) is generally preferred over other-repair (another person corrects the speaker), helping to maintain conversational flow and mutual understanding.
32-
</Accordion>
33-
<Accordion title="Action Formation">
34-
Speakers perform actions (e.g., questioning, requesting, asserting) through their utterances. Understanding how these actions are constructed and interpreted is central to CA, as it reveals how participants achieve social objectives through conversation.
35-
</Accordion>
36-
<Accordion title="Adjacency Pair">
37-
An adjacency pair is a fundamental unit of conversation consisting of two related utterances. The first part (e.g., a question) typically elicits a specific response (e.g., an answer). These pairs are essential for structuring conversations and ensuring coherence.
38-
</Accordion>
39-
</AccordionGroup>
40-
41-
These foundational structures illustrate how individuals collaboratively produce and interpret talk in interaction, ensuring coherent and meaningful communication.
42-
439
### Speech Configuration in Vapi
4410

4511
Speech configuration is a crucial aspect of designing a voice assistant that delivers a seamless and engaging user experience. By customizing the assistant's speech settings, you can optimize its responsiveness, naturalness, and timing during interactions with users.
@@ -50,6 +16,8 @@ These plans ensure that the assistant does not interrupt the customer and also p
5016

5117
Adjusting these parameters helps tailor the assistant's responsiveness to different conversational dynamics.
5218

19+
For more information on the anatomy of conversation and how it relates to speech recognition, see the [Conversational Analysis](/customization/conversational-analysis) guide.
20+
5321
<CardGroup cols={2}>
5422

5523
<Card
@@ -58,10 +26,9 @@ Adjusting these parameters helps tailor the assistant's responsiveness to differ
5826
href='#transcriber-settings'
5927
>
6028
Specify the provider, language, and model for speech transcription.
61-
6229
<Tip
6330
title='API Endpoint'>
64-
[rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech)
31+
[REST](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech)
6532
</Tip>
6633
</Card>
6734

@@ -74,7 +41,7 @@ Adjusting these parameters helps tailor the assistant's responsiveness to differ
7441

7542
<Tip
7643
title='API Endpoint'>
77-
[rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech)
44+
[REST](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech)
7845
</Tip>
7946

8047
</Card>
@@ -88,7 +55,7 @@ Adjusting these parameters helps tailor the assistant's responsiveness to differ
8855

8956
<Tip
9057
title='API Endpoint'>
91-
[rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech)
58+
[REST](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech)
9259
</Tip>
9360
</Card>
9461

@@ -117,9 +84,28 @@ Adjusting these parameters helps tailor the assistant's responsiveness to differ
11784
<Tip
11885
title='API Endpoint'>
11986

120-
[rest](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech)
87+
[REST](https://docs.vapi.ai/api-reference/assistants/create#request.body.speech)
88+
12189
</Tip>
12290

91+
</Card>
92+
<Card
93+
title='Best Practices'
94+
icon='solid wand-magic-sparkles'
95+
href='#best-practices'>
96+
97+
Best practices for configuring speech settings to improve conversational flow and user engagement.
98+
99+
100+
</Card>
101+
<Card
102+
title='Custom Endpoints'
103+
icon='solid wand-magic-sparkles'
104+
href='#custom-endpoints'>
105+
106+
The custom endpointing rules in Vapi's speech configuration are particularly useful in several scenarios, such as non-standard speech environments.
107+
108+
123109
</Card>
124110
</CardGroup>
125111

@@ -215,15 +201,14 @@ Use transcription-based endpointing, with specific timeouts after punctuation, n
215201
**Example**: In insurance claims, enabling `smartEndpointingEnabled` helps avoid interruptions while customers think through and formulate responses.
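As a rough sketch of the example above, smart endpointing would be switched on in the assistant's start speaking settings. Only `smartEndpointingEnabled` is named in this guide; the `startSpeakingPlan` wrapper is an assumption about where the flag lives in the request body, so check the API reference for the exact shape. (JSON has no comment syntax, so the caveat is stated here rather than inside the block.)

```json
{
  "startSpeakingPlan": {
    "smartEndpointingEnabled": true
  }
}
```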
216202

217203

218-
### STOP SPEAKING PLAN
204+
### Stop Speaking Plan
219205

220206
- **Words to Stop Speaking**: Specify the number of words a user must say before the assistant stops talking, preventing interruptions from brief interjections.
221207

222208
- **Voice Activity Detection**: Set the duration of user speech required to trigger the assistant to stop speaking, minimizing overlaps.
223209

224210
- **Pause Before Resuming**: Control the delay before the assistant resumes speaking after being interrupted, ensuring a natural conversational flow.
225211

226-
227212
The `stopSpeakingPlan` controls when the assistant stops speaking once the user starts talking, so that brief interjections do not cut it off while genuine interruptions are handled smoothly. Here's an example:
228213

229214
```json
@@ -250,15 +235,18 @@ This enhanced explanation provides concrete examples and clear descriptions of t
250235
- **Silence Timeout**: Define the duration the assistant waits during user silence before responding or prompting, balancing responsiveness with user comfort.
251236
- **Max Duration**: Set limits on interaction lengths to manage session times effectively. This parameter helps prevent overly long interactions that may lead to user fatigue or disengagement.
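A minimal sketch of how these two limits might appear in an assistant configuration. The field names `silenceTimeoutSeconds` and `maxDurationSeconds` and both values are assumptions for illustration, since this section names the settings only as "Silence Timeout" and "Max Duration"; confirm the exact names, units, and defaults in the API reference.

```json
{
  "silenceTimeoutSeconds": 30,
  "maxDurationSeconds": 600
}
```

With values like these, the call would end after 30 seconds of continuous silence or after 10 minutes in total, whichever comes first.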
252237

253-
## BEST PRACTICES
254-
Best Practices
255-
256-
Adapt to User Style: Configure settings based on conversational dynamics, such as enabling smart endpointing for mid-thought pauses.
257-
258-
Minimize Noise Interference: Adjust parameters to handle noisy environments effectively.
259-
260-
Optimize Conversational Flow: Balance responsiveness and non-intrusiveness by testing different configurations.
261-
262-
Tailor for Use Cases: Customize settings for specific scenarios, such as tech support or healthcare applications.
263-
264-
Iterate and Improve: Continuously test configurations with real users and refine based on feedback.
238+
### Custom Endpoints
239+
- **Complex Conversations**: In situations where users might pause mid-thought or have varying speech patterns, the `BothCustomEndpointingRule` can help create a more natural flow. This is especially valuable in customer service or healthcare applications where conversations can be nuanced and unpredictable.
240+
- **Technical Discussions**: For calls involving technical details or numbers, the `TranscriptionEndpointingPlan`'s `onNumberSeconds` parameter can be adjusted to allow more time after number sequences. This is useful in financial services, tech support, or any scenario where numerical information is frequently exchanged.
241+
- **Multilingual Support**: The `AssistantCustomEndpointingRule` can be tailored to account for different speech patterns and pauses typical in various languages, improving the assistant's responsiveness in multilingual environments.
242+
- **Emotional or Sensitive Conversations**: In counseling or mental health applications, the `CustomerCustomEndpointingRule` can be fine-tuned to allow for longer pauses, giving users more time to process and respond without interruption.
243+
- **High-Noise Environments**: For calls from locations with significant background noise, like factories or busy streets, these rules can be adjusted to better distinguish between speech and ambient sounds, improving the overall conversation quality.
244+
- **Elderly or Speech-Impaired Users**: The endpointing rules can be customized to accommodate slower speech patterns or frequent pauses, ensuring the assistant doesn't interrupt prematurely.
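A minimal sketch of how the scenarios above might translate into configuration. Only `onNumberSeconds`, `TranscriptionEndpointingPlan`, and the rule type names appear in this guide; the `startSpeakingPlan` and `customEndpointingRules` wrappers, the `type`, `regex`, and `timeoutSeconds` fields, and every value shown are assumptions for illustration and should be verified against the API reference before use.

```json
{
  "startSpeakingPlan": {
    "transcriptionEndpointingPlan": {
      "onNumberSeconds": 1.5
    },
    "customEndpointingRules": [
      {
        "type": "customer",
        "regex": "\\b(um|uh|let me think)\\b",
        "timeoutSeconds": 3.0
      }
    ]
  }
}
```

Under these assumptions, the customer rule would make the assistant wait longer when the transcript contains a hesitation marker, and the longer number timeout would give callers more time to finish reading out digits.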
245+
246+
247+
### Best Practices
248+
- **Adapt to User Style**: Configure settings based on conversational dynamics, such as enabling smart endpointing for mid-thought pauses.
249+
- **Minimize Noise Interference**: Adjust parameters to handle noisy environments effectively.
250+
- **Optimize Conversational Flow**: Balance responsiveness and non-intrusiveness by testing different configurations.
251+
- **Tailor for Use Cases**: Customize settings for specific scenarios, such as tech support or healthcare applications.
252+
- **Iterate and Improve**: Continuously test configurations with real users and refine based on feedback.
