
Commit 7b86bce

VAP-5093 livekit smart endpointing docs (#275)
* docs for smartEndpointingPlan.waitFunction
* sesame docs
1 parent 727c12f commit 7b86bce

File tree

3 files changed

+44
-1
lines changed

fern/customization/speech-configuration.mdx

Lines changed: 10 additions & 1 deletion
@@ -23,7 +23,16 @@ This plan defines the parameters for when the assistant begins speaking after th
2323

2424
![LiveKit Smart Endpointing Configuration](../static/images/advanced-tab/livekit-smart-endpointing.png)
2525

26-
**Example:** In insurance claims, Vapi's smart endpointing helps avoid interruptions while customers think through complex responses. For instance, when the assistant asks "do you want a loan," the system can intelligently wait for the complete response rather than interrupting after the initial "yes" or "no." For responses requiring number sequences like "What's your account number?", the system can detect natural pauses between digits without prematurely ending the customer's turn to speak.
26+
**LiveKit Smart Endpointing Configuration:**
27+
When using LiveKit, you can customize the `waitFunction` parameter, which determines how long the bot waits before it starts speaking, based on the likelihood that the user has finished speaking:
28+
29+
```
30+
waitFunction: "200 + 8000 * x"
31+
```
32+
33+
This function maps a probability `x` (0 to 1) to a wait time in milliseconds. A probability of 0 means high confidence that the caller has stopped speaking, while 1 means high confidence that they're still speaking. The default function (`200 + 8000 * x`) produces a wait time between 200ms (when x=0) and 8200ms (when x=1). You can replace it with your own mathematical expression, such as `4000 * (1 - cos(pi * x))`, which ranges from 0ms (when x=0) to 8000ms (when x=1) along a different response curve.
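To get a feel for how the two expressions above behave, the following Python sketch evaluates them at a few probabilities. It is only an illustration: the helper names `default_wait_ms` and `cosine_wait_ms` are hypothetical and not part of any Vapi or LiveKit API, and the sketch assumes `x` is the probability described above.

```python
import math


def default_wait_ms(x: float) -> float:
    """Default curve quoted in the docs: 200 + 8000 * x (hypothetical helper)."""
    return 200 + 8000 * x


def cosine_wait_ms(x: float) -> float:
    """Alternative curve quoted in the docs: 4000 * (1 - cos(pi * x)) (hypothetical helper)."""
    return 4000 * (1 - math.cos(math.pi * x))


if __name__ == "__main__":
    # Print both wait times for a handful of probabilities between 0 and 1.
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(
            f"x={x:.2f}  "
            f"default={default_wait_ms(x):7.0f} ms  "
            f"cosine={cosine_wait_ms(x):7.0f} ms"
        )
```

Compared with the linear default, the cosine curve keeps the wait shorter at low probabilities (about 1172ms versus 2200ms at x=0.25) and climbs more steeply through the middle of the range.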
34+
35+
**Example:** In insurance claims, smart endpointing helps avoid interruptions while customers think through complex responses. For instance, when the assistant asks "Do you want a loan?", the system can wait for the complete response rather than interrupting after the initial "yes" or "no." For responses that involve number sequences, such as the answer to "What's your account number?", the system can detect natural pauses between digits without prematurely ending the customer's turn to speak.
2736

2837
- **Transcription-Based Detection**: Customize how the assistant determines that the customer has stopped speaking based on what they’re saying. This offers more control over the timing. **Example:** When a customer says, "My account number is 123456789, I want to transfer $500."
2938
- The system detects the number "123456789" and waits for 0.5 seconds (`WaitSeconds`) to ensure the customer isn't still speaking.

fern/docs.yml

Lines changed: 2 additions & 0 deletions
@@ -276,6 +276,8 @@ navigation:
276276
path: providers/voice/rimeai.mdx
277277
- page: Deepgram
278278
path: providers/voice/deepgram.mdx
279+
- page: Sesame
280+
path: providers/voice/sesame.mdx
279281

280282
- section: Video Models
281283
contents:

fern/providers/voice/sesame.mdx

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
---
2+
title: Sesame
3+
subtitle: What is Sesame CSM-1B?
4+
slug: providers/voice/sesame
5+
---
6+
7+
**What is Sesame CSM-1B?**
8+
9+
Sesame CSM-1B is an open-source text-to-speech (TTS) model that Vapi hosts for seamless integration into your voice applications. Currently in beta, this model delivers natural-sounding speech synthesis with a single default voice option.
10+
11+
**Key Features:**
12+
13+
- **Vapi-Hosted Solution**: Access this open-source model directly through Vapi without managing your own infrastructure
14+
- **Single Default Voice**: Currently offers one voice option optimized for clarity and naturalness
15+
- **Beta Release**: Early access to this emerging TTS technology
16+
17+
**Integration Benefits:**
18+
19+
- Simplified setup with no need to self-host the model
20+
- Consistent performance through Vapi's optimized infrastructure
21+
- Seamless compatibility with all Vapi voice applications
22+
23+
**Use Cases:**
24+
25+
- Virtual assistants and conversational AI
26+
- Content narration and audio generation
27+
- Interactive voice applications
28+
- Prototyping voice-driven experiences
29+
30+
**Current Limitations:**
31+
32+
As this is a beta release, the model currently offers limited customization options with only one default voice available. Additional features and voice options may be introduced in future updates.
