11 changes: 10 additions & 1 deletion fern/customization/speech-configuration.mdx
@@ -23,7 +23,16 @@ This plan defines the parameters for when the assistant begins speaking after th

![LiveKit Smart Endpointing Configuration](../static/images/advanced-tab/livekit-smart-endpointing.png)

**LiveKit Smart Endpointing Configuration:**
When using LiveKit, you can customize the `waitFunction` parameter, which determines how long the assistant waits before it starts speaking based on the likelihood that the user has finished speaking:

```
waitFunction: "200 + 8000 * x"
```

This function maps probabilities (0-1) to milliseconds of wait time. A probability of 0 means high confidence the caller has stopped speaking, while 1 means high confidence they're still speaking. The default function (`200 + 8000 * x`) creates a wait time between 200ms (when x=0) and 8200ms (when x=1). You can customize this with your own mathematical expression, such as `4000 * (1 - cos(pi * x))` for a different response curve.
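As a rough, non-normative sketch (plain TypeScript, not part of the Vapi or LiveKit SDKs), the snippet below evaluates the default linear curve and the cosine alternative at a few probabilities to show how the wait time scales:

```typescript
// Sketch only: evaluating the two wait-time expressions mentioned above.
// x is the model's probability output (0 = caller has stopped, 1 = still speaking).

const linearWait = (x: number): number => 200 + 8000 * x; // default curve
const cosineWait = (x: number): number => 4000 * (1 - Math.cos(Math.PI * x)); // alternative curve

for (const x of [0, 0.25, 0.5, 0.75, 1]) {
  console.log(
    `x=${x.toFixed(2)}  linear=${linearWait(x).toFixed(0)}ms  cosine=${cosineWait(x).toFixed(0)}ms`
  );
}
```

The linear default rises steadily from 200ms to 8200ms; the cosine variant starts near 0ms, changes most rapidly around the midpoint, and tops out at 8000ms, so it is more forgiving when the model is confident the caller has stopped.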

**Example:** In insurance claims, smart endpointing helps avoid interruptions while customers think through complex responses. For instance, when the assistant asks "do you want a loan," the system can intelligently wait for the complete response rather than interrupting after the initial "yes" or "no." For responses requiring number sequences like "What's your account number?", the system can detect natural pauses between digits without prematurely ending the customer's turn to speak.

- **Transcription-Based Detection**: Customize how the assistant determines that the customer has stopped speaking based on what they’re saying. This offers more control over the timing. **Example:** When a customer says, "My account number is 123456789, I want to transfer $500."
- The system detects the number "123456789" and waits for 0.5 seconds (`WaitSeconds`) to ensure the customer isn't still speaking, as illustrated in the sketch below.
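To make the idea concrete, here is a minimal, purely illustrative TypeScript sketch (not how Vapi implements this internally; the helper names are hypothetical): it checks whether the latest transcript ends in a digit sequence and, if so, holds the turn for a `WaitSeconds`-style delay before letting the assistant respond.

```typescript
// Illustrative sketch only (not Vapi's internal implementation): the idea
// behind transcription-based endpoint detection. If the latest transcript
// ends in a digit sequence, the caller may still be reading out a number,
// so hold the turn a little longer before responding.

const WAIT_SECONDS = 0.5; // comparable to the `WaitSeconds` value above

function endsMidNumberSequence(transcript: string): boolean {
  // A trailing run of digits (possibly spaced or hyphenated) suggests an
  // account number that may not be finished yet.
  return /\d[\d\s-]*$/.test(transcript.trim());
}

async function maybeHoldTurn(transcript: string): Promise<void> {
  if (endsMidNumberSequence(transcript)) {
    await new Promise((resolve) => setTimeout(resolve, WAIT_SECONDS * 1000));
  }
  // ...then let the assistant take its turn
}
```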
2 changes: 2 additions & 0 deletions fern/docs.yml
@@ -276,6 +276,8 @@ navigation:
path: providers/voice/rimeai.mdx
- page: Deepgram
path: providers/voice/deepgram.mdx
- page: Sesame
path: providers/voice/sesame.mdx

- section: Video Models
contents:
32 changes: 32 additions & 0 deletions fern/providers/voice/sesame.mdx
@@ -0,0 +1,32 @@
---
title: Sesame
subtitle: What is Sesame CSM-1B?
slug: providers/voice/sesame
---

**What is Sesame CSM-1B?**

Sesame CSM-1B is an open source text-to-speech (TTS) model that Vapi hosts for seamless integration into your voice applications. Currently in beta, this model delivers natural-sounding speech synthesis with a single default voice option.

**Key Features:**

- **Vapi-Hosted Solution**: Access this open source model directly through Vapi without managing your own infrastructure
- **Single Default Voice**: Currently offers one voice option optimized for clarity and naturalness
- **Beta Release**: Early access to this emerging TTS technology

**Integration Benefits:**

- Simplified setup with no need to self-host the model
- Consistent performance through Vapi's optimized infrastructure
- Seamless compatibility with all Vapi voice applications

**Use Cases:**

- Virtual assistants and conversational AI
- Content narration and audio generation
- Interactive voice applications
- Prototyping voice-driven experiences

**Current Limitations:**

As this is a beta release, the model currently offers limited customization options with only one default voice available. Additional features and voice options may be introduced in future updates.