diff --git a/fern/apis/api/openapi-overrides.yml b/fern/apis/api/openapi-overrides.yml
index a57421a9e..598d91ed1 100644
--- a/fern/apis/api/openapi-overrides.yml
+++ b/fern/apis/api/openapi-overrides.yml
@@ -1083,8 +1083,6 @@ components:
       title: TavusVoice
     VapiVoice:
       title: VapiVoice
-    SesameVoice:
-      title: SesameVoice
     AIEdgeCondition:
       title: AIEdgeCondition
     LogicEdgeCondition:
diff --git a/fern/docs.yml b/fern/docs.yml
index 51d510262..17df435d0 100644
--- a/fern/docs.yml
+++ b/fern/docs.yml
@@ -470,8 +470,8 @@ navigation:
               path: providers/voice/rimeai.mdx
             - page: Deepgram
               path: providers/voice/deepgram.mdx
-            - page: Sesame
-              path: providers/voice/sesame.mdx
+            - page: Inworld
+              path: providers/voice/inworld.mdx
         - section: Video models
           contents:
             - page: Tavus
diff --git a/fern/providers/voice/inworld.mdx b/fern/providers/voice/inworld.mdx
new file mode 100644
index 000000000..f03d50d36
--- /dev/null
+++ b/fern/providers/voice/inworld.mdx
@@ -0,0 +1,61 @@
+---
+title: InworldAI
+subtitle: What is Inworld.ai?
+slug: providers/voice/inworld
+---
+
+**What is Inworld.ai?**
+
+Inworld.ai provides developers with tools to create lifelike voice agents. It supports zero-shot voice cloning, enabling the creation of personalized voices from short audio samples. The system is optimized for low-latency streaming, making it suitable for applications requiring immediate audio responses.
+
+**The Evolution of AI Speech Synthesis:**
+
+Advancements in deep learning and neural networks have significantly improved the quality of AI-generated speech. Inworld.ai leverages these developments to deliver natural-sounding, emotionally expressive voices suitable for various applications, including virtual assistants and interactive games.
+
+**Overview of Inworld.ai's Offerings:**
+
+Inworld.ai provides a comprehensive suite of features designed to meet diverse voice synthesis needs:
+
+**Real-Time Speech Synthesis:**
+
+Inworld.ai is engineered for low-latency performance, delivering the first two seconds of audio in approximately 200 milliseconds. This responsiveness is critical for real-time applications such as conversational agents and interactive gaming characters.
+
+**Zero-Shot Voice Cloning:**
+
+The platform offers zero-shot voice cloning, allowing developers to create custom voices from as little as 5 seconds of audio input. This feature facilitates the development of unique voice identities for various applications.
+
+**Multilingual Support:**
+
+Inworld.ai supports 11 languages, including English, Spanish, French, Korean, and Chinese. This multilingual capability enables developers to build applications for diverse global audiences.
+
+**Audio Markup Controls:**
+
+Developers can use audio markup tags such as [happy], [whispering], or [sigh] to control the emotional tone and style of the synthesized speech. This feature enhances the expressiveness of voice agents.
+
+**Developer API:**
+
+Inworld.ai provides an API with comprehensive documentation, facilitating integration into various applications. The API supports real-time streaming and offers options for customizing voice parameters to suit specific use cases.
+
+**Use Cases for Inworld.ai:**
+
+Inworld.ai's versatile platform supports a wide range of applications:
+
+**Interactive Applications:**
+
+Developers can create responsive voice agents for customer service, virtual assistants, and interactive gaming characters, enhancing user engagement through natural-sounding speech.
+
+**Content Creation:**
+
+Content creators can utilize Inworld.ai to generate high-quality voiceovers for videos, podcasts, and other media, streamlining the production process.
+
+**Education and Training:**
+
+Educational platforms can employ Inworld.ai to provide clear and expressive narration for e-learning materials, improving the learning experience for users.
+
+**Integration with Vapi:**
+
+Inworld.ai is integrated with Vapi, allowing developers to access its features through the Vapi platform. This integration simplifies the process of building and deploying voice agents, offering tools for testing and optimizing performance before production.
+
+**Conclusion:**
+
+Inworld.ai offers a combination of expressive voice synthesis, low-latency performance, and multilingual support, making it a valuable tool for developers seeking to enhance their applications with natural-sounding speech.
\ No newline at end of file
diff --git a/fern/providers/voice/sesame.mdx b/fern/providers/voice/sesame.mdx
deleted file mode 100644
index 28349c4e9..000000000
--- a/fern/providers/voice/sesame.mdx
+++ /dev/null
@@ -1,41 +0,0 @@
----
-title: Sesame
-subtitle: What is Sesame CSM-1B?
-slug: providers/voice/sesame
----
-
-**What is Sesame CSM-1B?**
-
-Sesame CSM-1B is an open source text-to-speech (TTS) model that Vapi hosts for seamless integration into your voice applications. This model delivers natural-sounding speech synthesis with a default voice option and voice cloning capabilities.
-
-**Key Features:**
-
-- **Vapi-Hosted Solution**: Access this open source model directly through Vapi without managing your own infrastructure
-- **Voice Options**: Offers a default voice and voice cloning capabilities
-
-**Integration Benefits:**
-
-- Simplified setup with no need to self-host the model
-- Consistent performance through Vapi's optimized infrastructure
-- Seamless compatibility with all Vapi voice applications
-
-**Use Cases:**
-
-- Virtual assistants and conversational AI
-- Content narration and audio generation
-- Interactive voice applications
-- Prototyping voice-driven experiences
-
-**Voice Cloning:**
-
-![Sesame Voice Cloning](/static/images/voice-tab/sesame/cloning.png)
-
-Sesame supports voice cloning. To clone a voice:
-1. Navigate to the additional configuration tab (below the voice tab) on the assistants page
-2. Upload a WAV file containing your voice sample
-3. Provide the transcript of the audio file
-4. Name your custom voice
-
-**Current Limitations:**
-
-The model currently has some limitations. Additional features may be introduced in future updates.
\ No newline at end of file