This sample demonstrates a voice chat application built with Semantic Kernel that uses the OpenAI API for speech-to-text (STT), chat, and text-to-speech (TTS). The application captures audio from the microphone, passes it through a pipeline that starts with voice activity detection (VAD), and plays back the AI-generated responses. The pipeline follows this flow:
Microphone → VAD → STT → Chat → TTS → Speaker
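The flow above is a chain of stages, each consuming the previous stage's output. The following is a minimal sketch of one way to wire such a chain with `System.Threading.Channels`, with each stage running as its own task. It is not the sample's code. The microphone, STT, chat, and TTS stages are hypothetical stubs that return placeholder values instead of capturing audio or calling OpenAI. The VAD is a simple RMS-energy threshold (`SpeechThreshold` is an assumed value), not necessarily the detector the sample uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Assumed tuning value: frames whose RMS energy exceeds this count as speech.
const float SpeechThreshold = 0.1f;

// One channel between each pair of adjacent stages.
var micToVad = Channel.CreateUnbounded<float[]>();
var vadToStt = Channel.CreateUnbounded<float[]>();
var sttToChat = Channel.CreateUnbounded<string>();
var chatToTts = Channel.CreateUnbounded<string>();
var ttsToSpeaker = Channel.CreateUnbounded<string>();
var played = new List<string>();

// Hypothetical microphone: emits 4-sample frames (silence, speech, speech, silence, speech, silence).
async Task Microphone()
{
    float[] levels = { 0.0f, 0.5f, 0.5f, 0.0f, 0.5f, 0.0f };
    foreach (var level in levels)
        await micToVad.Writer.WriteAsync(Enumerable.Repeat(level, 4).ToArray());
    micToVad.Writer.Complete();
}

// Energy-based VAD: buffers speech frames and emits one utterance when silence follows speech.
async Task Vad()
{
    var utterance = new List<float>();
    await foreach (var frame in micToVad.Reader.ReadAllAsync())
    {
        double rms = Math.Sqrt(frame.Average(s => (double)s * s));
        if (rms > SpeechThreshold)
        {
            utterance.AddRange(frame); // still speaking: keep buffering
        }
        else if (utterance.Count > 0)
        {
            await vadToStt.Writer.WriteAsync(utterance.ToArray()); // end of utterance
            utterance.Clear();
        }
    }
    if (utterance.Count > 0)
        await vadToStt.Writer.WriteAsync(utterance.ToArray()); // flush trailing speech
    vadToStt.Writer.Complete();
}

// Generic stage: transform every input item, then complete the output so the next stage can finish.
async Task RunStage<TIn, TOut>(ChannelReader<TIn> input, ChannelWriter<TOut> output, Func<TIn, TOut> transform)
{
    try
    {
        await foreach (var item in input.ReadAllAsync())
            await output.WriteAsync(transform(item));
    }
    finally
    {
        output.Complete();
    }
}

// Hypothetical speaker: records what would be played.
async Task Speaker()
{
    await foreach (var audio in ttsToSpeaker.Reader.ReadAllAsync())
        played.Add(audio);
}

await Task.WhenAll(
    Microphone(),
    Vad(),
    // Hypothetical stubs standing in for the OpenAI STT, chat, and TTS calls.
    RunStage(vadToStt.Reader, sttToChat.Writer, samples => $"{samples.Length} samples of speech"),
    RunStage(sttToChat.Reader, chatToTts.Writer, text => $"You said: {text}"),
    RunStage(chatToTts.Reader, ttsToSpeaker.Writer, reply => $"[audio] {reply}"),
    Speaker());

foreach (var line in played)
    Console.WriteLine(line);
```

Because each stage completes its output channel once its input is exhausted, shutdown propagates from the microphone to the speaker. Replacing a stub with a real API call, or the energy threshold with a model-based detector, leaves the wiring unchanged. A real-time app would likely use `Channel.CreateBounded` instead, so that a slow stage applies backpressure rather than letting audio queue up without limit.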
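Store your OpenAI API key with the .NET Secret Manager (`dotnet user-secrets`) so that it stays out of source control. The Secret Manager stores values unencrypted in your user profile, so it is intended for development only. From the sample's project directory, run: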
```
dotnet user-secrets set "OpenAI:ApiKey" "your-openai-api-key"
```
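The snippet below shows one common way a .NET app can read that secret back. It is a sketch, not necessarily how this sample loads its configuration. It assumes the project references the `Microsoft.Extensions.Configuration.UserSecrets` NuGet package and that its `.csproj` declares a `<UserSecretsId>` (which `dotnet user-secrets init` adds).

```csharp
using System;
using Microsoft.Extensions.Configuration;

// Assumes the Microsoft.Extensions.Configuration.UserSecrets package and a <UserSecretsId> in the .csproj.
IConfigurationRoot config = new ConfigurationBuilder()
    .AddUserSecrets<Program>() // `Program` is the compiler-generated class for top-level statements
    .Build();

// The ':' in "OpenAI:ApiKey" is the configuration section separator,
// so this reads the ApiKey value inside the OpenAI section.
string apiKey = config["OpenAI:ApiKey"]
    ?? throw new InvalidOperationException("OpenAI:ApiKey is not set. Run `dotnet user-secrets set` first.");

// Avoid printing the key itself.
Console.WriteLine($"Loaded an API key of length {apiKey.Length}.");
```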
The sample can be extended by improving the VAD, STT, and other components. Some suggestions:
- Use a local, CPU-based machine-learning voice activity detector, such as Silero VAD.
- Use streaming audio APIs, such as those offered by Azure AI Speech, Deepgram, and other providers.
- Connect Semantic Kernel plugins or tools for richer, task-oriented conversations.
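For the plugin suggestion, the sketch below shows how a plugin can be registered with a Semantic Kernel kernel. `TimePlugin` and its function are hypothetical examples, and the snippet assumes the `Microsoft.SemanticKernel` NuGet package. It only lists the functions the kernel would advertise to the model as tools. To let the model call them, you would also register the OpenAI chat service and enable automatic function calling in the execution settings. The exact setting differs between Semantic Kernel versions.

```csharp
using System;
using System.ComponentModel;
using Microsoft.SemanticKernel;

var builder = Kernel.CreateBuilder();
// The sample's existing OpenAI chat completion registration would go here.
builder.Plugins.AddFromType<TimePlugin>();
Kernel kernel = builder.Build();

// Print each registered function as the model would see it: plugin name, function name, description.
foreach (var plugin in kernel.Plugins)
    foreach (var function in plugin)
        Console.WriteLine($"{plugin.Name}.{function.Name}: {function.Description}");

// Hypothetical plugin exposing one function the model could call as a tool.
public sealed class TimePlugin
{
    [KernelFunction, Description("Returns the current time in UTC.")]
    public string GetUtcTime() => DateTime.UtcNow.ToString("R");
}
```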