- Real-time speech-to-text transcription using Whisper
- Natural voice responses using Azure OpenAI's voice synthesis
- Continuous conversation capability with voice activity detection
- Low-latency audio processing for smooth interaction
- Support for interruptions and mid-conversation responses
- Python 3.8 or higher
- Azure OpenAI API access
- A microphone for audio input
- Speakers or headphones for audio output
- Clone the repository:
git clone [your-repository-url]
cd impromptu_speech_agent
- Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install required packages:
pip install -r requirements.txt
- Create a .env file with your Azure OpenAI credentials:
AZURE_OPENAI_ENDPOINT="your-endpoint"
AZURE_OPENAI_KEY="your-key"
AZURE_OPENAI_CHAT_DEPLOYMENT="your-deployment"
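Before starting the agent, it can help to confirm that all three settings are present. The following is a minimal sketch, not the code used in azure_openai_speech.py: it assumes the values from .env have already been loaded into the process environment (for example by a dotenv loader, which is an assumption about this project), and `load_settings` and `REQUIRED_VARS` are hypothetical names introduced for illustration.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_CHAT_DEPLOYMENT",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` at startup fails fast with the names of any missing variables, rather than surfacing later as a connection error.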
Run the speech agent:
python azure_openai_speech.py
Run the Semantic Kernel (SK) agent sample with:
streamlit run sk_agent_sample.py
Once started, you can:
- Begin speaking naturally - the agent will detect your voice automatically
- Wait for the agent's response
- Interrupt or continue the conversation as needed
- Press Ctrl+C to end the session
The agent can be configured by modifying the following parameters in azure_openai_speech.py:
- sample_rate: Audio sampling rate (default: 24000)
- chunk_size: Audio processing chunk size (default: 2400)
- Voice activity detection settings in the session.update configuration
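As an illustration of the voice activity detection settings, the sketch below builds a session.update event as JSON. It is an assumption about the payload shape, based on the server-side VAD options of the real-time audio API (`turn_detection` with `threshold`, `prefix_padding_ms`, and `silence_duration_ms`); the actual structure and values in azure_openai_speech.py may differ. `build_session_update` is a hypothetical helper and the default values are illustrative only.

```python
import json


def build_session_update(threshold: float = 0.5,
                         prefix_padding_ms: int = 300,
                         silence_duration_ms: int = 500) -> str:
    """Serialize a session.update event with server-side VAD settings.

    All default values are illustrative assumptions, not project defaults.
    """
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                # Higher threshold: less sensitive to background noise.
                "threshold": threshold,
                # Audio kept from just before speech was detected.
                "prefix_padding_ms": prefix_padding_ms,
                # Silence required before the turn is treated as finished.
                "silence_duration_ms": silence_duration_ms,
            },
        },
    }
    return json.dumps(event)
```

A shorter `silence_duration_ms` makes the agent respond sooner after you stop speaking, but it may also cut in during natural pauses.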
Common issues:
- Audio device not found: Ensure your microphone is properly connected and selected as the default input device
- Connection errors: Verify your Azure OpenAI credentials and internet connection
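- High latency: Tune the chunk_size and sample_rate parameters for your system; a smaller chunk_size sends audio in shorter chunks, which generally lowers buffering delay at the cost of more frequent sends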
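When tuning for latency, it can be useful to know how much audio one chunk holds. The sketch below assumes chunk_size is measured in samples per channel (an assumption; the script may count bytes instead), and `chunk_duration_ms` is a hypothetical helper, not part of the project.

```python
def chunk_duration_ms(sample_rate: int, chunk_size: int) -> float:
    """Return the duration of one audio chunk in milliseconds.

    Assumes chunk_size counts samples per channel, not bytes.
    """
    if sample_rate <= 0 or chunk_size <= 0:
        raise ValueError("sample_rate and chunk_size must be positive")
    return 1000.0 * chunk_size / sample_rate


# With the documented defaults, each chunk covers 100 ms of audio.
print(chunk_duration_ms(24000, 2400))  # 100.0
```

Under the same assumption, halving chunk_size to 1200 would give 50 ms chunks.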
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Azure OpenAI for providing the real-time audio API
- OpenAI's Whisper model for speech recognition
- The Python WebSockets community
- Exploring the Semantic Kernel ChatCompletionAgent
- Exploring the Semantic Kernel OpenAIAssistantAgent
- Exploring the Semantic Kernel AzureAIAgent
- Exploring Agent Collaboration in AgentChat