Impromptu Speech Agent

Features

  • Real-time speech-to-text transcription using Whisper
  • Natural voice responses using Azure OpenAI's voice synthesis
  • Continuous conversation capability with voice activity detection
  • Low-latency audio processing for smooth interaction (see the capture sketch below)
  • Support for interruptions and mid-conversation responses
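
The chunked, low-latency capture these features rely on looks roughly like the sketch below. It assumes the sounddevice library and illustrative constants; the actual loop in azure_openai_speech.py may differ.

# Minimal capture-loop sketch (assumptions: sounddevice, 24 kHz mono PCM).
import queue

import sounddevice as sd

SAMPLE_RATE = 24000   # matches the default sample_rate noted under Configuration
CHUNK_SIZE = 2400     # 100 ms of audio per chunk at 24 kHz

audio_chunks = queue.Queue()

def on_audio(indata, frames, time_info, status):
    # sounddevice invokes this callback for every captured block; queue the
    # raw PCM bytes so a separate task can stream them to the realtime API.
    audio_chunks.put(bytes(indata))

with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=CHUNK_SIZE,
                       channels=1, dtype="int16", callback=on_audio):
    while True:
        chunk = audio_chunks.get()
        # ... base64-encode `chunk` and send it over the websocket ...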

Prerequisites

  • Python 3.8 or higher
  • Azure OpenAI API access
  • A microphone for audio input
  • Speakers or headphones for audio output

Installation

  1. Clone the repository:
git clone [your-repository-url]
cd impromptu_speech_agent
  2. Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install required packages:
pip install -r requirements.txt
  4. Create a .env file with your Azure OpenAI credentials:
AZURE_OPENAI_ENDPOINT="your-endpoint"
AZURE_OPENAI_KEY="your-key"
AZURE_OPENAI_CHAT_DEPLOYMENT="your-deployment"
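
The agent presumably reads these values at startup. A minimal loading sketch, assuming the python-dotenv package (whether this repo uses it is an assumption):

# Load the Azure OpenAI credentials created in step 4 from .env.
# Assumes python-dotenv; the repo's own startup code may differ.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]
deployment = os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"]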

Usage

Run the speech agent:

python azure_openai_speech.py

Run the Semantic Kernel agent sample with:

streamlit run sk_agent_sample.py

Once started, you can:

  • Begin speaking naturally - the agent will detect your voice automatically
  • Wait for the agent's response
  • Interrupt or continue the conversation as needed
  • Press Ctrl+C to end the session

Configuration

The agent can be configured by modifying the following parameters in azure_openai_speech.py:

  • sample_rate: Audio sampling rate in Hz (default: 24000)
  • chunk_size: Audio processing chunk size (default: 2400)
  • Voice activity detection settings in the session.update configuration
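
For reference, the voice activity detection portion of a session.update event on the realtime API generally looks like the sketch below; the exact fields and values used in azure_openai_speech.py may differ.

# Illustrative session.update payload with server-side voice activity detection.
# Values are sketch defaults, not the settings shipped in azure_openai_speech.py.
session_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "server_vad",        # let the service detect speech start/stop
            "threshold": 0.5,            # higher = less sensitive to quiet speech
            "prefix_padding_ms": 300,    # audio retained before detected speech
            "silence_duration_ms": 500,  # silence required to end a turn
        }
    },
}
# The agent would serialize this to JSON and send it over its websocket,
# e.g. await ws.send(json.dumps(session_update)).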

Troubleshooting

Common issues:

  1. Audio device not found: Ensure your microphone is properly connected and selected as the default input device (a quick device check is sketched below)
  2. Connection errors: Verify your Azure OpenAI credentials and internet connection
  3. High latency: Adjust the chunk_size and sample_rate parameters for your system's capabilities
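
For issue 1, a quick way to confirm the microphone is visible to Python, assuming the sounddevice library (an assumption about this repo's audio stack):

# Print all audio devices and the current default input/output devices.
import sounddevice as sd

print(sd.query_devices())                  # every input/output device PortAudio sees
print("Default (input, output):", sd.default.device)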

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Azure OpenAI for providing the real-time audio API
  • OpenAI's Whisper model for speech recognition
  • The Python WebSockets community

Resources and References

  • Azure AI Agents
  • Semantic Kernel
  • Additional Resources
