This project contains two AI-powered voice assistant agents, `agent.py` and `agent2.py`, built with LiveKit for real-time voice interaction with users. The agents integrate several AI services, including Google Gemini, ElevenLabs TTS, and Deepgram STT, and support utility tools such as web search, email, and weather updates.
LiveKit is an open-source platform for real-time audio and video applications. It is built on WebRTC and routes media through a selective forwarding unit (SFU) server to deliver low-latency communication, which makes it well suited to AI voice agents. LiveKit provides:
- Multi-participant audio/video sessions
- Noise cancellation and audio enhancements
- Integration with AI pipelines for real-time speech processing
How it works:
- WebRTC carries the real-time audio/video streams between the user and the agent.
- The agent transcribes the user's speech with speech-to-text (STT).
- The LLM processes the transcript and generates a response.
- When needed, the LLM invokes tools (weather, search, email) and uses their results in its response.
- The response is converted back to audio with text-to-speech (TTS) and played in the session.
This orchestration forms the voice agent pipeline:
User Voice → STT → LLM → Tool Execution → TTS → Output Voice
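The pipeline above can be sketched in plain Python to show how data moves from one stage to the next. This is a minimal, dependency-free illustration, not the project's code: `fake_stt`, `fake_llm`, `get_weather`, and `fake_tts` are hypothetical stand-ins for the real STT, LLM, tool, and TTS services. The turn calls the LLM a second time after a tool runs so it can phrase the tool result, which is how function-calling LLMs typically work.

```python
from typing import Callable, Dict, Optional

def fake_stt(audio: bytes) -> str:
    """Hypothetical stand-in for speech-to-text: treats the 'audio' bytes as UTF-8 text."""
    return audio.decode("utf-8")

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather API."""
    return f"It is sunny in {city}."

# Tools the (fake) LLM may request, keyed by name.
TOOLS: Dict[str, Callable[..., str]] = {"get_weather": get_weather}

def fake_llm(transcript: str, tool_result: Optional[str] = None) -> dict:
    """Hypothetical stand-in for the LLM: either requests a tool or returns a reply."""
    if tool_result is not None:
        # Second pass: turn the tool output into an answer for the user.
        return {"reply": f"Here is what I found: {tool_result}"}
    if "weather" in transcript.lower():
        return {"tool": "get_weather", "args": {"city": "Paris"}}
    return {"reply": f"You said: {transcript}"}

def fake_tts(text: str) -> bytes:
    """Hypothetical stand-in for text-to-speech: encodes the reply text as bytes."""
    return text.encode("utf-8")

def run_turn(audio_in: bytes) -> bytes:
    """One conversational turn: STT -> LLM -> optional tool call -> TTS."""
    transcript = fake_stt(audio_in)
    decision = fake_llm(transcript)
    if "tool" in decision:
        # Tool execution: run the requested tool, then let the LLM use the result.
        result = TOOLS[decision["tool"]](**decision["args"])
        decision = fake_llm(transcript, tool_result=result)
    return fake_tts(decision["reply"])

print(run_turn(b"What's the weather like?"))  # b'Here is what I found: It is sunny in Paris.'
print(run_turn(b"Hello"))  # b'You said: Hello'
```

In the real agents each stand-in is replaced by a streaming service plugin, so stages overlap in time rather than running strictly one after another as they do here.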
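Setup (the activation command below is for Windows; on macOS or Linux, run `source .venv/bin/activate` instead):

```
uv venv
.venv\Scripts\activate
uv sync
```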
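| Command | Description |
|---|---|
| `uv run agent.py download-files` | Download the model files required by the agent's plugins |
| `uv run agent.py console` | Run the agent locally in the terminal, using your microphone and speakers |
| `uv run agent.py dev` | Run the agent in development mode and test it from the LiveKit playground |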
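| Command | Description |
|---|---|
| `uv run agent2.py download-files` | Download the model files required by the agent's plugins |
| `uv run agent2.py console` | Run the agent locally in the terminal, using your microphone and speakers |
| `uv run agent2.py dev` | Run the agent in development mode and test it from the LiveKit playground |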
Features:
- Real-time voice interaction with noise cancellation
- Speech-to-Text (STT) and Text-to-Speech (TTS)
- Multilingual turn detection, which decides when the user has finished speaking (agent2.py)
- Tool integration: weather updates, web search, and email sending
- Video support in LiveKit rooms
- Multilingual STT and telephony support (agent2.py)
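Tool integration generally works by letting the LLM choose a tool by name and supply arguments, which the agent then executes. Recent versions of LiveKit Agents declare tools with the `function_tool` decorator and route the LLM's calls automatically; the sketch below performs the same routing by hand without LiveKit, assuming a hypothetical JSON call format `{"name": ..., "arguments": {...}}`. The `tool` helper, the tool names, and their signatures are placeholders, not the ones used in agent.py or agent2.py.

```python
import json
from typing import Callable, Dict

# Registry mapping tool names (as the LLM would refer to them) to functions.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {}

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Hypothetical helper: register a function as a tool under its own name."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    # Placeholder: a real tool would query a weather service.
    return f"Weather for {city}: unavailable in this sketch"

@tool
def web_search(query: str) -> str:
    # Placeholder: a real tool would call a search API.
    return f"Top result for '{query}': (stub)"

@tool
def send_email(to: str, subject: str, body: str) -> str:
    # Placeholder: a real tool would send via SMTP or an email API.
    return f"Email to {to} with subject '{subject}' queued"

def dispatch(tool_call_json: str) -> str:
    """Execute a tool call given as JSON: {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    fn = TOOL_REGISTRY.get(call["name"])
    if fn is None:
        return f"Unknown tool: {call['name']}"
    try:
        return fn(**call.get("arguments", {}))
    except TypeError as exc:
        # Wrong or missing arguments: report back instead of crashing the agent.
        return f"Bad arguments for {call['name']}: {exc}"

print(dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}'))
# Weather for Berlin: unavailable in this sketch
```

Returning error strings rather than raising lets the LLM see what went wrong and retry or explain the problem to the user, which keeps the voice session alive.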
agent.py:
- Google Realtime LLM
- Voice output using the Aoede voice
- Noise cancellation and video-enabled sessions
- Utility tool integration (weather, web search, email)
agent2.py:
- Deepgram STT (multilingual)
- Google Gemini LLM
- ElevenLabs TTS
- Silero VAD and multilingual turn detection
- Telephony and multilingual session support
- Same utility tools as agent.py
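Voice activity detection (VAD) decides which audio frames contain speech, and turn detection decides when the user has finished speaking. Silero VAD is a neural model and LiveKit's turn detector is model-based as well; the sketch below swaps in a much simpler technique, an energy (RMS) threshold plus a silence timeout, purely to illustrate endpointing. The threshold, frame size, and silence count are arbitrary assumptions, and this is not how agent2.py implements it.

```python
import math
from typing import List, Sequence

def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def detect_turn_end(frames: List[Sequence[float]],
                    threshold: float = 0.02,
                    silence_frames_to_end: int = 3) -> int:
    """Return the index of the frame where the user's turn is judged to end,
    or -1 if the turn has not ended yet.

    A frame counts as speech when its RMS exceeds `threshold`. Once speech has
    started, `silence_frames_to_end` consecutive non-speech frames end the turn.
    """
    speech_started = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) > threshold:
            speech_started = True
            silent_run = 0  # any speech resets the silence counter
        elif speech_started:
            silent_run += 1
            if silent_run >= silence_frames_to_end:
                return i
    return -1  # leading silence alone never ends a turn

silence = [0.0] * 160
speech = [0.5, -0.5] * 80
frames = [silence] * 2 + [speech] * 4 + [silence] * 3
print(detect_turn_end(frames))  # 8
```

A fixed silence timeout like this cuts users off when they pause mid-sentence, which is the weakness that model-based turn detectors are meant to address.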
Notes:
- Both agents read their settings from environment variables defined in `.env`.
- agent2.py is more advanced and better suited to multilingual or telephony scenarios.
- The LiveKit playground lets you test the agents in real time with audio/video streaming.
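Before running either agent, it can help to confirm that `.env` defines everything the agents need. The sketch below is a small, dependency-free check; projects like this usually load `.env` with a library such as python-dotenv, and this parser only handles plain `KEY=VALUE` lines. The variable names are assumptions: `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` are LiveKit's standard names, while the provider keys are guesses to check against the project's own `.env`.

```python
from typing import Dict, Iterable, List

# Assumed variable names: the LiveKit entries are the standard ones; the
# provider keys are guesses, so adjust this list to match the project's .env.
REQUIRED_KEYS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "GOOGLE_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVEN_API_KEY",
]

def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments.
    Not a full .env parser: no multi-line values or variable expansion."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values

def missing_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty, in the given order."""
    return [k for k in required if not env.get(k)]

sample = 'LIVEKIT_URL="wss://example.livekit.cloud"\n# comment\nLIVEKIT_API_KEY=abc\nLIVEKIT_API_SECRET=\n'
env = parse_env(sample)
print(missing_keys(env, REQUIRED_KEYS))
# ['LIVEKIT_API_SECRET', 'GOOGLE_API_KEY', 'DEEPGRAM_API_KEY', 'ELEVEN_API_KEY']
```

Against a real file, pass `Path(".env").read_text()` (from `pathlib`) to `parse_env` instead of the inline sample; the URL in the sample is a placeholder.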
This project is open source and can be extended with custom AI tools and functionality.