MuhammadAbdullah95/AI_voice_agent_LiveKit

AI Voice Assistant Agents

This project contains two AI-powered voice assistant agents (agent.py and agent2.py) built using LiveKit, enabling real-time voice interactions with users. The agents integrate multiple AI services such as Google Gemini, ElevenLabs TTS, Deepgram STT, and support various utility tools like web search, email, and weather updates.


About LiveKit

LiveKit is an open-source platform for real-time audio and video applications. It uses WebRTC under the hood and routes media through a selective forwarding unit (SFU) server rather than directly between peers, which keeps latency low and makes it well suited to AI voice agents. LiveKit supports:

  • Multi-participant audio/video sessions
  • Noise cancellation and audio enhancements
  • Integration with AI pipelines for real-time speech processing

How it works:

  1. WebRTC handles the real-time transport of audio/video streams.
  2. The agent receives the user's voice input and transcribes it to text (STT).
  3. The LLM processes the transcribed text and decides how to respond.
  4. If needed, the LLM invokes optional tools (weather, search, email) and uses their results in the response.
  5. The response is converted back to audio (TTS) and played in the session.

This orchestration forms the voice agent pipeline:
User Voice → STT → LLM → Tool Execution → TTS → Output Voice
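The flow above can be illustrated with a minimal sketch in plain Python. It does not use LiveKit or any of the real AI services: every function here (speech_to_text, llm_respond, get_weather, text_to_speech, handle_turn) is a hypothetical stand-in that only mimics the shape of the corresponding stage, and "audio" is simply UTF-8 text encoded as bytes.

```python
from typing import Callable, Dict, Optional, Tuple


def speech_to_text(audio: bytes) -> str:
    """Stand-in for STT: treat the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def get_weather(city: str) -> str:
    """Stand-in for a weather tool: returns a canned report."""
    return f"It is sunny in {city}."


# Hypothetical tool registry: tool name -> callable taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {"weather": get_weather}


def llm_respond(text: str) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Stand-in for the LLM: answer directly or request a tool call.

    Returns (reply, tool_call), where tool_call is (tool_name, argument)
    or None when no tool is needed.
    """
    words = text.rstrip("?.!").split()
    if words and "weather" in (w.lower() for w in words):
        # Naive assumption: the last word of the question is the city.
        return "", ("weather", words[-1])
    return f"You said: {text}", None


def text_to_speech(text: str) -> bytes:
    """Stand-in for TTS: encode the reply text as bytes."""
    return text.encode("utf-8")


def handle_turn(audio: bytes) -> bytes:
    """Run one user turn: STT -> LLM -> optional tool -> TTS."""
    text = speech_to_text(audio)
    reply, tool_call = llm_respond(text)
    if tool_call is not None:
        name, argument = tool_call
        reply = TOOLS[name](argument)
    return text_to_speech(reply)


if __name__ == "__main__":
    print(handle_turn(b"What is the weather in Lahore?").decode("utf-8"))
```

In the real agents, each stand-in would be replaced by the corresponding service (for example Deepgram for STT and ElevenLabs for TTS in agent2.py), and LiveKit would carry the audio in and out of the session.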


Quick Start

1. Environment Setup

uv venv
.venv\Scripts\activate        # Windows; on macOS/Linux use: source .venv/bin/activate
uv sync

2. Running agent.py

Command                             Description
uv run agent.py download-files      Download required project files
uv run agent.py console             Run agent in console mode
uv run agent.py dev                 Run agent in development mode (for testing in the LiveKit playground)

3. Running agent2.py

Command                             Description
uv run agent2.py download-files     Download required project files
uv run agent2.py console            Run agent in console mode
uv run agent2.py dev                Run agent in development mode (for testing in the LiveKit playground)

Features

  • Real-time voice interaction with noise cancellation
  • Text-to-Speech (TTS) and Speech-to-Text (STT)
  • Turn detection for multi-user sessions (agent2.py)
  • Tool integration: Weather updates, web searches, and email sending
  • Video support for LiveKit rooms
  • Multi-language STT and telephony support (agent2.py)

Voice Agent Pipelines

agent.py

  • Google Realtime LLM
  • Voice output using Aoede voice
  • Noise cancellation and video-enabled sessions
  • Simple utility tools integration

agent2.py

  • Deepgram STT (multi-language)
  • Google Gemini LLM
  • ElevenLabs TTS
  • Silero VAD and Multilingual turn detection
  • Supports telephony and multilingual sessions
  • Same utility tools as agent.py

Voice Agent Pipeline Diagram

[Figure: Voice Agent Pipeline]


Notes

  • Both agents use environment variables defined in .env.
  • agent2.py is more advanced and suitable for multilingual or telephony scenarios.
  • The LiveKit playground allows testing the agents in real time with audio/video streaming.
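Because both agents depend on values in .env, it can help to check for missing keys before starting an agent. The sketch below is a minimal, standard-library-only illustration. The helper names (parse_env_text, missing_keys) are hypothetical, and the key names in REQUIRED are an assumption based on common LiveKit conventions, not a list taken from this repository; adjust them to whatever agent.py and agent2.py actually read.

```python
from typing import Dict, Iterable, List

# Assumed key names; the repository's actual .env may use different ones.
REQUIRED = ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"]


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    sample = 'LIVEKIT_URL=wss://example.invalid\nLIVEKIT_API_KEY="abc"\n# comment\n'
    print(missing_keys(parse_env_text(sample), REQUIRED))
```

To check a real file, read its contents (for example with pathlib.Path(".env").read_text()) and pass the text to parse_env_text.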

License

This project is open-source and can be extended with custom AI tools and functionality.
