AI Voice Assistant Agents

This project contains two AI-powered voice assistant agents (agent.py and agent2.py) built using LiveKit, enabling real-time voice interactions with users. The agents integrate multiple AI services, including Google Gemini, ElevenLabs TTS, and Deepgram STT, and support utility tools for web search, email, and weather updates.

About LiveKit

LiveKit is an open-source platform for real-time audio and video applications. It uses WebRTC under the hood to provide low-latency communication, with media routed through the LiveKit server rather than directly between peers, which makes it well suited to AI voice agents. LiveKit allows:

Multi-participant audio/video sessions
Noise cancellation and audio enhancements
Integration with AI pipelines for real-time speech processing

How it works:

WebRTC handles the real-time transport of audio/video streams.
The agent receives the user's voice input and transcribes it to text (STT).
The LLM processes the input and generates a response.
The response is converted back to audio (TTS) and played in the session.
Optional tools (weather, search, email) are invoked as needed.

This orchestration forms the voice agent pipeline:
User Voice → STT → LLM → Tool Execution → TTS → Output Voice
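
The flow above can be sketched in plain Python. This is a minimal illustration, not the code in agent.py or agent2.py: the stage functions (`transcribe`, `generate_reply`, `run_tool`, `synthesize`) and the `TOOLS` table are hypothetical stand-ins for the STT, LLM, tool, and TTS services, and no LiveKit or provider API is used. The sketch only shows the order in which data moves through the stages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class LLMResult:
    """What the stubbed LLM decided: reply text, plus an optional tool request."""
    text: str
    tool_name: Optional[str] = None
    tool_arg: Optional[str] = None


def transcribe(audio: bytes) -> str:
    # Stand-in for STT: pretend the audio bytes are UTF-8 text.
    return audio.decode("utf-8")


def generate_reply(transcript: str) -> LLMResult:
    # Stand-in for the LLM: request the weather tool when the user mentions weather,
    # treating the last word of the question as the city (an assumption of this stub).
    if "weather" in transcript.lower():
        city = transcript.rstrip("?").split()[-1]
        return LLMResult(text="", tool_name="weather", tool_arg=city)
    return LLMResult(text=f"You said: {transcript}")


def run_tool(name: str, arg: str, tools: Dict[str, Callable[[str], str]]) -> str:
    # Tool execution step: look up the requested tool and call it.
    return tools[name](arg)


def synthesize(text: str) -> bytes:
    # Stand-in for TTS: encode the reply text as bytes instead of producing audio.
    return text.encode("utf-8")


def handle_turn(audio: bytes, tools: Dict[str, Callable[[str], str]]) -> bytes:
    """User Voice -> STT -> LLM -> Tool Execution -> TTS -> Output Voice."""
    transcript = transcribe(audio)
    result = generate_reply(transcript)
    if result.tool_name is not None:
        # The tool output becomes the reply text in this simplified flow.
        result.text = run_tool(result.tool_name, result.tool_arg or "", tools)
    return synthesize(result.text)


# Hypothetical tool table; a real weather tool would call an external API.
TOOLS = {
    "weather": lambda city: f"Weather lookup for {city} is not implemented in this sketch.",
}
```

Calling `handle_turn(b"hello", TOOLS)` skips the tool step entirely, while a question that mentions weather is routed through `run_tool` before synthesis. In a real agent each stage would stream audio and text asynchronously rather than run as a single blocking call.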

Quick Start

1. Environment Setup

uv venv
.venv\Scripts\activate
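uv sync

The activation command above is for Windows. On macOS or Linux, use `source .venv/bin/activate` instead.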

2. Running `agent.py`

| Command | Description |
| --- | --- |
| `uv run agent.py download-files` | Download required project files |
| `uv run agent.py console` | Run agent in console mode |
| `uv run agent.py dev` | Run agent in LiveKit playground |

3. Running `agent2.py`

| Command | Description |
| --- | --- |
| `uv run agent2.py download-files` | Download required project files |
| `uv run agent2.py console` | Run agent in console mode |
| `uv run agent2.py dev` | Run agent in LiveKit playground |

Features

Real-time voice interaction with noise cancellation
Text-to-Speech (TTS) and Speech-to-Text (STT)
Turn detection for multi-user sessions (agent2.py)
Tool integration: Weather updates, web searches, and email sending
Video support for LiveKit rooms
Multi-language STT and telephony support (agent2.py)

Voice Agent Pipelines

`agent.py`

Google Realtime LLM
Voice output using Aoede voice
Noise cancellation and video-enabled sessions
Simple utility tools integration

`agent2.py`

Deepgram STT (multi-language)
Google Gemini LLM
ElevenLabs TTS
Silero VAD and Multilingual turn detection
Supports telephony and multilingual sessions
Same utility tools as agent.py
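
To illustrate what voice activity detection and end-of-turn detection contribute to the agent2.py pipeline, here is a toy energy-threshold sketch. It is a deliberately simpler technique than Silero VAD (a neural model) or the multilingual turn detector, and it is not how either of them works internally. The function names, the threshold, and the silence-frame count are all assumptions chosen for illustration.

```python
import math
from typing import Optional, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: Sequence[float], threshold: float = 0.05) -> bool:
    # Toy rule: a frame counts as speech when its energy reaches the threshold.
    return frame_rms(frame) >= threshold


def end_of_turn_index(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.05,
    silence_frames: int = 3,
) -> Optional[int]:
    """Return the index of the frame at which the speaker's turn is considered over.

    The turn ends once speech has been heard and is then followed by
    `silence_frames` consecutive non-speech frames. Returns None if that
    never happens in the given frames.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if is_speech(frame, threshold):
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= silence_frames:
                return i
    return None
```

In the pipeline, a signal like this decides when the buffered transcript is handed to the LLM. Waiting for several silent frames avoids cutting the user off during a short pause, at the cost of a slightly slower response.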

Voice Agent Pipeline Diagram

(Diagram: voice_agent_pipeline.png)

Notes

Both agents use environment variables defined in .env.
agent2.py is more advanced and suitable for multilingual or telephony scenarios.
The LiveKit playground allows testing the agents in real time with audio/video streaming.
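
Because both agents depend on values from .env, it can help to check for missing variables before starting an agent. The sketch below is a hypothetical helper, not part of the repository. The variable names in `REQUIRED_VARS` are assumptions, so check your own .env and the provider documentation for the names the agents actually read. The helper does not parse .env itself; it assumes the variables have already been loaded into the process environment.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumed variable names for illustration; the real set depends on the
# services each agent uses and is defined by the project's .env file.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env_vars(
    required: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing an explicit mapping as `env` makes the check easy to exercise without touching the real environment.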

License

This project is open-source and can be extended with custom AI tools and functionality.

