Voice AI Agent Example

A multilingual voice AI conversation partner built with FastAPI and Pipecat that helps English speakers learn other languages through real-time voice conversations.

Features

Real-time voice conversations via WebSocket
Speech-to-Text using Deepgram
AI-powered responses with OpenAI GPT-4
Text-to-Speech using ElevenLabs
Twilio integration for phone calls
Multilingual support
Dynamic WebSocket URL generation for different deployment environments

Architecture

FastAPI: Web framework with WebSocket support
Pipecat: Audio processing pipeline
Twilio: Phone call integration
Deepgram: Speech recognition
OpenAI: Language model for conversations
ElevenLabs: Text-to-speech synthesis

Prerequisites

Python 3.12+
API keys for:
- OpenAI
- Deepgram
- ElevenLabs
- Twilio Account SID and Auth Token

Local Development

Install dependencies:
```
pip install -r requirements.txt
```

Set environment variables:

```
cp .env.example .env
# Edit .env with your API keys
```

Run the application:

```
uvicorn main:app --host 0.0.0.0 --port 8000
```
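
Once the server is running, a quick smoke check can confirm that `POST /` responds. The sketch below uses only the standard library. The helper names `build_smoke_request` and `fetch_twiml` are hypothetical, and it assumes the endpoint accepts an empty body (Twilio normally sends form-encoded call parameters, which this app may or may not require).

```python
import urllib.request


def build_smoke_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the root endpoint (hypothetical helper)."""
    # An empty body is an assumption; the real endpoint may expect Twilio form fields.
    return urllib.request.Request(f"{base_url.rstrip('/')}/", data=b"", method="POST")


def fetch_twiml(base_url: str = "http://localhost:8000") -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_smoke_request(base_url), timeout=5) as resp:
        return resp.read().decode("utf-8")


# Usage, with the server from the previous step running:
#   print(fetch_twiml())
```

If the server is up, the body should be an XML document; a connection error usually means uvicorn is not listening on port 8000.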

Deployment

Cerebrium Deployment

Install Cerebrium CLI:
```
pip install cerebrium
```
Configure secrets in Cerebrium dashboard:
- OPENAI_API_KEY
- DEEPGRAM_API_KEY
- ELEVENLABS_API_KEY
- TWILIO_ACCOUNT_SID
- TWILIO_AUTH_TOKEN
Deploy:
```
cerebrium deploy
```

Amazon ECS Deployment

Configure AWS credentials:
```
aws configure
```

Deploy infrastructure:

```
cd infrastructure
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your API keys
./deploy-infra.sh
```

Deploy application:
```
cd ..
./deploy.sh
```
Application URL: The URL is displayed at the end of the deployment process. Use the ALB DNS name to access the application.

API Endpoints

POST / - Returns TwiML with WebSocket URL for Twilio
WebSocket /ws - Real-time audio streaming endpoint
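
To illustrate what the `POST /` response might look like, here is a minimal sketch that builds a WebSocket URL from the request host and wraps it in TwiML using Twilio's `<Connect><Stream>` verbs. The function names `build_ws_url` and `build_twiml` are hypothetical and are not taken from `main.py`; how the repository actually derives the host (for example from the `Host` header behind the ALB) is an assumption.

```python
from xml.etree.ElementTree import Element, SubElement, tostring


def build_ws_url(host: str, secure: bool = True, path: str = "/ws") -> str:
    """Derive the WebSocket URL from the host the request arrived on."""
    # Assumption: wss:// in deployed environments, ws:// for plain local testing.
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}{path}"


def build_twiml(ws_url: str) -> str:
    """Return TwiML that tells Twilio to stream call audio to ws_url."""
    response = Element("Response")
    connect = SubElement(response, "Connect")
    SubElement(connect, "Stream", url=ws_url)
    return tostring(response, encoding="unicode")


# Example: the host would typically come from the incoming request.
twiml = build_twiml(build_ws_url("example.com"))
```

In a FastAPI handler, the resulting string would be returned with an XML media type so that Twilio can parse it.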

Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key for GPT-4 |
| `DEEPGRAM_API_KEY` | Deepgram API key for speech recognition |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for text-to-speech |
| `TWILIO_ACCOUNT_SID` | Twilio Account SID |
| `TWILIO_AUTH_TOKEN` | Twilio Auth Token |
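
A missing key tends to surface only when the first call comes in, so it can help to check the environment at startup. The following is a small sketch of such a check. The helper `missing_env_vars` is hypothetical and is not part of the repository.

```python
import os
from typing import Mapping, Optional

# Variable names as listed in the table above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
)


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Usage at startup:
#   missing = missing_env_vars()
#   if missing:
#       raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```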

How It Works

Twilio sends a POST request to `/`
The application returns TwiML containing a dynamically generated WebSocket URL
Twilio opens a WebSocket connection to `/ws`
The audio pipeline processes each exchange:
- Incoming audio → Speech-to-Text
- Text → AI processing
- AI response → Text-to-Speech
- Audio response sent back to the caller
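
The order of the pipeline stages above can be sketched in plain Python. This is not Pipecat code: the real pipeline streams audio frames continuously, whereas this simplified sketch handles one whole turn at a time. `run_turn` and the three `fake_*` stages are hypothetical stand-ins for Deepgram, OpenAI, and ElevenLabs.

```python
import asyncio
from typing import Awaitable, Callable

# Each stage is modeled as an async callable (an assumption for illustration).
SpeechToText = Callable[[bytes], Awaitable[str]]
LanguageModel = Callable[[str], Awaitable[str]]
TextToSpeech = Callable[[str], Awaitable[bytes]]


async def run_turn(audio_in: bytes, stt: SpeechToText,
                   llm: LanguageModel, tts: TextToSpeech) -> bytes:
    """Process one conversational turn: audio -> text -> reply -> audio."""
    text = await stt(audio_in)   # Incoming audio -> Speech-to-Text
    reply = await llm(text)      # Text -> AI processing
    return await tts(reply)      # AI response -> Text-to-Speech


# Stand-in stages so the sketch runs without any API keys.
async def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


async def fake_llm(text: str) -> str:
    return f"You said: {text}"


async def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


audio_out = asyncio.run(run_turn(b"hola", fake_stt, fake_llm, fake_tts))
```

Passing the stages in as parameters keeps the flow easy to test with stubs, which is the main reason for this shape.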

Cleanup

Cerebrium

```
cerebrium delete cerebrium-demo
```

Amazon ECS

```
cd infrastructure
tofu destroy
```

