A multilingual voice AI conversation partner built with FastAPI and Pipecat that helps English speakers learn other languages through real-time voice conversations.
- Real-time voice conversations via WebSocket
- Speech-to-Text using Deepgram
- AI-powered responses with OpenAI GPT-4
- Text-to-Speech using ElevenLabs
- Twilio integration for phone calls
- Multilingual support
- Dynamic WebSocket URL generation for different deployment environments
- FastAPI: Web framework with WebSocket support
- Pipecat: Audio processing pipeline
- Twilio: Phone call integration
- Deepgram: Speech recognition
- OpenAI: Language model for conversations
- ElevenLabs: Text-to-speech synthesis
- Python 3.12+
- API keys for:
  - OpenAI
  - Deepgram
  - ElevenLabs
- Twilio Account SID and Auth Token
- Install dependencies:
  pip install -r requirements.txt
- Set environment variables:
  cp .env.example .env  # Edit .env with your API keys
- Run the application:
  uvicorn main:app --host 0.0.0.0 --port 8000
- Install Cerebrium CLI:
  pip install cerebrium
- Configure secrets in Cerebrium dashboard:
  - OPENAI_API_KEY
  - DEEPGRAM_API_KEY
  - ELEVENLABS_API_KEY
  - TWILIO_ACCOUNT_SID
  - TWILIO_AUTH_TOKEN
- Deploy:
  cerebrium deploy
- Configure AWS credentials:
  aws configure
- Deploy infrastructure:
  cd infrastructure
  cp terraform.tfvars.example terraform.tfvars  # Edit terraform.tfvars with your API keys
  ./deploy-infra.sh
- Deploy application:
  cd ..
  ./deploy.sh
- Application URL: The application URL will be displayed at the end of the deployment process. Use the ALB DNS name for accessing the application.
POST /
- Returns TwiML with WebSocket URL for Twilio

WebSocket /ws
- Real-time audio streaming endpoint
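
To illustrate how the POST / endpoint could build its response, here is a minimal sketch using only the Python standard library. The helper names `build_ws_url` and `build_twiml` are hypothetical and are not taken from this project. Deriving the scheme from a forwarded-protocol value is also an assumption about how the URL is generated per environment. The `<Connect><Stream>` structure is the TwiML that Twilio uses for bidirectional media streams.

```python
from xml.sax.saxutils import quoteattr


def build_ws_url(host: str, forwarded_proto: str = "https", path: str = "/ws") -> str:
    """Build the WebSocket URL from the host the request arrived on (hypothetical helper)."""
    # Use wss:// when the public endpoint is served over HTTPS, ws:// otherwise.
    scheme = "wss" if forwarded_proto.lower() == "https" else "ws"
    return f"{scheme}://{host}{path}"


def build_twiml(ws_url: str) -> str:
    """Return TwiML that tells Twilio to open a media stream to ws_url (hypothetical helper)."""
    # quoteattr escapes the URL and wraps it in quotes, so it is safe as an XML attribute.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        f"<Stream url={quoteattr(ws_url)} />"
        "</Connect></Response>"
    )


if __name__ == "__main__":
    print(build_twiml(build_ws_url("example.com")))
```

In a FastAPI handler, the host would typically come from the incoming request headers, and the returned string would be sent with an XML content type.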
Variable | Description |
---|---|
OPENAI_API_KEY | OpenAI API key for GPT-4 |
DEEPGRAM_API_KEY | Deepgram API key for speech recognition |
ELEVENLABS_API_KEY | ElevenLabs API key for text-to-speech |
TWILIO_ACCOUNT_SID | Twilio Account SID |
TWILIO_AUTH_TOKEN | Twilio Auth Token |
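
A missing key usually only surfaces when the first call comes in, so it can help to check the variables above at startup. The following is a minimal sketch, not code from this project. The function name `missing_env_vars` is hypothetical, and failing fast at startup is an assumed design choice.

```python
import os
from typing import Mapping, Optional

# The variable names listed in the table above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
)


def missing_env_vars(environ: Optional[Mapping[str, str]] = None) -> list:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```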
- External service (Twilio) makes POST request to /
- Application returns TwiML with dynamically generated WebSocket URL
- WebSocket connection established at /ws
- Audio pipeline processes:
  - Incoming audio → Speech-to-Text
  - Text → AI processing
  - AI response → Text-to-Speech
- Audio response sent back
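
The order of the audio pipeline stages above can be sketched as a chain of async functions. This is a conceptual illustration only: it does not use the Pipecat API, and `fake_stt`, `fake_llm`, `fake_tts`, and `run_turn` are hypothetical stand-ins for the Deepgram, OpenAI, and ElevenLabs services. The real pipeline streams audio frames continuously instead of handling one complete turn at a time.

```python
import asyncio


async def fake_stt(audio: bytes) -> str:
    """Stand-in for Speech-to-Text: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


async def fake_llm(text: str) -> str:
    """Stand-in for the language model: echoes the transcript back."""
    return f"You said: {text}"


async def fake_tts(text: str) -> bytes:
    """Stand-in for Text-to-Speech: encodes the reply text as bytes."""
    return text.encode("utf-8")


async def run_turn(audio_in: bytes, stt=fake_stt, llm=fake_llm, tts=fake_tts) -> bytes:
    """Process one conversational turn in the order described above."""
    transcript = await stt(audio_in)  # Incoming audio -> Speech-to-Text
    reply = await llm(transcript)     # Text -> AI processing
    return await tts(reply)           # AI response -> Text-to-Speech


if __name__ == "__main__":
    print(asyncio.run(run_turn(b"hola")))
```

Passing the stages as parameters keeps the ordering logic separate from any particular provider, which makes it easy to swap in test doubles.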
cerebrium delete cerebrium-demo
cd infrastructure
tofu destroy