
A fully functional conversational AI system for restaurant phone ordering using Twilio, Whisper, LLM, and RAG

KOKOSde/voice-ai-ordering-agent

๐Ÿ• Voice AI Ordering Agent

A fully functional conversational AI system for restaurant phone ordering

Python 3.10+ FastAPI License: MIT

This project demonstrates an end-to-end voice AI ordering system for restaurants. Customers can call a phone number, have a natural conversation with an AI agent to browse the menu, place orders, and simulate payment, all through voice.

🎯 Features

  • 📞 Twilio Voice Integration - Handle incoming phone calls with Twilio Programmable Voice
  • 🎤 Speech-to-Text - Transcribe customer speech using OpenAI Whisper
  • 🧠 LLM Processing - Natural language understanding with Mistral/Qwen2-Audio via Hugging Face
  • 🔍 RAG Menu Search - Semantic search over menu items using FAISS vector database
  • 🗣️ Text-to-Speech - Generate voice responses with gTTS or ElevenLabs
  • 💳 Simulated POS/Payments - Mock Stripe integration and order management
  • 🔄 Multi-turn Conversations - Session state management with Redis/in-memory storage
  • 📊 Analytics & Logging - Call transcripts, order analytics, and error tracking

๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Phone Call    โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚   Twilio Voice   โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚  FastAPI Server โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                          โ”‚
                        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                        โ”‚                                 โ–ผ                                 โ”‚
                        โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
                        โ”‚  โ”‚   Whisper    โ”‚    โ”‚   LLM (Mistral)  โ”‚    โ”‚   FAISS RAG    โ”‚  โ”‚
                        โ”‚  โ”‚   (STT)      โ”‚โ”€โ”€โ”€โ–ถโ”‚   Intent + NLU   โ”‚โ—€โ”€โ”€โ–ถโ”‚   Menu Search  โ”‚  โ”‚
                        โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
                        โ”‚                                 โ”‚                                 โ”‚
                        โ”‚                                 โ–ผ                                 โ”‚
                        โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
                        โ”‚  โ”‚  gTTS/11Labs โ”‚    โ”‚  Session Manager โ”‚    โ”‚   SQLite DB    โ”‚  โ”‚
                        โ”‚  โ”‚   (TTS)      โ”‚    โ”‚   (Redis/Mem)    โ”‚    โ”‚   Orders/Logs  โ”‚  โ”‚
                        โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
                        โ”‚                                                                   โ”‚
                        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
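The retrieval step between the LLM and the FAISS index can be illustrated without any ML dependencies. The sketch below is a stand-in, not the code in utils/rag.py: it swaps the embedding model and FAISS index for word-count vectors ranked by brute-force cosine similarity, and the menu entries and field names (name, description, price) are hypothetical rather than the actual menu.json schema.

```python
import math
import re
from collections import Counter

# Hypothetical menu entries (prices from the demo script); menu.json's real schema may differ.
MENU = [
    {"name": "Margherita Pizza", "description": "tomato mozzarella basil pizza", "price": 14.99},
    {"name": "Pepperoni Pizza", "description": "classic pepperoni mozzarella pizza", "price": 16.99},
    {"name": "Truffle Mushroom Pizza", "description": "chef special truffle mushroom pizza", "price": 22.99},
    {"name": "Garlic Bread", "description": "toasted bread with garlic butter", "price": 5.99},
]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Built once at startup, the way a FAISS index would be built over menu embeddings.
INDEX = [(item, embed(f"{item['name']} {item['description']}")) for item in MENU]


def search_menu(query: str, k: int = 2) -> list[dict]:
    """Return up to k menu items that share vocabulary with the query, best match first."""
    q = embed(query)
    scored = [(cosine(q, vec), item) for item, vec in INDEX]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in matches[:k]]


print([item["name"] for item in search_menu("do you have garlic bread")])  # ['Garlic Bread']
```

In a pipeline like the one diagrammed, the retrieved items would presumably be placed into the LLM prompt so the model quotes real menu names and prices instead of inventing them; a real embedding model is what lets "something cheesy" match items that share no words with the query.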

🚀 Quick Start

Prerequisites

1. Clone the Repository

git clone https://github.com/KOKOSde/voice-ai-ordering-agent.git
cd voice-ai-ordering-agent

2. Set Up Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

3. Configure Environment Variables

cp .env.example .env
# Edit .env with your credentials

Required variables:

TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
HUGGINGFACE_TOKEN=your_hf_token
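A missing credential otherwise tends to surface as a confusing failure in the middle of a call. A minimal fail-fast startup check, assuming the variables above have already been loaded into the process environment; load_settings is a hypothetical helper, not part of the repository.

```python
import os
from collections.abc import Mapping

# The variables this README lists as required.
REQUIRED_VARS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "HUGGINGFACE_TOKEN")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or raise one error naming every missing variable."""
    # Empty strings count as missing, which catches a half-edited .env file.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_settings() once at import time of the app module makes a misconfigured deployment fail at boot rather than on the first phone call.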

4. Start the Server

uvicorn main:app --reload --host 0.0.0.0 --port 8000

5. Expose with ngrok

In a new terminal:

ngrok http 8000

Copy the HTTPS forwarding URL that ngrok prints (e.g., https://abc123.ngrok.io).

6. Configure Twilio Webhook

  1. Go to the Twilio Console
  2. Navigate to Phone Numbers → Manage → Active Numbers
  3. Click on your phone number
  4. Under Voice & Fax, set:
    • A CALL COMES IN: Webhook → https://your-ngrok-url.ngrok.io/voice
    • HTTP POST
  5. Save
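When a call arrives, Twilio POSTs to /voice and expects TwiML (XML) in return. The sketch below builds such a response with the standard library. It assumes, based only on the endpoint list, that /voice uses Twilio's own speech recognition via a Gather and hands the result to /voice/process; the project may instead record audio (a Record verb) and transcribe it with Whisper. The gather_twiml helper is hypothetical; input, action, method, and speechTimeout are documented Gather attributes.

```python
import xml.etree.ElementTree as ET


def gather_twiml(prompt: str, action: str = "/voice/process") -> str:
    """Build TwiML that speaks `prompt`, then POSTs the caller's spoken reply to `action`."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        {"input": "speech", "action": action, "method": "POST", "speechTimeout": "auto"},
    )
    ET.SubElement(gather, "Say").text = prompt
    # If the caller stays silent, Twilio moves on to the next verb: start over at /voice.
    ET.SubElement(response, "Redirect", {"method": "POST"}).text = "/voice"
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(response, encoding="unicode")


print(gather_twiml("Welcome to Bella's Italian Kitchen! What would you like today?"))
```

In a FastAPI route, the string can be returned as Response(content=gather_twiml(...), media_type="application/xml") so Twilio receives it with an XML content type.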

7. Make a Test Call! 📞

Call your Twilio phone number and start ordering!

๐Ÿ“ Project Structure

voice-ai-ordering-agent/
โ”œโ”€โ”€ main.py                 # FastAPI application & Twilio webhooks
โ”œโ”€โ”€ menu.json              # Restaurant menu database
โ”œโ”€โ”€ prompts.py             # LLM prompt templates
โ”œโ”€โ”€ requirements.txt       # Python dependencies
โ”œโ”€โ”€ .env.example           # Environment variables template
โ”œโ”€โ”€ .gitignore            # Git ignore patterns
โ”œโ”€โ”€ Procfile              # Heroku deployment
โ”œโ”€โ”€ utils/
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ transcription.py  # Whisper speech-to-text
โ”‚   โ”œโ”€โ”€ rag.py            # FAISS vector search for menu
โ”‚   โ”œโ”€โ”€ tts.py            # Text-to-speech (gTTS/ElevenLabs)
โ”‚   โ”œโ”€โ”€ llm.py            # LLM processing (Mistral/OpenAI)
โ”‚   โ”œโ”€โ”€ session.py        # Conversation state management
โ”‚   โ”œโ”€โ”€ database.py       # SQLite order storage
โ”‚   โ””โ”€โ”€ payment.py        # Simulated payment processing
โ”œโ”€โ”€ tests/
โ”‚   โ””โ”€โ”€ test_main.py      # Unit tests
โ””โ”€โ”€ .github/
    โ””โ”€โ”€ workflows/
        โ””โ”€โ”€ ci.yml        # GitHub Actions CI

🎮 Demo Script

Here's a sample conversation flow:

📞 *Phone rings*

🤖 AI: "Welcome to Bella's Italian Kitchen! I'm your AI ordering assistant. 
        I can help you browse our menu and place an order. 
        What would you like today?"

👤 Customer: "What pizzas do you have?"

🤖 AI: "We have some great pizzas! Our most popular is the Margherita at $14.99, 
        the classic Pepperoni at $16.99, and our chef's special Truffle Mushroom 
        at $22.99. Would you like to hear more about any of these?"

👤 Customer: "I'll have a large pepperoni pizza"

🤖 AI: "Great choice! I've added a large Pepperoni Pizza for $24.99 to your order. 
        Would you like anything else? Maybe some garlic bread or a drink?"

👤 Customer: "Add garlic bread and that's it"

🤖 AI: "Perfect! I've added Garlic Bread for $5.99. Let me confirm your order:
        - Large Pepperoni Pizza: $24.99
        - Garlic Bread: $5.99
        Your total is $30.98 plus tax. Is that correct?"

👤 Customer: "Yes, that's correct"

🤖 AI: "Your order has been confirmed! Your order number is BIK-A7B3C2D1. 
        Your total is $33.51 including tax. Thank you for ordering with 
        Bella's Italian Kitchen! Have a great day!"
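The demo's subtotal is $24.99 + $5.99 = $30.98, and the quoted $33.51 total implies $2.53 of tax; the README does not state the rate. The sketch below shows one way to compute such totals with Decimal so cents never drift through float rounding. The 8% rate in the example is a placeholder (it yields $33.46, not the demo's $33.51), and the order-ID scheme (a prefix plus eight uppercase hex characters) is a hypothetical guess modeled on the sample BIK-A7B3C2D1.

```python
import uuid
from decimal import Decimal, ROUND_HALF_UP

CENTS = Decimal("0.01")


def order_totals(prices: list[str], tax_rate: str) -> tuple[Decimal, Decimal, Decimal]:
    """Return (subtotal, tax, total); prices and rate are strings so Decimal stays exact."""
    subtotal = sum((Decimal(p) for p in prices), Decimal("0"))
    tax = (subtotal * Decimal(tax_rate)).quantize(CENTS, rounding=ROUND_HALF_UP)
    return subtotal, tax, subtotal + tax


def new_order_id(prefix: str = "BIK") -> str:
    """Hypothetical ID scheme: prefix, dash, 8 uppercase hex characters from a random UUID."""
    return f"{prefix}-{uuid.uuid4().hex[:8].upper()}"


# Placeholder 8% rate; the README does not state the rate behind $33.51.
subtotal, tax, total = order_totals(["24.99", "5.99"], tax_rate="0.08")
print(subtotal, tax, total)  # 30.98 2.48 33.46
```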

🔧 Configuration Options

LLM Models

| Model | Description | Memory Required |
|---|---|---|
| mistralai/Mistral-7B-Instruct-v0.2 | Default, good balance | ~16GB |
| Qwen/Qwen2-Audio-7B-Instruct | Audio-native model | ~16GB |
| distilgpt2 | Local testing only | ~500MB |
| OpenAI API | Use OPENAI_API_KEY | Cloud-based |
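The table implies a choice between a local Hugging Face model and the OpenAI API. One way that choice could be wired, as a sketch only: it prefers OpenAI when OPENAI_API_KEY is set and otherwise falls back to the table's default model. The LLM_MODEL override variable and the choose_llm helper are hypothetical, not documented settings of this project.

```python
import os
from collections.abc import Mapping
from typing import Optional

DEFAULT_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # the "Default" row of the table


def choose_llm(env: Mapping[str, str] = os.environ) -> tuple[str, Optional[str]]:
    """Pick an LLM backend from the environment.

    Returns ("openai", None) when OPENAI_API_KEY is set (the model name is left to
    the OpenAI client configuration); otherwise ("huggingface", model_id), where the
    hypothetical LLM_MODEL variable can override the default Hugging Face model.
    """
    if env.get("OPENAI_API_KEY"):
        return "openai", None
    return "huggingface", env.get("LLM_MODEL", DEFAULT_MODEL)
```

Under this scheme, setting LLM_MODEL=distilgpt2 would keep local experiments near the ~500MB footprint listed above, at the cost of much weaker answers.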

TTS Options

| Engine | Quality | Cost |
|---|---|---|
| gTTS | Good | Free |
| ElevenLabs | Excellent | Paid API |
| pyttsx3 | Basic (offline) | Free |

Session Storage

| Backend | Use Case |
|---|---|
| In-memory | Development, single instance |
| Redis | Production, distributed |
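Multi-turn ordering needs the cart and conversation history to survive between webhook requests belonging to the same call. Below is a minimal in-memory store keyed by Twilio's CallSid request parameter that expires idle sessions; the class name, the 30-minute TTL, and the session fields are assumptions for illustration, not the API of utils/session.py.

```python
import time
from typing import Any, Callable


class InMemorySessionStore:
    """Per-call conversation state keyed by Twilio's CallSid, with an idle timeout.

    Single-process only; a Redis-backed store could expose the same methods.
    """

    def __init__(self, ttl_seconds: float = 1800.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data: dict[str, tuple[float, dict[str, Any]]] = {}

    @staticmethod
    def _new_session() -> dict[str, Any]:
        return {"history": [], "cart": []}

    def get(self, call_sid: str) -> dict[str, Any]:
        """Return this call's session, or a fresh one if none exists or it has expired."""
        entry = self._data.get(call_sid)
        if entry is None:
            return self._new_session()
        saved_at, session = entry
        if self._clock() - saved_at > self._ttl:
            del self._data[call_sid]  # drop the stale session
            return self._new_session()
        return session

    def save(self, call_sid: str, session: dict[str, Any]) -> None:
        """Store the session and reset its idle timer."""
        self._data[call_sid] = (self._clock(), session)

    def delete(self, call_sid: str) -> None:
        """Forget a session, e.g. once the order is confirmed or the call ends."""
        self._data.pop(call_sid, None)
```

A Redis-backed version would keep the same get/save/delete surface (for example storing JSON with a key expiry), so request handlers would not need to change when moving to multiple server instances.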

🚢 Deployment

Heroku

# Login to Heroku
heroku login

# Create app
heroku create your-app-name

# Set environment variables
heroku config:set TWILIO_ACCOUNT_SID=xxx
heroku config:set TWILIO_AUTH_TOKEN=xxx
heroku config:set HUGGINGFACE_TOKEN=xxx

# Deploy
git push heroku main

Docker

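FROM python:3.10-slim

# Assumption: Whisper decodes audio by calling the ffmpeg binary, which the slim image lacks
RUN apt-get update \
    && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
# Install dependencies first so this layer stays cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]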

Railway / Render

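Both platforms can build a Python project directly from a connected GitHub repository. Set the start command to uvicorn main:app --host 0.0.0.0 --port $PORT (both platforms provide PORT) and add the same environment variables listed above.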

📊 API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Health check |
| /voice | POST | Twilio voice webhook |
| /voice/process | POST | Process speech input |
| /sms | POST | SMS webhook |
| /menu | GET | Get full menu JSON |
| /order/{id} | GET | Get order status |
| /analytics | GET | Call/order analytics |
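Twilio delivers webhook data as an application/x-www-form-urlencoded POST body. If /voice/process receives Twilio's own speech-recognition result (an assumption; the project may instead fetch a recording and transcribe it with Whisper), the relevant fields are CallSid, SpeechResult, and Confidence. A stdlib sketch of extracting them; parse_speech_callback and its output keys are hypothetical.

```python
from urllib.parse import parse_qs


def parse_speech_callback(body: str) -> dict:
    """Pull the call ID, transcript, and confidence out of Twilio's form-encoded body."""
    # parse_qs returns a list per key (fields may repeat); take the first value of each.
    form = {key: values[0] for key, values in parse_qs(body).items()}
    return {
        "call_sid": form.get("CallSid", ""),
        # SpeechResult may be missing when nothing intelligible was heard.
        "text": form.get("SpeechResult", "").strip(),
        "confidence": float(form.get("Confidence", "0")),
    }


body = "CallSid=CA123&SpeechResult=I%27ll+have+a+large+pepperoni+pizza&Confidence=0.92"
print(parse_speech_callback(body))
```

In FastAPI the same fields are also available through await request.form(), which requires the python-multipart package to be installed.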

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

# Run specific test file
pytest tests/test_main.py -v

📈 Analytics Dashboard

The system logs all interactions for analytics:

  • Call duration and outcomes
  • Popular menu items
  • Order values and trends
  • Customer satisfaction signals

These metrics are available from the /analytics endpoint.
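Popular items and average order value reduce to simple aggregation over stored orders. A sketch, assuming order records that carry an items list and a Decimal total; this record shape is hypothetical and the real SQLite schema in utils/database.py may differ. The sample totals reuse the demo prices ($24.99 + $5.99 and $14.99 + $5.99).

```python
from collections import Counter
from decimal import Decimal

CENTS = Decimal("0.01")


def summarize_orders(orders: list[dict]) -> dict:
    """Aggregate order records into figures like those /analytics reports."""
    item_counts = Counter(item for order in orders for item in order["items"])
    revenue = sum((order["total"] for order in orders), Decimal("0"))
    average = (revenue / len(orders)).quantize(CENTS) if orders else Decimal("0.00")
    return {
        "order_count": len(orders),
        "top_items": item_counts.most_common(3),  # ties keep first-seen order
        "revenue": revenue,
        "average_order_value": average,
    }


# Hypothetical records built from the demo prices (pre-tax subtotals).
sample = [
    {"items": ["Large Pepperoni Pizza", "Garlic Bread"], "total": Decimal("30.98")},
    {"items": ["Margherita Pizza", "Garlic Bread"], "total": Decimal("20.98")},
]
print(summarize_orders(sample)["top_items"])
# [('Garlic Bread', 2), ('Large Pepperoni Pizza', 1), ('Margherita Pizza', 1)]
```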

๐Ÿ›ก๏ธ Security Considerations

  • Never commit .env files
  • Use Twilio request validation in production
  • Rate limit API endpoints
  • Sanitize all user inputs
  • Use HTTPS in production
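Twilio request validation means recomputing the X-Twilio-Signature header and rejecting requests that do not match. The official twilio package already provides this (RequestValidator in twilio.request_validator), so prefer it in practice. The stdlib sketch below only shows what that check does for a form-encoded POST: append each parameter, sorted by name, as name followed by value to the full request URL, HMAC-SHA1 the result with the auth token, and base64-encode the digest. The helper names are hypothetical.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the expected X-Twilio-Signature for a form-encoded POST to `url`."""
    # Full URL, then every POST parameter as name+value with no delimiters, sorted by name.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_valid_twilio_request(auth_token: str, url: str, params: dict[str, str], signature: str) -> bool:
    """Compare against the header Twilio sent, in constant time."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Behind ngrok or another proxy, url must be the public URL Twilio actually called (for example https://your-ngrok-url.ngrok.io/voice), not the local http://localhost:8000 address, or every signature check will fail.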

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

