A real-time voice bot that integrates with WhatsApp Business API to handle voice calls using WebRTC technology. Users can call your WhatsApp Business number and have natural conversations with an AI-powered bot.
- Overview
- Architecture
- Prerequisites
- Installation
- Configuration
- Running the Server
- API Endpoints
- Environment Variables
- Troubleshooting
- MCP Integration
- Development
- License
This project enables real-time voice conversations with WhatsApp users over WebRTC. The bot answers incoming WhatsApp calls, transcribes the caller's speech with STT, generates a reply with an LLM, and speaks that reply back using TTS. The system is built on the Pipecat framework for real-time voice and multimodal AI agents.
- Real-time voice conversations with WhatsApp users
- WebRTC technology for voice communication
- Integration with AI services for conversation handling
- Support for STT (Speech-to-Text), LLM (Large Language Model), and TTS (Text-to-Speech)
- MCP (Model Context Protocol) integration for extended tool access
- Automatic greeting and order-taking capabilities
- Stock checking and order placement functionality
The application follows a modular architecture:
- Server Layer: FastAPI application that handles WhatsApp webhook events and manages WebRTC connections
- Transport Layer: SmallWebRTCTransport handles WebRTC connection establishment and management
- AI Layer: Uses OpenAI-compatible LLM for conversation processing and MCP for tool integration
- Audio Processing: Uses Deepgram STT, Cartesia TTS, and Silero VAD for voice activity detection
- server.py: Handles WhatsApp webhook events and manages WebRTC connections
- bot.py: Implements the conversation pipeline using Pipecat
- prompt.py: Contains the system and session instructions for the AI agent
- env.example: Example environment variables file
- Facebook Account: Create an account at facebook.com
- Facebook Developer Account: Create an account at developers.facebook.com
- WhatsApp Business App: Create a new WhatsApp Business API application
- Phone Number: Add and verify a WhatsApp Business phone number
- Business Verification: Complete business verification process (required for production only)
- Webhook Configuration: Set up webhook endpoint for your application
Important Note: For production, make sure your WhatsApp Business account has access to the voice calling feature.
Find more details here:
For development, you'll be provided with a free test phone number valid for 90 days.
Your WhatsApp Business phone number must be configured to accept voice calls[2]:
- Go to your WhatsApp Business API dashboard in Meta Developer Console
- Navigate to Configuration → Phone Numbers → Manage phone numbers
- Select your phone number
- In the Calls tab, enable "Allow voice calls" capability
- Save the configuration
Set up your webhook endpoint in the Meta Developer Console[3]:
- Go to WhatsApp → Configuration → Webhooks
- Set callback URL: https://your-domain.com/
- Set verify token: your_webhook_verification_token
  - This token should match your WHATSAPP_WEBHOOK_VERIFICATION_TOKEN environment variable
- Click "Verify and save"
- In the webhook fields below, select:
  - calls (required for voice call events)
- Go to WhatsApp → API Setup
- Click "Generate access token"
  - Use this token for your WHATSAPP_TOKEN environment variable
- Note your Phone Number ID - you'll need this for the WHATSAPP_PHONE_NUMBER_ID configuration
- Python 3.10 or newer
- uv package manager (https://docs.astral.sh/uv/)
- Clone the repository:
  git clone <repository-url>
  cd whatsapp
- Install dependencies:
  uv sync
- Activate the virtual environment:
  source .venv/bin/activate
- Copy the example environment file:
  cp env.example .env
- Edit the .env file and add your API keys and configuration values.
- OpenRouter API Key: Required for LLM service. Get it from OpenRouter.
  - Used as OPENROUTER_API_KEY in environment
  - Used with the google/gemini-2.0-flash-lite-001 model by default
- Deepgram API Key: Required for Speech-to-Text service. Get it from Deepgram.
  - Used as DEEPGRAM_API_KEY in environment
- Cartesia API Key: Required for Text-to-Speech service. Get it from Cartesia.
  - Used as CARTESIA_API_KEY in environment
- WhatsApp Business API Token: Required for WhatsApp integration.
  - Used as WHATSAPP_TOKEN in environment
- WhatsApp Webhook Verification Token: Required for webhook verification.
  - Used as WHATSAPP_WEBHOOK_VERIFICATION_TOKEN in environment
- WhatsApp Phone Number ID: Your WhatsApp Business phone number ID.
  - Used as WHATSAPP_PHONE_NUMBER_ID in environment
To use MCP for extended tool access:
- MCP HTTP URL: Optional. Provide an HTTP URL for your MCP server. The MCP_HTTP_URL environment variable should be set to your MCP server URL.
python server.py
The server will start and listen for incoming WhatsApp webhook events.
By default, the server will run on localhost:7860. You can specify a different host and port:
python server.py --host 0.0.0.0 --port 8080
Add the -v flag for verbose logging:
python server.py -v
- Find your WhatsApp test number in the Meta Developer Console
- Call the number from your WhatsApp app
- The bot should answer and engage in conversation
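The flags used when running the server (--host, --port, -v) could be parsed along the following lines. This is only a minimal sketch using the standard library's argparse; the function name `parse_args` and the attribute name `verbose` are assumptions for illustration, and the actual server.py may be organized differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the server's command-line flags (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="WhatsApp voice bot server")
    # Defaults mirror the documented default address, localhost:7860.
    parser.add_argument("--host", default="localhost", help="Interface to bind to")
    parser.add_argument("--port", type=int, default=7860, help="Port to listen on")
    # The attribute name `verbose` is an assumption; the README only documents -v.
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="Enable verbose logging")
    return parser.parse_args(argv)
```

Calling `parse_args(["--host", "0.0.0.0", "--port", "8080"])` yields a namespace with `host="0.0.0.0"`, `port=8080`, and `verbose=False`.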
Handles WhatsApp webhook verification requests from Meta.
- Description: Used during webhook setup to verify the endpoint
- Parameters: Verification token and challenge as query parameters
- Response: Challenge string for verification
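The verification handshake can be sketched as a small pure function. Meta sends the query parameters `hub.mode`, `hub.verify_token`, and `hub.challenge`; the function name `verify_webhook` and its return convention (challenge string on success, `None` otherwise) are assumptions for illustration, not necessarily how server.py implements it.

```python
import hmac
from typing import Mapping, Optional


def verify_webhook(params: Mapping[str, str], expected_token: str) -> Optional[str]:
    """Return the challenge to echo back, or None if verification fails."""
    mode = params.get("hub.mode")
    token = params.get("hub.verify_token", "")
    challenge = params.get("hub.challenge")
    # Compare tokens in constant time to avoid leaking information via timing.
    token_ok = hmac.compare_digest(token.encode(), expected_token.encode())
    if mode == "subscribe" and challenge is not None and token_ok:
        return challenge
    return None
```

In a web handler, a non-`None` result would typically be returned as the plain-text response body, and `None` would map to a 403 response. Here `expected_token` would be the value of WHATSAPP_WEBHOOK_VERIFICATION_TOKEN.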
Handles incoming WhatsApp webhook events.
- Description: Processes incoming WhatsApp messages and call events
- Body: WhatsApp webhook request payload
- Response: Success confirmation with processing status
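As a rough illustration of how call events might be picked out of an incoming payload, the sketch below walks the standard Meta webhook envelope (`entry` → `changes` → `field`/`value`) and keeps only changes for the `calls` field. The helper name `extract_call_changes` is hypothetical, and the contents of each `value` object are left untouched because their exact shape is defined by the Voice Calling API documentation.

```python
from typing import Any, Dict, List


def extract_call_changes(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the `value` objects of all changes whose field is "calls"."""
    values: List[Dict[str, Any]] = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            # Other subscribed fields (for example "messages") are skipped here.
            if change.get("field") == "calls":
                values.append(change.get("value", {}))
    return values
```

Each returned object could then be handed to whatever code sets up the WebRTC connection for that call.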
| Variable | Required | Description |
|---|---|---|
| OPENROUTER_API_KEY | Yes | API key for OpenRouter |
| DEEPGRAM_API_KEY | Yes | API key for Deepgram STT service |
| CARTESIA_API_KEY | Yes | API key for Cartesia TTS service |
| WHATSAPP_TOKEN | Yes | WhatsApp Business API access token |
| WHATSAPP_WEBHOOK_VERIFICATION_TOKEN | Yes | Token for webhook verification |
| WHATSAPP_PHONE_NUMBER_ID | Yes | WhatsApp Business phone number ID |
| MCP_HTTP_URL | No | Optional HTTP URL for MCP server to add tools to the AI agent |
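A startup check for the required variables listed above might look like the following. This is a sketch under the assumption that the variables are read from the process environment; the helper name `missing_required` is hypothetical and not part of the project.

```python
import os
from typing import List, Mapping

# Variables marked "Yes" in the table above.
REQUIRED_VARS = (
    "OPENROUTER_API_KEY",
    "DEEPGRAM_API_KEY",
    "CARTESIA_API_KEY",
    "WHATSAPP_TOKEN",
    "WHATSAPP_WEBHOOK_VERIFICATION_TOKEN",
    "WHATSAPP_PHONE_NUMBER_ID",
)


def missing_required(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Failing fast with a message that lists the result of `missing_required()` tends to be easier to debug than a service error during the first call.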
- Verify the WHATSAPP_WEBHOOK_VERIFICATION_TOKEN matches the one in Meta Developer Console
- Ensure your webhook URL is publicly accessible (consider using ngrok for local development)
- Bot Not Responding
  - Check that all API keys are correctly configured in the .env file
  - Verify that voice calling is enabled for your WhatsApp Business number
  - Ensure your business account is verified for production use
- Dependency Installation Issues
  - Ensure you have the uv package manager installed
  - Run uv sync to install all dependencies
  - Activate the virtual environment with source .venv/bin/activate
- WebRTC Connection Issues
  - Check that your server can handle WebSocket connections
  - Verify that your domain is accessible via HTTPS (required for WebRTC)
Add the -v flag when running the server for more detailed logs:
python server.py -v
This will enable TRACE-level logging, which helps identify issues with the connection flow.
This bot supports MCP (Model Context Protocol) for extended tool access. If the MCP_HTTP_URL environment variable is set, the bot will use the MCP server to access custom tools.
MCP allows the bot to access custom tools and services that extend its capabilities. The current implementation includes:
- doolally_knowledge_base: For answering questions about menu items, timings, location, or FAQs
- check_menu_stock: For verifying if a specific item is in stock and confirming the latest price
- place_order: For recording a confirmed order in the Orders tab
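Because MCP is optional, the bot has to decide at startup whether to connect to an MCP server at all. One way to express that decision is sketched below; the helper name `resolve_mcp_url` and the choice to reject non-HTTP(S) URLs are assumptions for illustration, not a description of the project's actual code.

```python
from typing import Mapping, Optional
from urllib.parse import urlparse


def resolve_mcp_url(env: Mapping[str, str]) -> Optional[str]:
    """Return the MCP server URL if configured, or None to run without MCP tools."""
    url = env.get("MCP_HTTP_URL", "").strip()
    if not url:
        return None  # MCP is optional: the bot simply runs without extra tools.
    parsed = urlparse(url)
    # Assumption: only http/https URLs with a host are considered valid.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"MCP_HTTP_URL is not a valid HTTP URL: {url!r}")
    return url
```

Passing `os.environ` as `env` gives the real configuration, while a plain dict is convenient in tests.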
whatsapp/
├── bot.py # Conversation pipeline implementation
├── server.py # FastAPI server for handling webhooks
├── prompt.py # System and session instructions for the AI agent
├── env.example # Example environment variables
├── pyproject.toml # Project dependencies and configuration
├── README.md # This file
└── ...
- Make sure you're in the project directory
- Activate the virtual environment: source .venv/bin/activate
- Make your changes
- Test the changes by running: python server.py
This project uses the Pipecat framework for building real-time voice agents. The pipeline includes:
- Input transport (WebRTC connection)
- STT (Speech-to-Text)
- LLM (Large Language Model)
- TTS (Text-to-Speech)
- Output transport (WebRTC connection)
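The ordering of these stages can be illustrated with a toy pipeline in plain Python. This is a conceptual sketch only: it does not use the Pipecat API, and the stage functions (`fake_stt`, `fake_llm`, `fake_tts`) are hypothetical stand-ins that pass strings around instead of audio frames.

```python
from typing import Callable, List

Stage = Callable[[str], str]


def run_pipeline(stages: List[Stage], frame: str) -> str:
    """Pass a frame through each stage in order, as a linear pipeline would."""
    for stage in stages:
        frame = stage(frame)
    return frame


def fake_stt(audio: str) -> str:
    # Stand-in for speech-to-text: strip a marker to "transcribe" the audio.
    return audio.replace("audio:", "", 1)


def fake_llm(text: str) -> str:
    # Stand-in for the LLM: produce a reply from the transcript.
    return f"You said: {text}"


def fake_tts(text: str) -> str:
    # Stand-in for text-to-speech: wrap the reply as "audio" again.
    return f"audio:{text}"


reply = run_pipeline([fake_stt, fake_llm, fake_tts], "audio:hello")
print(reply)  # audio:You said: hello
```

In the real bot, the input and output transports sit at either end of this chain and carry WebRTC audio rather than strings.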
- [1] WhatsApp Cloud API Getting Started
- [2] Voice Calling API Documentation
- [3] Webhook Configuration Guide
- [4] SDP Overview and Samples
- [5] Pipecat Framework Documentation
This project is licensed under the BSD 2-Clause License - see the LICENSE file for details.
- Voice calling feature requires WhatsApp Business API access
- Test numbers are valid for 90 days in development mode
- Production deployment requires business verification
- The bot is configured as "Reva" for Doolally Taproom with specific conversation flows and prompts