Answer WhatsApp voice calls with AI. This repo connects Kapso's WhatsApp infrastructure to Pipecat's voice pipeline using OpenAI for speech and text.
The demo agent speaks neutral Latin American Spanish and introduces Kapso's platform.
- Python 3.10+ with uv installed
- Pipecat Cloud account
- Kapso account with a WhatsApp number (calls enabled)
- OpenAI API key
- Docker Hub account (or another registry)
    uv run pcc auth login

Create a Pipecat Cloud secret set:

    uv run pcc secrets set kapso-voice-secrets OPENAI_API_KEY=sk-...

    uv run pcc credentials docker create my-docker-secret \
      --username YOUR_DOCKERHUB_USERNAME \
      --password YOUR_DOCKER_TOKEN

Edit pcc-deploy.toml and update the image tag to YOUR_DOCKERHUB_USERNAME/agent-name:VERSION, then:

    uv run pcc docker build-push

    uv run pcc deploy kapso-voice YOUR_DOCKERHUB_USERNAME/agent-name:VERSION --credentials my-docker-secret

Update pcc-deploy.toml with your agent name (kapso-voice) and secret set name (kapso-voice-secrets).
Full setup guide: Kapso voice agent quickstart
- Sign in to app.kapso.ai
- Go to Voice agents → New voice agent
- Set provider to Pipecat
- Paste your Pipecat public API key and agent name (kapso-voice)
- Assign a WhatsApp number and mark it Primary + Enabled
- Call the number to test
Edit bot.py:
- System prompt: Change SYSTEM_PROMPT_FALLBACK or set the SYSTEM_PROMPT env var
- Voice models: Swap OpenAI services in run_voice_pipeline for other Pipecat-supported providers
- Idle timeout: Adjust idle_timeout_secs in the PipelineTask constructor
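The system prompt override described above can be sketched as follows. This is a minimal illustration, not the code in bot.py: the function name `resolve_system_prompt` and the fallback text are assumptions, and only the names SYSTEM_PROMPT and SYSTEM_PROMPT_FALLBACK come from this README.

```python
import os

# Hypothetical default text; the real SYSTEM_PROMPT_FALLBACK is defined in bot.py.
SYSTEM_PROMPT_FALLBACK = "You are a friendly voice assistant."


def resolve_system_prompt(env=None):
    """Return SYSTEM_PROMPT from the environment, or the fallback if unset or blank."""
    # Accept an explicit mapping so the lookup can be exercised without touching os.environ.
    env = os.environ if env is None else env
    prompt = env.get("SYSTEM_PROMPT", "").strip()
    return prompt or SYSTEM_PROMPT_FALLBACK
```

Treating a blank value as unset is a design choice in this sketch; it avoids starting a call with an empty prompt when the variable is defined but left empty.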
    uv sync

Copy .env.example to .env and add your OPENAI_API_KEY.
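To confirm the key actually made it into .env before running locally, a small check along these lines may help. It is a deliberately simplistic KEY=VALUE reader written for illustration; the helper names `parse_env_text` and `missing_keys` are hypothetical and are not part of this repo or of any dotenv library.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, which .env files often include.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=("OPENAI_API_KEY",)):
    """Return the required keys that are absent or empty in the given .env text."""
    values = parse_env_text(text)
    return [key for key in required if not values.get(key)]
```

Usage would be something like `missing_keys(open(".env").read())`, which returns an empty list when the key is present and non-empty.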
- Kapso receives WhatsApp voice call webhook from Meta
- Kapso forwards webhook to Pipecat Cloud with {kind: "whatsapp_connect", webhook, whatsapp_token, phone_number_id, context}
- Pipecat launches bot.py and connects to WhatsApp via SmallWebRTC transport
- Audio flows: caller speech → OpenAI STT → GPT-4 → OpenAI TTS → caller
- Call ends on 30s idle timeout or disconnect
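The payload forwarded in the second step could be validated roughly as shown below. The field names come from the payload listed above; the class name, the function name, and the field types are assumptions made for this sketch, and the actual handling in bot.py may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WhatsAppConnectRequest:
    # Types are guesses; this README only lists the field names.
    webhook: Any
    whatsapp_token: str
    phone_number_id: str
    context: dict = field(default_factory=dict)


def parse_connect_request(body):
    """Validate a forwarded payload and return it as a typed object."""
    if body.get("kind") != "whatsapp_connect":
        raise ValueError(f"unexpected kind: {body.get('kind')!r}")
    required = ("webhook", "whatsapp_token", "phone_number_id")
    missing = [key for key in required if key not in body]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return WhatsAppConnectRequest(
        webhook=body["webhook"],
        whatsapp_token=body["whatsapp_token"],
        phone_number_id=body["phone_number_id"],
        # Treat an absent or null context as empty in this sketch.
        context=body.get("context") or {},
    )
```

Rejecting unknown `kind` values early keeps malformed requests from reaching the transport setup.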
Kapso includes a context object with each call containing:
- project: Your Kapso project info (name, ID)
- config: WhatsApp number details (display name, phone number ID, mode)
- contact: Caller profile (name, wa_id)
- call: Call metadata (direction, status, timestamps)
- call_permission: Permission status and expiry
- conversation: Full message history (all messages with timestamps and content)
The agent uses this context to personalize greetings and responses. Check build_context_prompt() in bot.py to see how it's formatted.
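As a rough idea of what turning this context into prompt text can look like, here is a hypothetical formatter. It is not the build_context_prompt() implementation from bot.py: the function name `sketch_context_prompt`, the output wording, and the exact key names inside each object (`name`, `wa_id`, `direction`) are assumptions based on the list above.

```python
def sketch_context_prompt(context):
    """Format selected context fields as plain lines for a system prompt."""
    project = context.get("project") or {}
    contact = context.get("contact") or {}
    call = context.get("call") or {}

    lines = []
    # Only include fields that are present, so a sparse context yields a short prompt.
    if project.get("name"):
        lines.append(f"Project: {project['name']}")
    if contact.get("name"):
        lines.append(f"Caller name: {contact['name']}")
    if contact.get("wa_id"):
        lines.append(f"Caller WhatsApp ID: {contact['wa_id']}")
    if call.get("direction"):
        lines.append(f"Call direction: {call['direction']}")
    return "\n".join(lines)
```

An empty or missing context produces an empty string here, so the caller can fall back to the base system prompt unchanged.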
No audio back: Check OPENAI_API_KEY is set in Pipecat secrets and view Pipecat logs for TTS errors
Build fails: Verify Docker credentials with docker login and retry with --debug flag
Call doesn't connect: Confirm WhatsApp number has calls enabled in Kapso and voice agent assignment is marked Primary
BSD 2-Clause