Enterprise-Grade Voice AI Platform
Build conversational voice assistants with real-time STT, LLM, and TTS
DEMO-EXAMPLES.mp4
This demo showcases Sunona's voice assistants (simple_assistant.py, voice_assistant.py, and text_only_assistant.py) with real-time STT, LLM, and TTS.
demo-how-to-create-agent-make-calls.mp4
This demo showcases Sunona's Twilio integration: an AI campus recruiter (Priya) making real phone calls with voice conversation.
1. Environment Variables (.env):
# LLM (Brain) - Primary: Groq (fastest), Fallback: OpenRouter
GROQ_API_KEY=gsk_xxxxxxxx # https://console.groq.com/keys
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxx # https://openrouter.ai/keys (fallback)
# STT (Ears) - https://console.deepgram.com/
DEEPGRAM_API_KEY=xxxxxxxx
# TTS (Voice) - https://elevenlabs.io/app/settings/api-keys
ELEVENLABS_API_KEY=xxxxxxxx
# Telephony - https://www.twilio.com/console
TWILIO_ACCOUNT_SID=ACxxxxxxxx
TWILIO_AUTH_TOKEN=xxxxxxxx
TWILIO_PHONE_NUMBER=+1xxxxxxxxxx
TWILIO_WEBHOOK_URL=https://your-ngrok-url.ngrok-free.app
# Ngrok (for local testing) - https://dashboard.ngrok.com/
NGROK_AUTH_TOKEN=xxxxxxxx
2. Agent Configuration Used:
- Config: agent_data/example_recruiter/config_minimal.json
- Agent Name: Priya (Campus Recruiter)
- Providers: Deepgram Nova-2 (STT) → LLM → ElevenLabs Turbo (TTS)
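Before starting the demo it can help to confirm that the keys from step 1 are actually visible to the process. The following is a minimal sketch, not part of Sunona: the key names come from the .env listing above, while the split into "required" and "at least one LLM key" is an assumption, and it only reads variables that are already in the environment (it does not parse the .env file itself).

```python
import os

# Key names are taken from the .env listing above. Treating all of them as
# required for the Twilio demo is an assumption made for this sketch.
REQUIRED_KEYS = [
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "TWILIO_WEBHOOK_URL",
]
# At least one LLM key is needed (Groq is primary, OpenRouter the fallback).
LLM_KEYS = ["GROQ_API_KEY", "OPENROUTER_API_KEY"]


def missing_env_keys(env=None):
    """Return the names of keys that are unset or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if not any(env.get(key) for key in LLM_KEYS):
        missing.append(" or ".join(LLM_KEYS))
    return missing


if __name__ == "__main__":
    problems = missing_env_keys()
    if problems:
        print("Missing environment variables:", ", ".join(problems))
    else:
        print("All required keys are set.")
```

Passing a plain dict instead of `os.environ` makes the check easy to try without touching real credentials.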
3. Run the Demo (Step-by-Step):
Tip: All scripts use curl for API testing, so no Postman is required!
# Step 1: Start ngrok tunnel (Terminal 1)
ngrok http 8000
# Step 2: Start Sunona server (Terminal 2)
python -m sunona.server
# Step 3: Health check - verify server is running (Terminal 3)
.\scripts\test_api.bat
# Step 4: View agent details, prompt, users, etc.
.\scripts\view_agent.bat
# Step 5: Create agent - COPY THE UNIQUE agent_id FROM OUTPUT!
.\scripts\create_agent.bat
# Step 6: Make call via Twilio - paste agent_id when prompted
.\scripts\make_call.bat
| Step | Script | Purpose |
|---|---|---|
| 1 | ngrok http 8000 | Expose local server to internet |
| 2 | python -m sunona.server | Start the Sunona API server |
| 3 | test_api.bat | Health check - verify server connection |
| 4 | view_agent.bat | View agent config, prompt, and users |
| 5 | create_agent.bat | Create agent, then copy the agent_id |
| 6 | make_call.bat | Make Twilio call with the agent |
See scripts/README.md for detailed script documentation.
| Category | Capabilities |
|---|---|
| Speech-to-Text | 11 providers: Deepgram, Whisper, Groq, AssemblyAI, Azure, Sarvam, ElevenLabs, Gladia, Pixa, Smallest, AWS |
| LLM | 100+ models via LiteLLM: OpenRouter (FREE), OpenAI, Anthropic, Groq, Gemini, Azure, Mistral |
| Text-to-Speech | 11 providers: Edge TTS (FREE), ElevenLabs, OpenAI, Deepgram, Cartesia, Rime, Smallest, Sarvam, PlayHT, Azure, Polly |
| Telephony | 7 providers: Twilio, Plivo, Exotel, Vonage, SignalWire, Telnyx, Bandwidth |
| AI Agents | 7 types: Contextual, Extraction, Graph, Knowledge Base, Webhook, Summarization, Adaptive |
| Knowledge Base | Universal builder: Website, PDF, DOCX, TXT, JSON, CSV with auto-agent generation |
| Smart Transfer | Intelligent call transfer to humans when AI can't answer |
| Billing | Pay-as-you-go, auto-pay, usage metering, balance warnings, non-blocking SMTP notifications |
| Security | Multi-tenant isolation, O(1) auth lookups, organization-scoped resource gating, secured SSO |
| Resilience | Hardened VAD, circuit breakers for LLM streams, persistent Redis AgentStore, graceful WebSockets |
| WebRTC | Fully bidirectional browser calling with ultra-low latency audio response feedback |
| Languages | 20+ languages including Hindi, Tamil, Telugu, Bengali (via Sarvam AI) |
| Content Safety | Multilingual profanity detection (30+ languages) with empathetic responses |
cd sunona
# Create virtual environment
python -m venv venv
.\venv\Scripts\Activate # Windows
# source venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txt
pip install -e .
cp .env.example .env
notepad .env # Add your API keys
Minimum required keys:
# LLM (choose one or both)
GROQ_API_KEY=gsk_xxxxxxxx # Fastest (free tier available)
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxx # FREE models available
# Speech-to-Text
DEEPGRAM_API_KEY=xxxxxxxx
# Text-to-Speech (optional - Edge TTS is FREE and built-in!)
ELEVENLABS_API_KEY=xxxxxxxx
# Text-only (LLM only)
python examples/text_only_assistant.py
# Voice assistant (STT + LLM + TTS) - RECOMMENDED
python examples/simple_assistant.py
# Twilio call server (phone calls!)
python examples/twilio_call_server.py
The simple_assistant.py is a complete hands-free voice assistant with:
| Feature | Description |
|---|---|
| VAD | Voice Activity Detection - auto-detects speech |
| STT | Deepgram Nova-2 for accurate transcription |
| LLM | Groq (fastest) + OpenRouter fallback |
| TTS | Edge TTS (FREE, unlimited, 17+ languages!) |
| Multilingual | Auto-detects language and speaks in matching voice |
| Content Safety | Profanity detection with empathetic responses |
| Low Latency | Optimized for fast, natural conversation |
4. Quick Start Scripts (Recommended): API route-based tests (conversational_details.json, config.json, config_minimal.json, users.json)
Use these scripts from the root directory to interact with the Sunona API easily. They handle authentication and URL encoding for you.
Start the Sunona server with python -m sunona.server and keep it running. It includes the integrated Twilio server, which provides call services through the API.
| Script | Purpose | Command |
|---|---|---|
| Health Check | Verify server connection | .\scripts\test_api.bat |
| Create Agent | Register a new AI agent | .\scripts\create_agent.bat |
| Make Call | Initiate a phone call | .\scripts\make_call.bat |
Tip: These scripts are compatible with both CMD and PowerShell and use curl.exe to avoid alias conflicts.
Sunona includes 7 specialized agent types for different use cases:
| Agent | Use Case | Key Features |
|---|---|---|
| ContextualAgent | General conversation | Deep context tracking, sentiment awareness, topic management |
| ExtractionAgent | Lead capture, appointments | Extracts names, emails, phones, dates with validation |
| GraphAgent | IVR menus, guided flows | Node-based flows with conditions and actions |
| KnowledgeBaseAgent | FAQ, customer support | RAG-powered answers from your content |
| WebhookAgent | CRM integration | Real-time external system integration |
| SummarizationAgent | Call summaries | Post-call summaries and action items |
| AdaptiveAgent | Dynamic conversations | Auto-switches between modes based on context |
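To make the table concrete, here is a small sketch of how a use case could be mapped to one of the seven agent types. It is an illustration only and not the logic behind Sunona's own selector: the agent names come from the table, `lead_capture` and `faq` appear in the usage example in this README, and the other use-case strings, the `agent_name_for` helper, and the AdaptiveAgent fallback are assumptions.

```python
# Hypothetical lookup table: agent names come from the table above, the
# use-case strings other than "lead_capture" and "faq" are illustrative.
USE_CASE_TO_AGENT = {
    "general": "ContextualAgent",
    "lead_capture": "ExtractionAgent",
    "appointment": "ExtractionAgent",
    "ivr": "GraphAgent",
    "faq": "KnowledgeBaseAgent",
    "support": "KnowledgeBaseAgent",
    "crm": "WebhookAgent",
    "call_summary": "SummarizationAgent",
}


def agent_name_for(use_case: str) -> str:
    """Map a use case to an agent type; unknown cases fall back to AdaptiveAgent."""
    return USE_CASE_TO_AGENT.get(use_case.strip().lower(), "AdaptiveAgent")


print(agent_name_for("lead_capture"))
```

Falling back to AdaptiveAgent is a design choice made for this sketch, picked because the table describes it as switching modes based on context.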
from sunona.agents import select_agent
# Auto-select based on use case
agent = select_agent(use_case="lead_capture")
# Auto-detect from first message
agent = select_agent(first_message="I want to book an appointment")
# With knowledge base for FAQ
agent = select_agent(
use_case="faq",
knowledge_base=my_knowledge_base,
)
Build AI agents from ANY content source automatically:
from sunona.knowledge import UniversalKnowledgeBuilder
builder = UniversalKnowledgeBuilder("Acme Corp")
# Add from multiple sources
await builder.add_website("https://acme.com")
builder.add_text("Our hours are 9am-5pm Monday to Friday")
await builder.add_file("products.pdf")
await builder.add_file("faq.docx")
builder.add_faq([
{"question": "What are your hours?", "answer": "9am-5pm Mon-Fri"}
])
# Build knowledge base
knowledge = builder.build()
# Auto-generate AI agent
agent_config = builder.generate_agent(knowledge, "Acme Assistant")
| Source | Features |
|---|---|
| Website URLs | Auto-scrapes, extracts contact info, FAQ |
| PDF documents | Text extraction from all pages |
| Word documents | .docx support |
| Text files | .txt support |
| JSON files | Structured data parsing |
| CSV files | Tabular data import |
| Direct FAQ | Question/answer pairs |
| Product catalogs | Name, description, pricing |
Seamlessly transfer calls to humans when needed:
from sunona.telephony import create_call_handler, TransferConfig
# Configure transfer
handler = create_call_handler(
transfer_number="+1234567890",
knowledge_base=my_knowledge,
agent_name="John",
)
# Process messages
result = await handler.process_message("What's your refund policy?")
if result["transfer"]:
    # Seamless handoff to human
    print(result["transfer_action"])
| Trigger | When It Happens |
|---|---|
| Out-of-context | AI doesn't know the answer (2+ times) |
| Customer request | "Talk to a human", "Get me a manager" |
| | Refunds, complaints, billing issues |
| Frustration | "This is useless", "Not helpful" |
| Low confidence | AI confidence drops below threshold |
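The triggers in the table can be read as a short sequence of checks. The sketch below covers four of them under stated assumptions: the phrases are just the examples quoted in the table, the `transfer_reason` function and its return labels are hypothetical, and the 0.5 confidence threshold is a placeholder since the table does not give a value. It is not Sunona's transfer logic, and real detection would need more than substring matching.

```python
from typing import Optional

# Phrases are the examples quoted in the table above.
HUMAN_REQUEST_PHRASES = ("talk to a human", "get me a manager")
FRUSTRATION_PHRASES = ("this is useless", "not helpful")


def transfer_reason(
    message: str,
    unanswered_count: int = 0,
    confidence: float = 1.0,
    confidence_threshold: float = 0.5,  # assumed value, not from the table
) -> Optional[str]:
    """Return the first trigger that fires, or None to keep the AI on the call."""
    text = message.lower()
    if any(phrase in text for phrase in HUMAN_REQUEST_PHRASES):
        return "customer_request"
    if any(phrase in text for phrase in FRUSTRATION_PHRASES):
        return "frustration"
    if unanswered_count >= 2:  # "2+ times" in the table
        return "out_of_context"
    if confidence < confidence_threshold:
        return "low_confidence"
    return None
```

Explicit requests are checked first so that a caller who asks for a person is never kept waiting on the other heuristics.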
Complete SaaS billing with wallet balance, auto-pay, and usage tracking:
| Service | Rate |
|---|---|
| STT (Deepgram Nova-2) | $0.0145/min |
| LLM (GPT-4o-mini) | $0.00015/1K tokens |
| LLM (OpenRouter Free) | FREE |
| TTS (ElevenLabs) | $0.18/1K chars |
| Telephony (Twilio) | $0.022/min |
| Platform Fee | $0.01/min |
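As a worked example of how these rates add up, the sketch below estimates the cost of a single call. The rates are copied from the table; the `estimate_call_cost` helper is hypothetical, and it assumes STT, telephony, and the platform fee are all billed for the full call duration.

```python
# Rates copied from the table above (USD).
STT_PER_MIN = 0.0145         # Deepgram Nova-2
TELEPHONY_PER_MIN = 0.022    # Twilio
PLATFORM_PER_MIN = 0.01
LLM_PER_1K_TOKENS = 0.00015  # GPT-4o-mini
TTS_PER_1K_CHARS = 0.18      # ElevenLabs


def estimate_call_cost(minutes: float, llm_tokens: int, tts_chars: int) -> float:
    """Sum per-minute and per-unit charges for one call."""
    per_minute = STT_PER_MIN + TELEPHONY_PER_MIN + PLATFORM_PER_MIN
    return (
        minutes * per_minute
        + llm_tokens / 1000 * LLM_PER_1K_TOKENS
        + tts_chars / 1000 * TTS_PER_1K_CHARS
    )


# A 3-minute call using 2,000 LLM tokens and 1,500 synthesized characters.
print(f"${estimate_call_cost(3, 2000, 1500):.4f}")
```

For that example the per-minute charges come to $0.1395, the LLM to $0.0003, and TTS to $0.27, so at these rates ElevenLabs synthesis is the largest part of the roughly $0.41 total.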
| Balance | Level | Action |
|---|---|---|
| > $50 | Healthy | No action |
| $20-50 | Moderate | Daily reminder |
| $10-20 | | Warning every 4 hours |
| $5-10 | Critical | Warning every hour |
| < $5 | Depleted | Service blocked |
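The thresholds in the table translate directly into a small classifier. This is a sketch rather than Sunona's billing code: the `balance_level` function is hypothetical, the label "low" for the $10-20 band is borrowed from the `warning_level="low"` argument in the `send_balance_warning` example in this README, and because the table's ranges share their endpoints, treating each lower bound as inclusive is an assumption.

```python
def balance_level(balance: float) -> str:
    """Classify a wallet balance (USD) using the thresholds in the table."""
    if balance > 50:
        return "healthy"
    if balance >= 20:
        return "moderate"   # daily reminder
    if balance >= 10:
        return "low"        # warning every 4 hours; label is an assumption
    if balance >= 5:
        return "critical"   # warning every hour
    return "depleted"       # service blocked


for amount in (75, 20, 15, 7.5, 4.99):
    print(amount, balance_level(amount))
```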
from sunona.billing import send_balance_warning
# Send notification when balance is low
await send_balance_warning(
account_id="acc_123",
email="user@example.com",
balance=15.00,
warning_level="low",
webhook_url="https://your-app.com/webhook",
)
When auto-pay is enabled and balance drops below threshold:
- Card is automatically charged
- Wallet is topped up
- Email confirmation sent
- Service continues uninterrupted
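The auto-pay steps above can be sketched as a single function. Everything here is illustrative: the `Wallet` class, `maybe_auto_pay`, the $20 trigger level, and the $50 top-up amount are assumptions, and the card charge and email are injected as plain callables rather than real payment or SMTP calls.

```python
from dataclasses import dataclass


@dataclass
class Wallet:
    balance: float
    auto_pay: bool = True
    threshold: float = 20.0      # assumed trigger level
    top_up_amount: float = 50.0  # assumed recharge size


def maybe_auto_pay(wallet: Wallet, charge_card, send_email) -> bool:
    """Top up the wallet if auto-pay is on and the balance is under the threshold.

    charge_card(amount) must return True on success; send_email(message)
    delivers the confirmation. Both are injected so the sketch stays testable.
    """
    if not wallet.auto_pay or wallet.balance >= wallet.threshold:
        return False
    if not charge_card(wallet.top_up_amount):
        return False  # charge failed: leave the balance untouched
    wallet.balance += wallet.top_up_amount
    send_email(f"Wallet topped up by ${wallet.top_up_amount:.2f}")
    return True
```

The wallet is only credited after the charge succeeds, so a declined card never produces a balance the account has not paid for.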
# Terminal 1: Start ngrok
ngrok http 8000
# Terminal 2: Start server
python examples/twilio_call_server.py
# Terminal 3: Make a call (easiest way)
.\scripts\make_call.bat +917075xxxxxx <agent_id>
# Or via manual POST request (requires URL encoding for +)
Invoke-RestMethod -Method POST -Uri "http://localhost:8000/make-call?to=%2B917075xxxxxx&agent_id=your_id"
| Provider | Cost/min | Best For |
|---|---|---|
|---|---|---|
| Twilio | $0.022 | General use, most reliable |
| Plivo | $0.015 | Budget option |
| Exotel | $0.02 | India-focused |
| Vonage | $0.018 | Enterprise |
| SignalWire | $0.010 | Cheapest |
| Telnyx | $0.012 | Developer-friendly |
| Bandwidth | $0.016 | Enterprise |
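The manual request shown before the provider table needs the `+` in the phone number encoded as `%2B`. A minimal sketch of building that URL from Python follows; the `/make-call` path and the `to` and `agent_id` parameters are taken from the command above, while the `make_call_url` helper and the sample phone number are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # local server from the steps above


def make_call_url(to_number: str, agent_id: str) -> str:
    """Build the /make-call URL with the phone number safely encoded."""
    query = urlencode({"to": to_number, "agent_id": agent_id})
    return f"{BASE_URL}/make-call?{query}"


print(make_call_url("+15550100", "your_id"))
# http://localhost:8000/make-call?to=%2B15550100&agent_id=your_id
```

The resulting URL can then be sent as a POST, for example with `urllib.request.Request(url, method="POST")`, once the server from Terminal 2 is running.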
| Provider | Model | Cost/min | Languages |
|---|---|---|---|
| Deepgram | Nova-2 | $0.0145 | 35+ |
| Groq | Whisper Large V3 | $0.006 | 100+ |
| Sarvam | Saarika | $0.01 | Indian languages |
| ElevenLabs | Scribe | $0.015 | 25+ |
| Gladia | Whisper | $0.01 | 50+ |
| Smallest | Lightning | $0.005 | 10+ |
| AssemblyAI | Default | $0.015 | 20+ |
| Azure | Speech | $0.016 | 80+ |
| AWS | Transcribe | $0.024 | 30+ |
| Provider | Model | Cost/1K tokens |
|---|---|---|
| OpenRouter | Mistral 7B | FREE |
| OpenRouter | GPT-4o-mini | $0.00015 |
| OpenAI | GPT-4o | $0.005 |
| Groq | Llama 3.1 70B | $0.0006 |
| Anthropic | Claude 3.5 Sonnet | $0.003 |
| Google | Gemini 1.5 Pro | $0.00125 |
| Mistral | Mistral Large | $0.002 |
| Azure | GPT-4 | $0.006 |
| Provider | Cost/1K chars | Best For |
|---|---|---|
| Edge TTS | FREE | Built-in, 17+ languages, unlimited |
| ElevenLabs | $0.18 | Highest quality, voice cloning |
| OpenAI | $0.015 | Good quality, reliable |
| Deepgram Aura | $0.0065 | Low latency |
| Rime | $0.10 | Fast, neural |
| Smallest | $0.05 | Ultra-cheap |
| Sarvam | $0.08 | Indian languages |
| Cartesia | $0.10 | Low latency |
| PlayHT | $0.15 | Voice cloning |
| Azure | $0.016 | Enterprise |
| AWS Polly | $0.004 | Cheapest |
First-class support for Indian languages via Sarvam AI:
- Hindi (hi-IN)
- Tamil (ta-IN)
- Telugu (te-IN)
- Bengali (bn-IN)
- Kannada (kn-IN)
- Malayalam (ml-IN)
- Marathi (mr-IN)
- Gujarati (gu-IN)
- Punjabi (pa-IN)
from sunona.transcriber import create_transcriber
from sunona.synthesizer import create_synthesizer
# Hindi STT
transcriber = create_transcriber("sarvam", language="hi-IN")
# Hindi TTS
synthesizer = create_synthesizer("sarvam", language="hi-IN")
Detect and handle profanity with empathy across 30+ languages:
Supports abuse detection in English, Spanish, French, German, Russian, Italian, Portuguese, Polish, Dutch, Turkish, Japanese, Chinese, Hindi, Arabic, Thai, Vietnamese, Korean, Swedish, Norwegian, Danish, Finnish, Greek, and more.
from better_profanity import profanity
# Automatically loaded
profanity.load_censor_words()
# Detect abuse in ANY language
if profanity.contains_profanity(transcribed_text):
    # Respond with empathy
    response = random_sympathetic_response()  # 10 unique variations
    print("Content Alert: Abusive language detected")
- Detects profanity across 30+ languages simultaneously
- Recognizes contextual variations (fck off, fucing, etc.)
- Responds with one of 10 unique sympathetic responses
- Detailed logging for monitoring and compliance
- Conversation continues respectfully
- No false positives for innocent words (e.g., "assassin")
- "I'm sorry you're feeling this way. I'm here to help and support you..."
- "I understand you're upset, and I'm truly sorry about that..."
- "I'm sorry, I can't engage with that kind of language. But I genuinely care..."
- "Hey, I can tell something's really bothering you. I'm sorry you're struggling..."
- "...and 6 more unique empathetic responses"
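The snippet above calls `random_sympathetic_response()` without showing it. One possible shape for that helper is sketched below, as an assumption rather than Sunona's actual implementation: it holds only the four openings quoted in the list, left truncated exactly as they appear, because the remaining six responses are not shown in this README.

```python
import random

# Openings quoted from the list above, truncated exactly as shown there.
# The other six responses are not listed in this README, so they are omitted.
SYMPATHETIC_RESPONSES = [
    "I'm sorry you're feeling this way. I'm here to help and support you...",
    "I understand you're upset, and I'm truly sorry about that...",
    "I'm sorry, I can't engage with that kind of language. But I genuinely care...",
    "Hey, I can tell something's really bothering you. I'm sorry you're struggling...",
]


def random_sympathetic_response(rng=random):
    """Pick one response at random; pass a seeded random.Random for repeatable tests."""
    return rng.choice(SYMPATHETIC_RESPONSES)


print(random_sympathetic_response())
```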
┌─────────────────────────────────────────────────────────────────┐
│                         SUNONA VOICE AI                         │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │   TWILIO    │  │    PLIVO    │  │   EXOTEL    │  Telephony   │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘              │
│         └────────────────┼────────────────┘                     │
│                          ▼                                      │
│ ┌──────────────────────────────────────────────────────────────┐│
│ │               HARDENED CORE (Production Ready)               ││
│ │   Circuit Breakers │ Graceful Failover │ O(1) Auth Lookups   ││
│ └──────────────────────────────────────────────────────────────┘│
│                          ▼                                      │
│ ┌──────────────────────────────────────────────────────────────┐│
│ │                BILLING & MULTI-TENANCY SYSTEM                ││
│ │   Balance Check │ Usage Meter │ Tenant Registry │ Auto-Pay   ││
│ ├──────────────────────────────────────────────────────────────┤│
│ │                 PERSISTENCE & NOTIFICATIONS                  ││
│ │     Redis AgentStore │ aiosmtplib Email │ Webhook Alerts     ││
│ └──────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
The Sunona core has undergone a comprehensive production audit to ensure high reliability:
- Persistent AgentStore: Switched from in-memory to a Redis-backed storage layer for enterprise-grade availability and state persistence.
- Recursive Deadlock Prevention: Switched to RLock for all financial and state transactions.
- O(1) Authentication: Hash-indexed API key validation for sub-millisecond overhead.
- Circuit Breaker Pattern: Automatic fallback and fail-fast logic for all LLM and STT provider streams.
- Non-Blocking Notifications: High-performance SMTP delivery via aiosmtplib and async webhooks.
- Bidirectional WebRTC: Restored the audio response feedback loop for seamless browser-based voice interactions.
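For readers unfamiliar with the circuit breaker pattern mentioned in the list, here is a generic, minimal sketch of the idea. It is not Sunona's implementation: the `CircuitBreaker` class, its parameters, and the default thresholds are all assumptions chosen to show fail-fast behaviour with a fallback (for example, Groq as the primary and OpenRouter as the fallback).

```python
import time


class CircuitBreaker:
    """Fail fast after repeated errors, then retry once a cooldown has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, primary, fallback):
        # While open and still cooling down, skip the primary entirely.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cooldown over (half-open): allow one trial call, and re-open
            # immediately if that single call fails.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # a success closes the circuit fully
        return result
```

A caller wraps each provider request as `breaker.call(call_primary, call_fallback)`; once the primary has failed `max_failures` times in a row, later requests go straight to the fallback until `reset_after` seconds have passed.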
sunona/
├── sunona/                      # Main package
│   ├── agents/                  # AI Agents (7 types)
│   │   ├── base_agent.py
│   │   ├── extraction_agent.py
│   │   ├── graph_agent.py
│   │   ├── knowledgebase_agent.py
│   │   ├── webhook_agent.py
│   │   ├── summarization_agent.py
│   │   └── agent_selector.py    # Smart auto-selection
│   ├── llms/                    # LLM providers (100+ models)
│   │   └── litellm_llm.py
│   ├── transcriber/             # STT providers (11)
│   │   ├── deepgram_transcriber.py
│   │   ├── groq_transcriber.py
│   │   ├── sarvam_transcriber.py
│   │   └── ...
│   ├── synthesizer/             # TTS providers (10)
│   │   ├── elevenlabs_synthesizer.py
│   │   ├── rime_synthesizer.py
│   │   ├── sarvam_synthesizer.py
│   │   └── ...
│   ├── telephony/               # Telephony (7 providers)
│   │   ├── twilio_handler.py
│   │   ├── plivo_handler.py
│   │   └── smart_transfer.py    # Intelligent handoff
│   ├── knowledge/               # Knowledge Base
│   │   ├── knowledge_builder.py
│   │   └── website_builder.py
│   ├── billing/                 # Billing System
│   │   ├── billing_manager.py
│   │   ├── balance_warning.py   # $20 threshold warnings
│   │   ├── notifications.py     # Email/webhook alerts
│   │   └── middleware.py
│   ├── input_handlers/          # Audio input
│   ├── output_handlers/         # Audio output
│   ├── models.py                # Pydantic models
│   ├── constants.py             # Configuration
│   └── providers.py             # Provider registry
├── examples/
│   ├── twilio_call_server.py
│   └── TWILIO_QUICKSTART.md
├── .env.example                 # All environment variables
└── requirements.txt
See .env.example for all available variables. Key categories:
| Category | Variables |
|---|---|
| LLM | OpenRouter, OpenAI, Anthropic, Google, Groq, Azure, Mistral |
| STT | Deepgram, AssemblyAI, Sarvam, Gladia, Pixa, Smallest, Azure, AWS |
| TTS | ElevenLabs, Rime, Cartesia, PlayHT, Azure, AWS Polly |
| Telephony | Twilio, Plivo, Exotel, Vonage, SignalWire, Telnyx, Bandwidth |
| Database | PostgreSQL, Redis |
| Vector Stores | ChromaDB, Pinecone, Qdrant |
| | SMTP settings for notifications |
| Billing | Stripe integration |
| Feature | Sunona | Competitors |
|---|---|---|
| Platform Fee | $0.01/min | $0.02-0.05/min |
| Free LLM Options | OpenRouter | No |
| Indian Languages | Sarvam AI | Limited |
| Smart Transfer | Included | Extra cost |
| Knowledge Builder | Universal | Basic |
| Auto Agent | Yes | No |
| Balance Warnings | Email + Webhook | No |
| Auto-Pay | Yes | Limited |
Sunona is 30-50% cheaper with more features!
cd local_setup
docker compose up -d
Contributions welcome! Please open an issue or pull request.
MIT License - See LICENSE for details.
If you find Sunona useful and it saves you time and money building voice AI, please consider giving us a star on GitHub!
Your star helps:
- Grow the project and community
- Reach more developers who need voice AI
- Motivate the team to build amazing features
- Attract contributors and partners
It takes just one click and means the world to us!
Last Updated: January 10, 2026 at 03:48 UTC
Made with ❤️ by the Sunona Team
Building the future of conversational AI