🤖 ROBO CODED: This project was made with AI and may not be 100% sane. But the code does work! 🎉
A voice-powered AI assistant that answers phone calls, understands natural language, and performs actions like checking weather, setting timers, scheduling callbacks, and more.
| Feature | Description |
|---|---|
| 🎙️ Voice Conversations | Natural speech-to-text and text-to-speech powered by Whisper & Kokoro |
| 🤖 LLM Integration | Connects to OpenAI, vLLM, Ollama, LM Studio, and more |
| 🔧 Built-in Tools | Weather, timers, callbacks, date/time, calculator, jokes |
| 🔌 Plugin System | Easily add custom tools with Python |
| 🌐 REST API | Initiate outbound calls, execute tools, manage schedules |
| ⏰ Scheduled Calls | One-time or recurring calls (daily briefings, reminders) |
| 🔗 Webhooks | Trigger calls from Home Assistant, n8n, Grafana, and more |
| 🗣️ Custom Phrases | Customize greetings, goodbyes, and responses via JSON or env vars |
| 📊 Observability | Prometheus metrics, OpenTelemetry tracing, structured JSON logs |
| Use Case | Example |
|---|---|
| ⏱️ Timers & Reminders | "Set a timer for 10 minutes" |
| 📞 Callbacks | "Call me back in an hour" |
| 🌤️ Weather Briefings | Scheduled morning weather calls |
| 📅 Appointment Reminders | Outbound calls with confirmation |
| 🚨 Alerts & Notifications | Webhook-triggered phone calls |
| 🏠 Smart Home | Voice control via phone |
Call the assistant and say:
🗣️ "What's the weather like?"
```mermaid
sequenceDiagram
    participant User as 👤 User
    participant Agent as 🤖 SIP Agent
    participant STT as 🎤 Speaches
    participant LLM as 🧠 LLM
    participant Tool as 🌤️ Weather Tool

    User->>Agent: "What's the weather like?"
    Agent->>STT: Audio stream
    STT-->>Agent: Transcribed text
    Agent->>LLM: User query + context
    LLM-->>Agent: [TOOL:WEATHER]
    Agent->>Tool: Execute
    Tool-->>Agent: Weather data
    Agent->>LLM: Tool result
    LLM-->>Agent: Natural response
    Agent->>STT: Text to speech
    STT-->>Agent: Audio
    Agent->>User: "At Storm Lake, it's 44°..."
```

Assistant responds:

🤖 "At Storm Lake, as of 9:30 pm, it's 44 degrees with foggy conditions. Wind is calm."
```mermaid
flowchart LR
    subgraph Caller
        Phone[📱 SIP Phone]
    end
    subgraph Agent["🤖 SIP AI Agent"]
        SIP[SIP Client]
        Audio[Audio Pipeline]
        Tools[Tool Manager]
        API[REST API]
    end
    subgraph Services
        LLM[🧠 LLM Server<br/>OpenAI / vLLM / Ollama]
        Speaches[🎤 Speaches<br/>STT + TTS]
    end
    subgraph Integrations
        HA[🏠 Home Assistant]
        N8N[🔄 n8n]
        Webhook[🔗 Webhooks]
    end

    Phone <-->|SIP/RTP| SIP
    SIP <--> Audio
    Audio <-->|Whisper| Speaches
    Audio <-->|Kokoro| Speaches
    Audio <--> Tools
    Tools <-->|OpenAI API| LLM
    API <--> Tools
    HA -->|HTTP| API
    N8N -->|HTTP| API
    Webhook -->|HTTP| API
```
| Service | Purpose | URL |
|---|---|---|
| 🤖 SIP Agent | AI Voice Assistant API | `localhost:8080` |
| 🎤 Speaches | STT/TTS (Whisper + Kokoro) | `localhost:8001` |
| 🧠 vLLM | LLM Inference | `localhost:8000` |
| 🔴 Redis | Call Queue & Cache | `redis://localhost:6379` |
| 📈 Prometheus | Metrics Collection | `localhost:9090` |
| 📊 Grafana | Dashboards | `localhost:3000` |
| 📜 Loki | Log Aggregation | `localhost:3100` |
| 🔍 Tempo | Distributed Tracing | `localhost:3200` |
| 🔄 n8n | Workflow Automation | `localhost:5678` |
| Requirement | Description |
|---|---|
| 🐳 Docker | Docker and Docker Compose |
| 📞 SIP Server | FreePBX, Asterisk, 3CX, or any SIP PBX |
```bash
# Clone the repository
git clone https://github.com/your-org/sip-agent.git
cd sip-agent

# Configure environment
cp sip-agent/.env.example sip-agent/.env
nano sip-agent/.env

# Start services
docker compose up -d

# (Optional) Start services with observability
docker compose -f ./docker-compose.yml -f docker-compose.observability.yml up -d
```

Check that the agent is healthy:

```bash
curl http://localhost:8080/health | jq
```

Expected output:
```json
{
  "status": "healthy",
  "sip_registered": true,
  "active_calls": 0
}
```

```
┌──────────────────────────────────────────────────────────────┐
│ 📞 INCOMING CALL                                             │
├──────────────────────────────────────────────────────────────┤
│ 🤖 "Hello! Welcome to the AI assistant. How can I help?"     │
│ 👤 "What's the weather like?"                                │
│ 🤖 "At Storm Lake, it's 44 degrees with foggy conditions."   │
│ 👤 "Set a timer for 5 minutes"                               │
│ 🤖 "Timer set for 5 minutes!"                                │
│ 👤 "Goodbye"                                                 │
│ 🤖 "Goodbye! Have a great day!"                              │
└──────────────────────────────────────────────────────────────┘
```
Create a `.env` file with your settings:
```bash
# 📞 SIP Connection
SIP_USER=ai-assistant
SIP_PASSWORD=your-secure-password
SIP_DOMAIN=pbx.example.com

# 🎤 Speaches (STT + TTS)
SPEACHES_API_URL=http://speaches:8001

# 🧠 LLM Settings
LLM_BASE_URL=http://vllm:8000/v1
LLM_MODEL=openai-community/gpt2-xl

# 🌤️ Weather (Optional)
TEMPEST_STATION_ID=12345
TEMPEST_API_TOKEN=your-api-token
```

📖 See Configuration Reference for all options.
```bash
curl -X POST http://localhost:8080/call \
  -H "Content-Type: application/json" \
  -d '{
    "extension": "5551234567",
    "message": "Hello! This is a reminder about your appointment tomorrow."
  }'
```

Response:

```json
{
  "call_id": "out-1732945860-1",
  "status": "queued",
  "message": "Call initiated"
}
```

Schedule a daily weather call at 7 am:
```bash
curl -X POST http://sip-agent:8080/schedule \
  -H "Content-Type: application/json" \
  -d '{
    "extension": "5551234567",
    "tool": "WEATHER",
    "at_time": "07:00",
    "timezone": "America/Los_Angeles",
    "recurring": "daily",
    "prefix": "Good morning! Here is your weather update for today.",
    "suffix": "Have a great day!"
  }' | jq
```

Response:

```json
{
  "schedule_id": "a1b2c3d4",
  "status": "scheduled",
  "scheduled_for": "2025-12-01T07:00:00-08:00",
  "recurring": "daily"
}
```

List the available tools:

```bash
curl http://localhost:8080/tools | jq '.[].name'
```

Output:
```
"WEATHER"
"SET_TIMER"
"CALLBACK"
"HANGUP"
"STATUS"
"CANCEL"
"DATETIME"
"CALC"
"JOKE"
```
Data center GPUs with maximum performance.
| Component | Model | Notes |
|---|---|---|
| LLM | `meta-llama/Llama-3.1-70B-Instruct` | Best quality, fits on a single GPU |
| LLM | `Qwen/Qwen2.5-72B-Instruct` | Alternative, excellent reasoning |
| STT | `Systran/faster-whisper-large-v3` | Best accuracy |
| TTS | `af_heart` | Warm, natural voice |
```bash
# H100/A100 80GB Configuration
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

Grace Blackwell GB10 with shared CPU/GPU memory.
| Component | Model | Notes |
|---|---|---|
| LLM | `meta-llama/Llama-3.1-70B-Instruct` | Fits in unified memory |
| LLM | `Qwen/Qwen2.5-72B-Instruct` | Alternative option |
| LLM | `deepseek-ai/DeepSeek-R1-Distill-Llama-70B` | Reasoning focused |
| STT | `Systran/faster-whisper-large-v3` | Best accuracy |
| TTS | `af_heart` | Warm, natural voice |
```bash
# DGX Spark Configuration (128GB unified memory)
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

Next-gen consumer flagship.
| Component | Model | Notes |
|---|---|---|
| LLM | `Qwen/Qwen2.5-32B-Instruct` | Best fit for 32GB |
| LLM | `meta-llama/Llama-3.1-8B-Instruct` | Faster, lower quality |
| LLM | `mistralai/Mistral-Small-24B-Instruct-2501` | Good balance |
| STT | `Systran/faster-whisper-large-v3` | Best accuracy |
| TTS | `af_heart` | Warm, natural voice |
```bash
# RTX 5090 Configuration (32GB VRAM)
LLM_MODEL=Qwen/Qwen2.5-32B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

Current consumer flagship.
| Component | Model | Notes |
|---|---|---|
| LLM | `Qwen/Qwen2.5-14B-Instruct` | Best quality for 24GB |
| LLM | `meta-llama/Llama-3.1-8B-Instruct` | Faster option |
| LLM | `mistralai/Mistral-7B-Instruct-v0.3` | Good tool calling |
| STT | `Systran/faster-whisper-large-v3` | Best accuracy |
| TTS | `af_heart` | Warm, natural voice |
```bash
# RTX 4090 Configuration (24GB VRAM)
LLM_MODEL=Qwen/Qwen2.5-14B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart
```

High-end consumer GPUs.
| Component | Model | Notes |
|---|---|---|
| LLM | `meta-llama/Llama-3.1-8B-Instruct` | Best for 16-24GB |
| LLM | `Qwen/Qwen2.5-7B-Instruct` | Fast alternative |
| LLM | `microsoft/Phi-3-medium-4k-instruct` | 14B, good quality |
| STT | `Systran/faster-whisper-medium` | Good balance |
| TTS | `af_heart` | Warm, natural voice |
```bash
# RTX 3090/4080 Configuration (16-24GB VRAM)
LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-medium
TTS_VOICE=af_heart
```

Mid-range GPUs.
| Component | Model | Notes |
|---|---|---|
| LLM | `Qwen/Qwen2.5-7B-Instruct` | Best for 10-12GB |
| LLM | `microsoft/Phi-3-mini-4k-instruct` | 3.8B, very fast |
| LLM | `meta-llama/Llama-3.2-3B-Instruct` | Lightweight |
| STT | `Systran/faster-whisper-small` | Low VRAM |
| TTS | `af_heart` | Warm, natural voice |
```bash
# RTX 3080/4070 Configuration (10-12GB VRAM)
LLM_MODEL=Qwen/Qwen2.5-7B-Instruct
LLM_URL=http://localhost:8000/v1
STT_MODEL=Systran/faster-whisper-small
TTS_VOICE=af_heart
```

Optimized for fastest response times.
```bash
# Minimum latency configuration
LLM_MODEL=Qwen/Qwen2.5-3B-Instruct
STT_MODEL=Systran/faster-whisper-tiny.en
TTS_VOICE=af_heart
TTS_SPEED=1.1
```

| Voice | Style | Gender | Accent |
|---|---|---|---|
| `af_heart` | Warm, friendly | Female | American |
| `af_bella` | Professional | Female | American |
| `af_sarah` | Casual | Female | American |
| `af_nicole` | Expressive | Female | American |
| `am_adam` | Neutral | Male | American |
| `am_michael` | Professional | Male | American |
| `bf_emma` | Warm | Female | British |
| `bm_george` | Professional | Male | British |
| Tool | Description | Example Phrase |
|---|---|---|
| 🌤️ `WEATHER` | Current weather conditions | "What's the weather?" |
| ⏲️ `SET_TIMER` | Set a countdown timer | "Set a timer for 5 minutes" |
| 📞 `CALLBACK` | Schedule a callback | "Call me back in an hour" |
| 📴 `HANGUP` | End the call | "Goodbye" |
| 📋 `STATUS` | Check pending timers | "What timers do I have?" |
| ❌ `CANCEL` | Cancel timers/callbacks | "Cancel my timer" |
| 📅 `DATETIME` | Current date and time | "What time is it?" |
| 🧮 `CALC` | Math calculations | "What's 25 times 4?" |
| 😄 `JOKE` | Tell a joke | "Tell me a joke" |
| 🦜 `SIMON_SAYS` | Repeat back verbatim | "Simon says hello world" |
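As an illustration of how a tool like `CALC` can evaluate spoken arithmetic without resorting to `eval()`, here is a hedged sketch using Python's `ast` module with a whitelist of operators. The project's actual implementation may differ:

```python
import ast
import operator

# Whitelisted operators; anything outside this set is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calc(expr: str) -> float:
    """Evaluate a basic arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

print(safe_calc("25 * 4"))  # → 100
```

Walking the parsed tree instead of executing the string means function calls, attribute access, and anything else outside the whitelist raise `ValueError` rather than running.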
Add custom tools by creating Python plugins:
```python
# src/plugins/hello_tool.py
from tool_plugins import BaseTool, ToolResult, ToolStatus

class HelloTool(BaseTool):
    name = "HELLO"
    description = "Say hello to someone"
    parameters = {
        "name": {
            "type": "string",
            "description": "Name to greet",
            "required": True
        }
    }

    async def execute(self, params):
        name = params.get("name", "friend")
        return ToolResult(
            status=ToolStatus.SUCCESS,
            message=f"Hello, {name}! Nice to meet you."
        )
```

Register it in `tool_manager.py`:
```python
from plugins.hello_tool import HelloTool

tool_classes = [
    # ... existing tools ...
    HelloTool,
]
```

📖 See Creating Plugins for the full guide.
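To try a plugin outside a live call, you can exercise it directly with `asyncio`. The base classes below are minimal stand-ins sketched from the example above, since the real ones live in the project's `tool_plugins.py`:

```python
import asyncio
from dataclasses import dataclass
from enum import Enum

# Minimal stand-ins for the project's tool_plugins module (assumptions).
class ToolStatus(Enum):
    SUCCESS = "success"
    ERROR = "error"

@dataclass
class ToolResult:
    status: ToolStatus
    message: str

class BaseTool:
    name = ""
    description = ""
    parameters: dict = {}

class HelloTool(BaseTool):
    name = "HELLO"

    async def execute(self, params):
        name = params.get("name", "friend")
        return ToolResult(ToolStatus.SUCCESS, f"Hello, {name}! Nice to meet you.")

result = asyncio.run(HelloTool().execute({"name": "Ada"}))
print(result.message)  # → Hello, Ada! Nice to meet you.
```

Because `execute` is a plain coroutine, a unit test only needs an event loop; no SIP stack or LLM is required.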
```bash
# Docker logs
docker logs -f sip-agent

# Formatted log viewer
python tools/view-logs.py -f
```

Example output:
```
┌──────────────────────────────────────────────────────────────
│ 📞 CALL #1 - From: 1001
└──────────────────────────────────────────────────────────────
15:30:05 📞 Call started
15:30:06 👤 "What's the weather?"
15:30:07 🔧 [TOOL:WEATHER]
15:30:08 🤖 "At Storm Lake, it's 44 degrees..."
15:30:12 👤 "Thanks, goodbye"
15:30:13 📴 Call ended (duration: 0:08)
```
Import the included dashboard:
`grafana/dashboards/sip-agent.json`

```
sip-agent/
├── 📄 README.md                      # 👈 You are here
├── 📄 RELEASE.md                     # Release notes
├── 📄 CHANGELOG.md                   # Version history
├── 📄 docker-compose.yml             # Main compose file
├── 📄 docker-compose.observability.yml
├── 📄 openapi.yaml                   # API specification
│
├── 📁 sip-agent/                     # Core application
│   ├── 📄 Dockerfile
│   ├── 📄 requirements.txt
│   ├── 📄 .env.example
│   ├── 📁 data/
│   │   └── 📄 phrases.json.example
│   └── 📁 src/
│       ├── 📄 main.py                # Application entry
│       ├── 📄 config.py              # Configuration
│       ├── 📄 api.py                 # REST API
│       ├── 📄 sip_handler.py         # SIP call handling
│       ├── 📄 audio_pipeline.py      # STT/TTS processing
│       ├── 📄 llm_engine.py          # LLM integration
│       ├── 📄 tool_manager.py        # Tool orchestration
│       ├── 📄 tool_plugins.py        # Plugin base classes
│       ├── 📄 call_queue.py          # Redis call queue
│       ├── 📄 realtime_client.py     # WebSocket STT
│       ├── 📄 telemetry.py           # OpenTelemetry
│       ├── 📄 logging_utils.py       # Structured logging
│       ├── 📄 retry_utils.py         # API retry logic
│       └── 📁 plugins/               # Built-in tools
│           ├── 📄 weather_tool.py
│           ├── 📄 timer_tool.py
│           ├── 📄 callback_tool.py
│           ├── 📄 hangup_tool.py
│           ├── 📄 status_tool.py
│           ├── 📄 cancel_tool.py
│           ├── 📄 datetime_tool.py
│           ├── 📄 calc_tool.py
│           ├── 📄 joke_tool.py
│           └── 📄 simon_says_tool.py
│
├── 📁 docs/                          # Documentation
│   ├── 📄 index.md                   # Overview
│   ├── 📄 getting-started.md         # Installation
│   ├── 📄 configuration.md           # Config reference
│   ├── 📄 api-reference.md           # REST API
│   ├── 📄 tools.md                   # Built-in tools
│   ├── 📄 plugins.md                 # Plugin development
│   ├── 📄 examples.md                # Integration examples
│   └── 📁 screenshots/
│
├── 📁 observability/                 # Monitoring stack
│   ├── 📁 grafana/
│   │   └── 📁 provisioning/
│   │       ├── 📁 dashboards/        # Pre-built dashboards
│   │       └── 📁 datasources/
│   ├── 📁 prometheus/
│   │   └── 📄 prometheus.yaml
│   ├── 📁 loki/
│   │   └── 📄 loki.yaml
│   ├── 📁 tempo/
│   │   └── 📄 tempo.yaml
│   └── 📁 otel-collector/
│       └── 📄 config.yaml
│
├── 📁 tools/                         # Utilities
│   └── 📄 view-logs.py               # Log viewer
│
└── 📁 .github/
    └── 📁 workflows/
        ├── 📄 docker-build.yml       # Docker CI
        └── 📄 readme-sync.yml        # Docs sync
```
This project is optimized to run on the NVIDIA DGX Spark with Grace Blackwell architecture.
```
┌──────────────────────────────────────────────────────────────┐
│ 🟢 NVIDIA DGX Spark                                          │
├──────────────────────────────────────────────────────────────┤
│ 🧠 Grace Blackwell GB10 Superchip                            │
│ 💾 128GB Unified Memory                                      │
│ ⚡ 1 PFLOP AI Performance                                    │
├──────────────────────────────────────────────────────────────┤
│ ✅ Local LLM inference (vLLM, Ollama)                        │
│ ✅ Local STT/TTS (Speaches + Whisper + Kokoro)               │
│ ✅ Real-time voice processing                                │
│ ✅ Multiple concurrent calls                                 │
└──────────────────────────────────────────────────────────────┘
```
Recommended DGX Spark setup:
```bash
# Run everything locally on DGX Spark
LLM_BASE_URL=http://localhost:8000/v1
LLM_MODEL=openai/gpt-oss-20b
SPEACHES_API_URL=http://localhost:8001
```

📚 Full documentation available at sip-agent.readme.io
| Document | Description |
|---|---|
| 📖 Overview | Architecture and features |
| 🚀 Getting Started | Installation guide |
| ⚙️ Configuration | Environment variables |
| 🌐 API Reference | REST API endpoints |
| 🔧 Built-in Tools | Available tools |
| 🔌 Creating Plugins | Custom tool development |
| 📚 Examples | Integration patterns |
Contributions are welcome! Please read our contributing guidelines first.
```bash
# Fork and clone
git clone https://github.com/your-username/sip-agent.git

# Create branch
git checkout -b feature/amazing-feature

# Make changes and test
docker compose up -d
python -m pytest

# Commit with emoji
git commit -m "✨ feat: add amazing feature"

# Push and PR
git push origin feature/amazing-feature
```

This project is licensed under the GNU Affero General Public License v3.0; see the LICENSE file for details.
SPDX-License-Identifier: AGPL-3.0-or-later
- NVIDIA DGX Spark – AI supercomputer platform
- Speaches – Unified STT/TTS server
- PJSIP – SIP stack
- FastAPI – REST API framework
- WeatherFlow Tempest – Weather data
| Resource | Link |
|---|---|
| 📖 Docs | sip-agent.readme.io |
| 🐛 Issues | GitHub Issues |
| 💬 Discussions | GitHub Discussions |
Made with ❤️ and 🤖






