This document contains real issues encountered during development, their root causes, and how to avoid them. Learn from these mistakes so you don't repeat them!
- Tool Execution Issues
- State Management
- Audio & Codec Issues
- Configuration Errors
- Provider-Specific Issues
Symptom: AI says "goodbye" but call doesn't hang up. Tools configured but never executed.
Error Message:
```
Missing required parameter: 'session.tools[0].name'.
```
Root Cause: Using Chat Completions schema format (nested) instead of Realtime API format (flat).
The OpenAI Realtime API requires a different schema format than the Chat Completions API:
```python
# ❌ WRONG (Chat Completions format - nested):
{
    "type": "function",
    "function": {  # Nested under "function" key
        "name": "hangup_call",
        "description": "...",
        "parameters": {...}
    }
}

# ✅ CORRECT (Realtime API format - flat):
{
    "type": "function",
    "name": "hangup_call",  # Flat structure
    "description": "...",
    "parameters": {...}
}
```

Solution:
- Use `to_openai_realtime_schema()` for the OpenAI Realtime provider
- Use `to_openai_schema()` for Chat Completions (pipelines)
- Use `to_deepgram_schema()` for Deepgram Voice Agent
Implementation: See src/tools/base.py::to_openai_realtime_schema()
Prevention:
- Always verify schema format for your provider
- Test with `test_schema_format.py` if unsure
- Look for "missing_required_parameter" errors in logs
Reference: AAVA-85 regression fix (commit b1c92f1, Nov 19, 2025)
Symptom: Tools configured but Deepgram never calls them.
Root Cause: Using OpenAI field names (`tools`) instead of Deepgram field names (`functions`).
```python
# ❌ WRONG (OpenAI naming):
agent.think.tools = [...]

# ✅ CORRECT (Deepgram naming):
agent.think.functions = [...]
```

Also: the event type must match exactly:
```python
# ❌ WRONG:
elif event_type == "function_call":
# ✅ CORRECT:
elif event_type == "FunctionCallRequest":
```

Solution: Use Deepgram-specific naming and event types.
Reference: Deepgram function calling implementation (commits c8d994b, 2163e2f)
Symptom: Email transcript missing initial greeting.
Root Cause: Reinitializing conversation_history as empty list, overwriting session data.
```python
# ❌ WRONG:
conversation_history = []

# ✅ CORRECT:
conversation_history = list(session.conversation_history or [])
```

Impact: Lost greeting message, incomplete email transcripts.
Solution: Always initialize from session state, never overwrite without reading first.
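The read-before-write pattern can be sketched with a stand-in session object (the `Session` class and field names here are assumptions, not the project's real types):

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    # Stand-in for the real session; holds history accumulated so far
    conversation_history: list = field(default_factory=list)

def start_transcript(session: Session) -> list:
    """Copy existing history instead of starting from an empty list,
    so the greeting recorded earlier in the call is preserved."""
    return list(session.conversation_history or [])
```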
Reference: AAVA-85 bug #1 (commit dd5bc5a)
Symptom: `AttributeError: 'Engine' object has no attribute 'app_config'`
Root Cause: The Engine uses `self.config`, not `self.app_config`.
```python
# ❌ WRONG:
context = ToolExecutionContext(
    config=self.app_config.dict()  # AttributeError!
)

# ✅ CORRECT:
context = ToolExecutionContext(
    config=self.config.dict()
)
```

Impact: Tool execution crashes completely.
Solution: Always use `self.config` in the Engine context.
Reference: AAVA-85 bug #2 (commit a007241)
Symptom: Tool executes but call doesn't hang up.
Root Cause: Using delete_channel() instead of hangup_channel().
```python
# ❌ WRONG:
await self.ari_client.delete_channel(channel_id)

# ✅ CORRECT:
await self.ari_client.hangup_channel(channel_id)
```

Solution: Use the correct ARI client method. Check `src/ari_client.py` for available methods.
Reference: AAVA-85 bug #3 (commit cc125fd)
Symptom: Farewell message partially played before call hangs up.
Root Cause: Fixed sleep duration (2 seconds) doesn't match actual audio length.
```python
# ❌ WRONG:
await asyncio.sleep(2.0)  # May be too short or too long

# ✅ CORRECT:
duration_sec = len(audio_bytes) / 8000.0  # mulaw 8kHz
await asyncio.sleep(duration_sec + 0.5)  # Add buffer
```

Solution: Calculate sleep duration from actual audio byte count.
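For μ-law at 8 kHz each byte is one sample, so duration is simply byte count divided by the sample rate. A small helper generalizing the calculation (the 0.5 s buffer mirrors the fix; the function itself is an illustrative sketch):

```python
def playback_duration(audio_bytes: bytes, sample_rate: int = 8000,
                      bytes_per_sample: int = 1,
                      buffer_sec: float = 0.5) -> float:
    """Seconds to wait for playback to finish, plus a safety buffer.
    mulaw@8kHz uses 1 byte/sample; PCM16 would use bytes_per_sample=2."""
    return len(audio_bytes) / (sample_rate * bytes_per_sample) + buffer_sec
```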
Reference: AAVA-85 bug #5 (commit 8058dab)
Symptom: Severe audio garble/distortion on calls.
Root Cause: Code overrode YAML config with detected caller codec, causing format mismatch.
Example:
- Caller uses μ-law (160 bytes/frame)
- Asterisk channel expects PCM16 slin (320 bytes/frame per dialplan)
- Override forced μ-law → frame size mismatch → garble
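The mismatch is easy to verify arithmetically: at a 20 ms packetization interval, μ-law (1 byte/sample @ 8 kHz) and slin PCM16 (2 bytes/sample @ 8 kHz) yield the 160- and 320-byte frames mentioned above. A quick sketch:

```python
def frame_bytes(sample_rate: int, bytes_per_sample: int,
                frame_ms: int = 20) -> int:
    """Bytes per audio frame for a given codec and packetization interval."""
    return sample_rate * bytes_per_sample * frame_ms // 1000

mulaw_frame = frame_bytes(8000, 1)  # caller-side mu-law
slin_frame = frame_bytes(8000, 2)   # AudioSocket slin per dialplan
```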
Solution: The AudioSocket format must always match the YAML config, never the caller's codec.
```yaml
# YAML config
audiosocket:
  format: "slin"  # This is the source of truth
```

Lesson: The AudioSocket wire leg is separate from the caller-side trunk codec.
Reference: AudioSocket wire format fix (commit 1a049ce, Oct 25, 2025)
Symptom: Pipeline STT fails, no transcripts generated.
Root Cause: Pipeline adapters sent PCM16@16kHz (internal format) but config specified mulaw@8kHz.
Why It Happens:
- Both AudioSocket and ExternalMedia RTP standardize to PCM16@16kHz internally
- Pipeline adapters are "thin wrappers" that send audio as-is, without transcoding
- Monolithic providers have internal encoding logic
Solution: Match pipeline config to internal format for zero transcoding:
```yaml
pipelines:
  hybrid_support:
    options:
      stt:
        encoding: linear16   # Matches internal PCM16
        sample_rate: 16000   # Matches internal 16kHz
```

Benefit: Best quality, lowest latency, no overhead.
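A startup-time guard can catch this misconfiguration before the first call. The option names mirror the YAML above; the check itself is an illustrative addition, not existing project code:

```python
# Internal format both AudioSocket and ExternalMedia RTP standardize to
INTERNAL_FORMAT = {"encoding": "linear16", "sample_rate": 16000}

def needs_transcoding(stt_options: dict) -> bool:
    """True if pipeline STT options differ from the internal
    PCM16@16kHz format, meaning audio would need transcoding."""
    return (stt_options.get("encoding") != INTERNAL_FORMAT["encoding"]
            or stt_options.get("sample_rate") != INTERNAL_FORMAT["sample_rate"])
```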
Reference: Pipeline audio codec management (commit 0f71c74, AAVA-28)
Symptom: Pipeline STT receives zero audio. No transcriptions after greeting.
Root Cause: Code checked monolithic provider conditions first, returned early, never reached pipeline routing.
The Bug:
```python
# Checked continuous_input providers FIRST
if provider_name == "deepgram":
    provider.send_audio(audio)  # Wrong destination!
    return  # Early return - pipeline code never reached
```

Solution: Check pipeline mode BEFORE provider-specific routing.
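The ordering fix can be sketched as: test the session's pipeline mode before any provider-specific branch (the function, session shape, and return labels here are illustrative stand-ins for the real routing code):

```python
def route_audio(session: dict, provider_name: str, audio: bytes) -> str:
    """Decide where inbound audio goes. Pipeline mode must be checked
    FIRST, or the provider branch returns early and the pipeline
    never receives any audio."""
    if session.get("pipeline_mode"):
        return "pipeline"   # pipeline STT gets the audio
    if provider_name == "deepgram":
        return "provider"   # monolithic continuous-input provider path
    return "buffer"         # fallback for batch-style providers
```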
Reference: AudioSocket pipeline routing fix (commit fbbe5b9, Oct 27, 2025)
Symptom: Complete audio failure with error.
Error Message:
```
Invalid modalities: ['audio'].
Supported combinations are: ['text'] and ['audio', 'text'].
```
Root Cause: OpenAI Realtime API does NOT support audio-only modality.
Attempted: Forcing audio generation with `modalities: ['audio']`
Reality: The API strictly requires `['audio', 'text']` for voice
Known Limitation: OpenAI may occasionally generate text-only or partial audio. This cannot be prevented.
Solution: Always use ['audio', 'text'] and handle gracefully when no audio is generated.
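A `session.update` payload consistent with this constraint looks like the sketch below (fields beyond `modalities` are illustrative):

```python
# Session configuration for the OpenAI Realtime API; the API rejects
# ['audio'] alone, so text must always accompany audio.
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["audio", "text"],
        "voice": "alloy",  # illustrative voice choice
    },
}
```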
Reference: OpenAI modality constraints (commit 6dbd51e, Nov 10, 2025)
Symptom: Agent self-interrupts, echo loops, audio gate fluttering.
Root Cause: `webrtc_aggressiveness: 0` was too sensitive and classified echo as "speech".
Impact:
- Gate opened/closed 50+ times per call
- Echo leaked through gaps
- OpenAI detected own audio
- Self-interruption loop
Solution:
```yaml
vad:
  webrtc_aggressiveness: 1  # CRITICAL for OpenAI Realtime
```

Why: OpenAI has sophisticated server-side echo cancellation. Local VAD level 0 fights it; level 1 ignores echo.
Reference: OpenAI Realtime golden baseline (commit 937b4a4, Oct 26, 2025)
Always verify these log patterns:
✅ SUCCESS:
- "OpenAI session configured with 6 tools"
- "OpenAI function call detected: hangup_call"
- "Hangup tool executed: success"
❌ FAILURE:
- "missing_required_parameter"
- "AI used farewell phrase without invoking hangup_call tool"
- AttributeError in tool execution
- Check schema format first (most common issue)
- Verify tool registration in logs
- Check provider-specific requirements (naming, event types)
- Test with known-good config to isolate issue
- Use `agent rca` for detailed analysis
- Always read before write - Initialize from session state
- Use correct attribute names - Check class definitions
- Verify ARI method names - Check `ari_client.py`
- Calculate timing dynamically - Don't use fixed sleeps for audio
- Match internal audio formats - Use linear16@16kHz for pipelines
- Test ALL providers when changing shared components
- Verify schema formats for each provider type
- Check logs for warnings - They often indicate misconfigurations
- Tool issues? See Tool Development Guide
- Provider issues? See Provider Development Guide
- Audio issues? See Architecture Deep Dive
- Still stuck? Check Debugging Guide
Remember: Every bug is a learning opportunity. When you fix a bug, add it here to help future contributors! 🚀