feat: WhatsApp interface V2 — media I/O, interactive tools, Team/Workflow support#6466
Mustafa-Esoofally wants to merge 39 commits into main
…and tests
- Rewrite WhatsAppTools with 9 sync-only tools and enable/disable flags
- Add send_reply_buttons, send_list_message, send_image, send_document, send_location, send_reaction, and mark_as_read tools
- Harden security: remove dev mode bypass, add replay protection (5-min window)
- Handle interactive replies, location, reaction, contacts, sticker messages
- Replace requests with httpx in utils
- Add Pydantic response models and operation IDs to router
- Add 48 unit tests covering tools, security, and router
@claude what are the sync-only tools added, and can you create a code example to test an agent using those tools in prod?
Code review
Found 1 issue:
The HMAC-SHA256 signature validation algorithm itself is correct. The issue is the replay protection at lines 30-33 uses the message body timestamp (when the user sent the message) instead of an HTTP request timestamp. This causes:
agno/libs/agno/agno/os/interfaces/whatsapp/security.py, lines 29 to 34 in 496b8f6
@claude what are the new tools that need to be tested for WhatsApp? Please create an agent that uses them all.
Co-authored-by: VirusDumb <VirusDumb@users.noreply.github.com>
Enable showing agent reasoning in the WhatsApp interface and switch to a newer Claude model. Updated cookbook agent to use "claude-sonnet-4-6" and enabled debug_mode; pass show_reasoning=True to the Whatsapp interface. In the library, default show_reasoning is now True for attach_routes and the Whatsapp class (added constructor param and stored property), and the router message formatting for reasoning was slightly adjusted. This makes the WhatsApp interface surface agent reasoning by default and aligns the example with the updated model.
…o-agi/agno into update/whatsapp-interface-v2
Add interactive message handling and typed WhatsApp tools, plus cookbook examples and tests.

- Introduces handling for interactive messages (button_reply, list_reply) in the WhatsApp router and exposes send_user_number_to_context to include the sender number in agent/team/workflow context.
- Adds ReplyButton, ListRow and ListSection Pydantic models to WhatsAppTools and validates limits (max buttons, sections, rows). Improves HTTP error handling when posting to the WhatsApp API.
- Updates Whatsapp interface to accept send_user_number_to_context and propagate it to the router.
- Adds cookbook examples: content_workflow, support_team, tourist_guide.
- Updates whatsapp_all_tools example content (image/document URLs) and removes a few debug_mode flags from example agents.
- Adds/updates unit tests to cover interactive message processing and the typed tools API.
Add support for audio and document messages and include WhatsApp agent examples. New cookbook examples: audio_tools (ElevenLabs TTS voice replies) and research_agent (web research + PDF generation). Update image_generation_model to use gemini-3-pro-image-preview and enable debug_mode in tourist_guide. Enhance WhatsApp router to extract/upload media, handle images, files, audio (including PCM->WAV wrapping for Gemini TTS), and send document/audio messages. Add send_document_message and send_audio_message (sync + async) utilities in agno.utils.whatsapp and helper functions for media extraction and MIME handling.
…o-agi/agno into update/whatsapp-interface-v2
)
from agno.workflow import RemoteWorkflow, Workflow

from .security import validate_webhook_signature
full import
- Extract 8 pure utility functions from router.py into helpers.py, matching the Slack interface's separation pattern
- Add send_text_message_async/send_text_message to utils/whatsapp.py so the router uses proper async sends instead of sync httpx.post
- Unify entity dispatch with single entity.arun() call
- Fix messages[0] bug: now iterates all messages in webhook payload
- Pass add_dependencies_to_context as per-call kwarg instead of mutating shared agent state
- Update test mock paths for helpers extraction
…2E testing
- Remove reload=True from all WhatsApp cookbooks (causes child process issues)
- Fix gpt-5.2 -> gpt-4o in image_generation_tools and multiple_instances
- Fix multiple_instances app reference
- Add prepare_audio_for_whatsapp helper for PCM->WAV conversion
- Fix test_send_text_message_no_recipient env isolation
- Minor router cleanup from E2E testing
WhatsApp auto-dismisses the typing indicator after ~25 seconds. For agents that take longer to respond, the indicator would disappear and users would see no feedback until the response arrived. Spawns a background asyncio task that refreshes the typing indicator every 20 seconds while entity.arun() is executing, cancelled via try/finally when the run completes.
- deep_research: multi-tool agent with 7 toolkits, PDF output, reply buttons for format choice, reactions, mark-as-read
- multimodal_team: coordinated Vision Analyst + Creative Agent team for image analysis and generation
- interactive_concierge: showcases ALL WhatsApp interactive features (buttons, lists, location pins, reactions, images) in a concierge flow
- multimodal_workflow: parallel workflow with visual analysis + web research followed by creative synthesis with DALL-E + PDF
- Add geocode-mcp (OpenStreetMap Nominatim) for accurate venue coordinates
- Update instructions to use mcp_geocoding_get_coordinates before send_location
- Add instruction to keep text responses brief when interactive elements are sent
- Pin uvx to Python 3.12 for geocode-mcp compatibility
- Extract parse_whatsapp_message() to helpers with ParsedMessage dataclass
- Move extract_earliest_timestamp to security.py (co-located with validation)
- Replace _MIME_MAP dict with stdlib mimetypes.guess_type()
- Add operation_id to /status endpoint (prevents FastAPI collision)
- Accept Union[str, float] for lat/lng in send_location
- Skip signature validation when WHATSAPP_APP_SECRET is unset (dev convenience)
- Truncate button/list titles to WhatsApp limits (20/24 chars)
- Default show_reasoning to False
- Add test coverage for video/audio/document parsing, edge cases
Adds inline comments across all WhatsApp files matching the Slack interface commenting style. Also refactors run_kwargs from inline ternaries to incremental dict building for clarity.
…on truncation
- Move Graph API functions from utils/whatsapp.py into os/interfaces/whatsapp/helpers.py and delete the old module
- Introduce WhatsAppConfig dataclass to resolve credentials once at startup instead of reading env vars on every API call
- Hash phone numbers (SHA-256) before they reach storage so databases never contain plaintext PII
- Truncate media captions at 1024 chars and send full text as a separate message when content exceeds the limit
- Consolidate four upload functions into one upload_and_send_media_async
- Accept per-instance credentials in attach_routes for multi-bot deployments
- Update multiple_instances cookbook with detailed setup docs
Hash phone numbers with SHA-256 for PII protection: raw numbers never reach storage as user_id or session_id. Add /new command that starts a fresh session without deleting old data, using in-memory session mapping with UUID suffixes. On server restart the map is empty so users resume their original deterministic session.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…exception
- Move _WA_TOOL_NAMES to module-level frozenset (was recreated per message)
- Remove unused WhatsAppVerifyResponse model
- Remove redundant docstring on webhook endpoint (description= kwarg suffices)
- Fix except (UnicodeDecodeError, Exception) → Exception (superset)
- Clean up str(e) in f-strings and inline asyncio imports in tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename content_workflow.py -> basic_workflow.py
- Rename deep_research.py -> streaming_deep_research.py
- Rename research_agent.py -> research_assistant.py
- Replace port=8000 with reload=True across all cookbooks
- Remove stale __main__ docstrings
Remove json, log_debug, Mock — unused after helpers consolidation.
Mustafa-Esoofally left a comment
Self-review: annotating non-obvious design decisions for reviewers. All 115 unit tests pass, 16 cookbooks E2E tested.
        "send_reaction",
        "mark_as_read",
    }
)
Why _WA_TOOL_NAMES exists: When WhatsAppTools (e.g. send_reply_buttons) runs during agent execution, it sends a message directly to the user via the Graph API. Without this check, the router would also send response.content as a plain text message — duplicating the output.
Verified via experiment: with tools=[WhatsAppTools(enable_send_reply_buttons=True)], agent calls send_reply_buttons during execution. The response.tools list contains the tool execution, and this check prevents the router from also calling send_text_message_async.
frozenset because it's a module-level constant used in a hot path (any(t.tool_name in ...) is O(1) per lookup).
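A minimal sketch of the duplicate-send guard described above. The `ToolExecution` stand-in, the helper name `tool_already_messaged_user`, and the exact contents of the set are assumptions for illustration; only `_WA_TOOL_NAMES`, `tool_name`, and the tool names themselves come from this PR.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative subset; the real set is defined at module level in router.py.
_WA_TOOL_NAMES = frozenset(
    {
        "send_reply_buttons",
        "send_list_message",
        "send_reaction",
        "mark_as_read",
    }
)


@dataclass
class ToolExecution:
    """Hypothetical stand-in for an entry in response.tools."""

    tool_name: str


def tool_already_messaged_user(tools: Optional[List[ToolExecution]]) -> bool:
    """Return True if any executed tool is a WhatsApp tool that messages the user directly."""
    return any(t.tool_name in _WA_TOOL_NAMES for t in (tools or []))
```

Under these assumptions the router would send `response.content` as plain text only when the helper returns False.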
# Maps hashed_phone → session_id; absent key falls back to default deterministic ID.
# On server restart the map is empty, so users resume their original session.
_active_sessions: Dict[str, str] = {}
In-memory session map — intentional. Maps hashed_phone → session_id for /new command support. On server restart, the map is empty so users resume their default deterministic session (wa:{hash}).
No lock needed: Python's GIL protects dict reads/writes, and asyncio runs on a single thread. The Slack interface doesn't need this because Slack provides its own thread-based session management.
phone_number = message["from"]
# Hash phone number before it reaches storage — deterministic so the
# same phone always resolves to the same session, but irreversible
hashed_phone = hashlib.sha256(phone_number.encode()).hexdigest()
Privacy: phone number hashing. The raw phone number is only used for sending WhatsApp API responses (line 180, 254, etc.). For all storage-facing operations (user_id, session_id, log messages), only the SHA-256 hash is used. This means:
- Same phone → same hash → same session (deterministic)
- Storage never sees the raw number (irreversible)
- Log messages show truncated hash: hashed_phone[:12]
log_info(f"Processing message from {hashed_phone[:12]}: {parsed.text}")

session_id = _active_sessions.get(hashed_phone, f"wa:{hashed_phone}")
Session ID fallback design: When _active_sessions has no entry (fresh server or user never ran /new), the session ID is deterministic: wa:{hashed_phone}. This means:
- Server restarts don't orphan sessions; the user gets the same session back
- /new creates wa:{hash}:{random_8hex}, a new session without deleting old data
- Multiple /new calls just overwrite the map entry
run_kwargs["add_dependencies_to_context"] = True

# Refresh typing indicator every 20s while the agent runs
# WhatsApp auto-dismisses the indicator after ~25s
Why 20s interval: WhatsApp's typing indicator auto-dismisses after ~25 seconds. Agent execution (especially with tools like web search, file generation) can take 30-60s+. This task refreshes the indicator before it expires, giving users visual feedback that the bot is still working. The CancelledError catch ensures clean shutdown when the agent finishes.
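A sketch of the refresh pattern, assuming the agent run and the typing-indicator call are both passed in as coroutines. `run_with_typing_indicator` and its parameters are hypothetical names, not the router's actual API.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_typing_indicator(
    run: Callable[[], Awaitable[str]],
    send_typing: Callable[[], Awaitable[None]],
    interval: float = 20.0,
) -> str:
    """Run the agent while re-sending the typing indicator every `interval` seconds."""

    async def _refresh() -> None:
        try:
            while True:
                await send_typing()
                await asyncio.sleep(interval)
        except asyncio.CancelledError:
            pass  # expected: the run finished and the task was cancelled

    task = asyncio.create_task(_refresh())
    try:
        return await run()
    finally:
        # Stop refreshing whether the run succeeded or raised.
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
```

Keeping the interval below the ~25 second auto-dismiss window is what keeps the indicator visible without a gap.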
            return content
    elif isinstance(content, str):
        return base64.b64decode(content)
    return None
Why two code paths for bytes content:
- Some tools return media as raw binary bytes (e.g. httpx.get().content) → goes to except branch, returned as-is
- Other tools return base64-encoded data as bytes (e.g. b'iVBORw0KGgo...') → decode('utf-8') succeeds, then b64decode produces the actual binary
- String content (third branch) is straightforward base64 decode
The try/except on decode('utf-8') distinguishes case 1 from case 2 — if the bytes aren't valid UTF-8, they're already raw binary.
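The three branches could look roughly like this. The name `get_content_bytes` appears in a later commit message, but the signature and the exact exceptions caught here are assumptions.

```python
import base64
import binascii
from typing import Optional, Union


def get_content_bytes(content: Union[bytes, str, None]) -> Optional[bytes]:
    """Normalize tool output to raw binary bytes."""
    if isinstance(content, bytes):
        try:
            # Case 2: base64 text stored as bytes decodes cleanly as UTF-8.
            return base64.b64decode(content.decode("utf-8"))
        except (UnicodeDecodeError, binascii.Error):
            # Case 1: not valid UTF-8 (or not valid base64), so treat as raw binary.
            return content
    elif isinstance(content, str):
        # Case 3: plain base64 string.
        return base64.b64decode(content)
    return None
```

One limitation of this heuristic: raw bytes that happen to be valid UTF-8 and also decode as base64 would be misread as case 2.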
fmt = getattr(audio_obj, "format", None) or mime_type.split("/")[-1]
return audio_bytes, mime_type, f"audio.{fmt}"

# Raw PCM from Gemini (e.g. "audio/L16;rate=24000") — wrap as WAV
Gemini TTS PCM→WAV conversion: Google's Gemini TTS returns raw PCM audio with mime_type audio/L16;rate=24000 — just raw 16-bit samples with no container. WhatsApp rejects this because it requires a proper audio container format (mp3, ogg, wav, aac, amr). This wraps the PCM data in a WAV header using Python's wave module.
The sample rate is parsed from the mime_type string (rate=24000) with a fallback to 24000 (Gemini's default). Channels default to 1 (mono).
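A sketch of the wrapping step using the standard-library `wave` module. A later commit extracts a function named `pcm_to_wav_bytes`, but the signature below, including the `channels` and `sample_width` parameters, is an assumption.

```python
import io
import re
import wave


def pcm_to_wav_bytes(pcm: bytes, mime_type: str, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV container."""
    # Parse e.g. "audio/L16;rate=24000"; fall back to 24000 when no rate is given.
    match = re.search(r"rate=(\d+)", mime_type)
    sample_rate = int(match.group(1)) if match else 24000

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)       # mono by default
        wav.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The result starts with a RIFF/WAVE header and can be uploaded with an audio/wav MIME type.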
    return "\n".join([f"_{line}_" for line in text.split("\n")])
return text

# WhatsApp limit is 4096 chars; split at 4000 to leave room for batch prefix
Why 4000, not 4096: WhatsApp's hard limit is 4096 characters per message. We split at 4000 to leave 96 chars of headroom for the [1/N] batch prefix (e.g. [1/12] = 7 chars). This avoids edge cases where the prefix + content would exceed the limit.
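The batching arithmetic can be sketched as follows. `split_message` is a hypothetical helper, and the exact prefix layout (a space after the closing bracket) is an assumption; only the 4096 limit, the 4000 chunk size, and the `[1/N]` form come from this PR.

```python
from typing import List

WHATSAPP_MAX_CHARS = 4096  # hard per-message limit
CHUNK_SIZE = 4000          # leaves 96 chars of headroom for the prefix


def split_message(text: str) -> List[str]:
    """Split long text into prefixed batches that each fit in one WhatsApp message."""
    if len(text) <= WHATSAPP_MAX_CHARS:
        return [text]
    chunks = [text[i : i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    total = len(chunks)
    return [f"[{n}/{total}] {chunk}" for n, chunk in enumerate(chunks, start=1)]
```

With a 9000-character input this yields three batches of 4000, 4000, and 1000 characters plus prefix, all under the 4096 limit.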
if not app_secret:
    # Explicit opt-out: skip validation when secret is not configured
    log_warning("WHATSAPP_APP_SECRET not set — signature validation disabled (DO NOT use in production)")
    return True
Intentional opt-out for development. When WHATSAPP_APP_SECRET is not set, signature validation is skipped entirely. This lets developers test with ngrok/tunnels without configuring the app secret. The log_warning ensures this doesn't silently pass in production.
Previously this module had if APP_ENV == 'development': return True which was removed — checking for the secret's absence is more explicit and doesn't rely on environment naming conventions.
# 5-minute window guards against replay attacks
if timestamp is not None:
    if abs(time.time() - timestamp) > 300:
Replay protection: 5-minute window. Rejects any webhook with a message timestamp more than 5 minutes from the current server time. This guards against replay attacks where an attacker captures and re-sends a valid signed webhook payload. The abs() also catches future timestamps (clock skew). Timestamp is extracted by extract_earliest_timestamp() which walks the nested webhook payload to find the oldest message.
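A sketch of how the signature check and the 5-minute window fit together. `check_signature_and_freshness` and its parameters are hypothetical; the `sha256=<hex>` header format is the one Meta uses for X-Hub-Signature-256, and the `now` parameter exists only to make the example testable.

```python
import hashlib
import hmac
import time
from typing import Optional

REPLAY_WINDOW_SECONDS = 300  # 5 minutes


def check_signature_and_freshness(
    payload: bytes,
    signature_header: Optional[str],
    app_secret: str,
    timestamp: Optional[float] = None,
    now: Optional[float] = None,
) -> bool:
    """Verify the HMAC-SHA256 signature, then reject stale or future-dated messages."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = hmac.new(app_secret.encode(), payload, hashlib.sha256).hexdigest()
    provided = signature_header[len("sha256="):]
    # Constant-time comparison; bytes avoid a TypeError on non-ASCII input.
    if not hmac.compare_digest(expected.encode(), provided.encode()):
        return False
    if timestamp is not None:
        current = time.time() if now is None else now
        # abs() rejects both old replays and timestamps too far in the future.
        if abs(current - timestamp) > REPLAY_WINDOW_SECONDS:
            return False
    return True
```

Note that the timestamp checked here is whatever the caller extracts from the payload; as the code review comment earlier in this thread points out, a message-body timestamp reflects when the user sent the message, not when the webhook request was made.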
helpers.py (-77 lines):
- Delete dead sync functions (get_media, upload_media)
- Replace extract_media_bytes with get_content_bytes()
- Privatize low-level senders (_send_text, _send_media)
- Remove banner comments, stale docstrings, redundant f-strings
- Use typed Audio/File field access instead of getattr
- Widen get_media_async/upload_media_async to catch HTTPError
router.py:
- Validate get_media_async responses before constructing media objects
- Add send_text_message/send_template_message to _WA_TOOL_NAMES
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete 5 cookbooks that overlap with existing examples or demonstrate features better covered by tourist_guide and support_team.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… utility, delete dead code
- Rename WhatsAppConfig.from_env → init, ParsedMessage → MessageContent, parse_whatsapp_message → extract_message_content
- Make internal senders private: _send_text, _send_media
- Extract pcm_to_wav_bytes to agno/utils/audio.py for reuse across interfaces
- Delete prepare_audio_for_whatsapp; inline audio logic in upload_and_send_media_async
- Delete sync get_media/upload_media (router is async-only)
- Use typed media objects (Image, Audio, File) instead of getattr
- Update tests: proper media objects, updated patch targets, remove dead tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Audio import
- Router tests patched old send_text_message_async → now patch _send_text
- Remove unused Audio import from helpers.py (ruff F401)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Complete overhaul of the WhatsApp interface — adds media I/O, interactive UI tools, Team/Workflow entity support, security hardening, and 16 cookbooks aligned with Slack's naming conventions.
28 files changed: 6 core modules, 17 cookbooks, 4 test suites, 1 deleted (utils/whatsapp.py → consolidated into helpers.py).
Architecture
Key design pattern: The router handles inbound messages and sends agent responses. WhatsAppTools lets agents send during execution (reply buttons, list messages, reactions). _WA_TOOL_NAMES in router.py prevents duplicate text when a tool already messaged the user; this was verified via experiment (see line comment).
Core changes
Router rewrite (router.py):
- Unified entity dispatch via entity.arun()
- /new command starts a fresh session without losing old data
- show_reasoning support: extended thinking sent as italicized prefix
Extracted helpers (helpers.py, NEW file):
- WhatsAppConfig dataclass: credentials resolved once at startup, passed via closure
- upload_and_send_media_async: unified upload+send for image/document/audio with text fallback on failure
- prepare_audio_for_whatsapp: converts raw PCM (e.g. Gemini TTS audio/L16;rate=24000) to WAV container
- send_whatsapp_message_async: batches messages >4096 chars into 4000-char chunks with [1/N] prefix
- extract_media_bytes: handles both raw bytes and base64-encoded content from tool results
WhatsAppTools rewrite (tools/whatsapp.py):
- enable_* flags and all=True override
- send_text, send_template, send_reply_buttons, send_list_message, send_image, send_document, send_location, send_reaction, mark_as_read
Security hardening (security.py):
- Removed the dev mode bypass (if APP_ENV == "development": return True)
- WHATSAPP_APP_SECRET
Deleted utils/whatsapp.py: all functionality consolidated into helpers.py with proper async support. Replaced requests with httpx.
Cookbooks (16 total, all E2E tested)
Naming aligned with Slack interface cookbooks for cross-platform consistency.
- basic.py
- basic_workflow.py
- streaming_deep_research.py
- research_assistant.py
- agent_with_media.py
- image_generation_model.py
- image_generation_tools.py
- audio_tools.py
- agent_with_user_memory.py
- reasoning_agent.py
- interactive_concierge.py
- tourist_guide.py
- support_team.py
- multimodal_team.py
- multimodal_workflow.py
- multiple_instances.py
Plus cookbook/91_tools/whatsapp_all_tools.py: standalone sync demo of all 9 tools.
Tests (115 passing)
- test_whatsapp_router.py: /new command, error handling, media responses
- test_whatsapp_helpers.py
- test_whatsapp_security.py
- test_whatsapp_tools.py
Type of change
Checklist
- ./scripts/format.sh and ./scripts/validate.sh)
Additional Notes
- reload=True, operation IDs
- requests fully replaced with httpx (sync + async)
- content_workflow → basic_workflow, deep_research → streaming_deep_research, research_agent → research_assistant
- reload=True instead of hardcoded port=8000 (works with new AGENT_OS_PORT env var from feat: add AGENT_OS_HOST/AGENT_OS_PORT env var fallbacks to serve() #6857)