Version 0.0.1 | Released October 15, 2025
The Voice Manifest (voice-manifest.json) makes websites voice-enabled in the same way that the Web App Manifest (manifest.json) makes websites installable as Progressive Web Apps.
Just as manifest.json tells browsers and operating systems "this website can act like a native app," the Voice Manifest tells voice agents, browsers, and operating systems "this website can be interacted with through voice."
Voice AI is everywhere: in our phones, computers, cars, and smart speakers. Yet websites remain primarily visual interfaces that voice assistants struggle to interact with meaningfully.
When a user says "book a table at that Italian restaurant," their voice assistant might find the restaurant's website, but has no standardized way to:
- Understand what voice interactions are possible
- Know how to execute actions on the user's behalf
- Provide a consistent voice experience
The Voice Manifest provides a declarative way for websites to describe their voice capabilities. It's a simple JSON file that any compatible voice client can read to enable voice interactions.
<link rel="voice-manifest" href="/voice-manifest.json" />
The simplest voice-enabled website needs only a name and optionally some display information:
{
  "name": "Pasta Paradise",
  "display": {
    "call_to_action": "Ask about our menu or make a reservation",
    "suggested_prompts": [
      "What pasta dishes do you have?",
      "Make a reservation for Friday"
    ]
  }
}
That's it! Any voice client (browser extension, OS feature, voice agent platform) can now:
- Detect the site is voice-enabled
- Show the call to action to users
- Provide suggested prompts
- Use its own STT/LLM/TTS providers to enable voice interaction
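To make the detection step concrete, here is a minimal Python sketch of how a voice client might find the manifest link in a page. The function name `find_voice_manifest` and the choice to resolve a relative `href` against the page URL are illustrative assumptions; this proposal does not prescribe a discovery algorithm.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _VoiceManifestLinkFinder(HTMLParser):
    """Collects the href of the first <link rel="voice-manifest"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, as with other link types.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "voice-manifest" in rel_tokens and attr_map.get("href"):
            self.href = attr_map["href"]


def find_voice_manifest(html: str, page_url: str) -> Optional[str]:
    """Return the absolute manifest URL declared by the page, or None."""
    finder = _VoiceManifestLinkFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    # Assumption: relative hrefs resolve against the page URL.
    return urljoin(page_url, finder.href)
```

A client would then fetch the returned URL and parse it as JSON before showing any voice UI.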
The Voice Manifest is about what your site can do, not how to configure voice providers.
This is NOT a configuration file for your voice pipeline. It's a public declaration of your site's voice capabilities, similar to how manifest.json declares PWA capabilities.
Start simple, add complexity as needed:
- Minimal: Just metadata and display hints
- + Functions: Add function calling for actions
- + System Prompt: Customize the voice assistant's behavior
- + MCP: Connect to backend services
- + Agent Config: Specify preferred voice providers (optional)
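The layers above map onto the top-level keys used in this document's examples. The sketch below is one hedged way to report which layers a parsed manifest uses; `manifest_levels` is a hypothetical helper, and it treats `name` as the only required field.

```python
from typing import Any, Dict, List


def manifest_levels(manifest: Dict[str, Any]) -> List[str]:
    """List which optional layers a parsed manifest uses, in the order above."""
    name = manifest.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("a voice manifest needs a non-empty 'name'")

    levels = ["minimal"]
    # Top-level keys as they appear in this document's examples.
    for key in ("functions", "system_prompt", "mcp", "agent"):
        if manifest.get(key):
            levels.append(key)
    return levels
```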
The manifest supports multiple approaches:
No providers specified (Browser/OS provides fallback):
{
  "name": "My Site",
  "functions": [...]
}
With specific voice agent (All-in-one solution):
{
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "https://api.retellai.com/v1",
      "agent_id": "agent_abc123"
    }
  }
}
With composite STT/LLM/TTS (Individual components):
{
  "agent": {
    "provider": {
      "stt": { "name": "deepgram" },
      "llm": { "name": "openai", "model": "gpt-4" },
      "tts": { "name": "elevenlabs" }
    }
  }
}
Control how your voice interface appears to users:
{
  "name": "Pasta Paradise",
  "short_name": "PP",
  "display": {
    "icon": "/icons/voice-icon.png",
    "background_color": "#8B0000",
    "theme_color": "#8B0000",
    "activation_phrase": "Talk to Pasta Paradise",
    "call_to_action": "Ask about our menu or make a reservation",
    "suggested_prompts": [
      "What pasta dishes do you have?",
      "Make a reservation for Friday at 7 PM",
      "Do you have gluten-free options?"
    ]
  }
}
These fields help voice clients present your site's capabilities in a user-friendly way.
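As an illustration of how a client might consume these display fields, the following sketch gathers them into a small view model. The helper name `activation_ui`, the output keys, and the fallback label used when no `activation_phrase` is declared are all assumptions made for this example.

```python
from typing import Any, Dict


def activation_ui(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the display hints a client might render before activation."""
    display = manifest.get("display") or {}
    name = manifest["name"]
    return {
        "title": name,
        # Assumption: fall back to a generic label when no phrase is declared.
        "button_label": display.get("activation_phrase") or f"Talk to {name}",
        "hint": display.get("call_to_action"),
        "icon": display.get("icon"),
        "prompts": list(display.get("suggested_prompts") or []),
    }
```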
Define how your voice assistant should behave:
{
  "system_prompt": "You are a helpful assistant for Pasta Paradise restaurant. Help customers with menu questions, reservations, and general information. Be warm, friendly, and knowledgeable about Italian cuisine."
}
Or reference an external file:
{
  "system_prompt": {
    "$ref": "./prompts/system-prompt.txt"
  }
}
Define actions using OpenAI's function calling standard:
{
  "functions": [
    {
      "name": "make_reservation",
      "description": "Create a dining reservation",
      "parameters": {
        "type": "object",
        "properties": {
          "date": {
            "type": "string",
            "format": "date",
            "description": "Reservation date (YYYY-MM-DD)"
          },
          "time": {
            "type": "string",
            "format": "time",
            "description": "Reservation time (HH:MM)"
          },
          "party_size": {
            "type": "integer",
            "minimum": 1,
            "maximum": 20,
            "description": "Number of guests"
          },
          "name": {
            "type": "string",
            "description": "Name for the reservation"
          },
          "phone": {
            "type": "string",
            "description": "Contact phone number"
          }
        },
        "required": ["date", "time", "party_size", "name", "phone"]
      }
    }
  ]
}
Connect to Model Context Protocol servers for tool discovery:
{
  "mcp": {
    "servers": {
      "restaurant": {
        "url": "https://api.restaurant.com/mcp"
      }
    }
  }
}
Voice clients connect to this URL to discover available tools, resources, and prompts via MCP. Your MCP server must already be running and accessible at this endpoint.
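MCP messages are JSON-RPC 2.0, and tool discovery uses the `tools/list` method. As a rough sketch of the client side, the code below extracts the server URLs from a manifest and serializes a `tools/list` request body. The helper names are hypothetical, and the sketch deliberately leaves out the `initialize` handshake that an MCP session starts with, as well as transport and authentication details.

```python
import json
from typing import Any, Dict, List, Tuple


def mcp_server_urls(manifest: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (server_name, url) pairs from the manifest's "mcp" block."""
    servers = (manifest.get("mcp") or {}).get("servers") or {}
    return [(name, cfg["url"]) for name, cfg in servers.items() if "url" in cfg]


def tools_list_request(request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request for MCP's tools/list method."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})
```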
Specify preferred voice providers if you have specific requirements:
All-in-one voice agent:
{
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "https://api.retellai.com/v1",
      "agent_id": "agent_abc123",
      "config": {
        "voice_id": "professional-female-us",
        "voice_speed": 1.0,
        "interruption_sensitivity": 0.5
      }
    }
  }
}
Composite STT/LLM/TTS:
{
  "agent": {
    "provider": {
      "stt": {
        "name": "deepgram",
        "model": "nova-2",
        "keywords": ["pasta", "reservation", "gluten-free"]
      },
      "llm": {
        "name": "openai",
        "model": "gpt-4",
        "temperature": 0.7
      },
      "tts": {
        "name": "elevenlabs",
        "voice_id": "clara-italian-warmth"
      }
    }
  }
}
Important: You can specify providers, but voice clients can use their own fallbacks if:
- Providers aren't specified
- Specified providers aren't available
- Users prefer different providers
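The three fallback conditions above could be applied per component (STT, LLM, or TTS) roughly as follows. This is a sketch under stated assumptions: `choose_provider`, the `available` set of provider names the client supports, and the precedence given to the user's choice are hypothetical and not defined by this proposal.

```python
from typing import Any, Dict, Optional, Set


def choose_provider(
    declared: Optional[Dict[str, Any]],
    client_default: Dict[str, Any],
    available: Set[str],
    user_override: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Pick a provider config for one component (STT, LLM, or TTS)."""
    if user_override is not None:
        return user_override      # the user prefers a different provider
    if declared is None:
        return client_default     # the manifest specifies nothing
    if declared.get("name") not in available:
        return client_default     # the specified provider is unavailable
    return declared
```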
{
  "$schema": "https://voicemanifest.org/voice-manifest/schema/0.0.1/voice-manifest.schema.json",
  "name": "Pasta Paradise",
  "description": "Authentic Italian dining in Boston",
  "display": {
    "activation_phrase": "Talk to Pasta Paradise",
    "call_to_action": "Ask about our menu or make a reservation",
    "suggested_prompts": [
      "What pasta dishes do you have?",
      "Make a reservation for Friday at 7 PM",
      "Do you have gluten-free options?"
    ]
  },
  "system_prompt": "You are a helpful assistant for Pasta Paradise restaurant. Help customers with menu questions, reservations, and general information.",
  "functions": [
    {
      "name": "get_menu",
      "description": "Get menu items with optional filters",
      "parameters": {
        "type": "object",
        "properties": {
          "category": {
            "type": "string",
            "enum": ["appetizers", "pasta", "mains", "desserts"]
          }
        },
        "required": []
      }
    },
    {
      "name": "make_reservation",
      "description": "Create a dining reservation",
      "parameters": {
        "type": "object",
        "properties": {
          "date": { "type": "string", "format": "date" },
          "time": { "type": "string", "format": "time" },
          "party_size": { "type": "integer" },
          "name": { "type": "string" },
          "phone": { "type": "string" }
        },
        "required": ["date", "time", "party_size", "name", "phone"]
      }
    }
  ]
}
{
  "$schema": "https://voicemanifest.org/voice-manifest/schema/0.0.1/voice-manifest.schema.json",
  "name": "Premium Store",
  "description": "Voice-enabled shopping experience",
  "display": {
    "activation_phrase": "Shop with voice",
    "suggested_prompts": [
      "Show me wireless headphones under $100",
      "Where's my order?",
      "Find blue running shoes"
    ]
  },
  "system_prompt": "You are a helpful shopping assistant. Help customers find products and track orders.",
  "functions": [
    {
      "name": "search_products",
      "description": "Search for products",
      "parameters": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "max_price": { "type": "number" }
        },
        "required": ["query"]
      }
    }
  ],
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "https://api.retellai.com/v1",
      "agent_id": "agent_ecommerce_abc123"
    }
  }
}
{
  "$schema": "https://voicemanifest.org/voice-manifest/schema/0.0.1/voice-manifest.schema.json",
  "name": "Healthcare Portal",
  "description": "Voice-enabled patient portal",
  "display": {
    "suggested_prompts": [
      "Schedule a checkup",
      "Refill my prescription",
      "When is my next appointment?"
    ]
  },
  "system_prompt": "You are a HIPAA-compliant healthcare assistant. Help patients with appointments and prescriptions. Never provide medical advice.",
  "functions": [
    {
      "name": "schedule_appointment",
      "description": "Schedule a medical appointment",
      "parameters": {
        "type": "object",
        "properties": {
          "appointment_type": {
            "type": "string",
            "enum": ["checkup", "follow-up", "specialist"]
          },
          "preferred_date": { "type": "string", "format": "date" }
        },
        "required": ["appointment_type"]
      }
    }
  ],
  "agent": {
    "provider": {
      "stt": {
        "name": "deepgram",
        "model": "nova-2-medical",
        "keywords": ["prescription", "appointment", "medication"]
      },
      "llm": {
        "name": "openai",
        "model": "gpt-4",
        "temperature": 0.3
      },
      "tts": {
        "name": "elevenlabs",
        "voice_id": "professional-calm-female",
        "speaking_rate": 0.9
      }
    }
  },
  "mcp": {
    "servers": {
      "ehr": {
        "url": "https://api.healthcare.internal/mcp"
      }
    }
  },
  "privacy": {
    "data_retention": "Voice recordings deleted immediately. Transcripts retained 7 days per HIPAA.",
    "recording_consent": true,
    "pii_handling": "encrypt"
  }
}
- User visits your website
- Voice client (browser, extension, OS) discovers <link rel="voice-manifest">
- Voice client reads the manifest
- Voice client shows activation UI with your branding and suggested prompts
- User activates voice interaction
- Voice client uses:
  - Your system prompt to guide behavior
  - Your functions to understand available actions
  - Your specified providers OR its own fallbacks
  - Your MCP servers to execute actions
- Actions are executed and responses provided to user
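The step where the client picks up your system prompt and functions can be sketched as below. This is only one possible reading: `resolve_system_prompt` and `to_tools` are hypothetical helpers, resolving a relative `$ref` against the manifest URL is an assumption, and the `{"type": "function", "function": ...}` envelope is the shape OpenAI's chat completions API expects for tools.

```python
from typing import Any, Dict, List
from urllib.parse import urljoin


def resolve_system_prompt(manifest: Dict[str, Any], manifest_url: str) -> Dict[str, str]:
    """Normalize system_prompt: inline text, or a $ref resolved to a URL."""
    prompt = manifest.get("system_prompt")
    if isinstance(prompt, str):
        return {"kind": "inline", "value": prompt}
    if isinstance(prompt, dict) and "$ref" in prompt:
        # Assumption: relative references resolve against the manifest URL.
        return {"kind": "ref", "value": urljoin(manifest_url, prompt["$ref"])}
    return {"kind": "none", "value": ""}


def to_tools(manifest: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Wrap each manifest function in the chat-completions "tools" envelope."""
    return [{"type": "function", "function": fn} for fn in manifest.get("functions", [])]
```

A client would fetch the referenced file when `kind` is `"ref"` and pass the resulting text, together with the tools list, to whichever LLM it ends up using.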
This is a key feature that makes the Voice Manifest flexible:
If you specify no providers:
- Voice clients use their own (browser plugins, OS features, etc.)
- Example: Deepgram browser extension provides STT/LLM/TTS
If you specify some providers:
- Voice clients use what you specify
- Fall back to their own for unspecified components
- Example: You specify LLM, client provides STT/TTS
If you specify a voice agent:
- Voice clients use your all-in-one solution
- Voice agent provider handles STT/LLM/TTS
- Example: Your Retell agent does everything
If you specify composite (STT/LLM/TTS):
- Voice clients use your specified components
- Can still fall back if any fail
- Example: Your Deepgram + OpenAI + ElevenLabs stack
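The four cases above can be told apart by inspecting `agent.provider`. The sketch below assumes, as in this document's examples, that an all-in-one agent is identified by a top-level `name` and a composite setup by `stt`, `llm`, and `tts` keys; `provider_mode` and its return labels are hypothetical. It rejects a provider that combines both shapes, since an all-in-one agent already supplies every component.

```python
from typing import Any, Dict

COMPONENTS = ("stt", "llm", "tts")


def provider_mode(manifest: Dict[str, Any]) -> str:
    """Classify agent.provider as 'none', 'agent', 'partial', or 'composite'."""
    provider = (manifest.get("agent") or {}).get("provider") or {}
    declared = [c for c in COMPONENTS if c in provider]
    if "name" in provider:
        if declared:
            # An all-in-one agent already supplies every component.
            raise ValueError("provider mixes an agent 'name' with " + ", ".join(declared))
        return "agent"
    if not declared:
        return "none"
    return "composite" if len(declared) == len(COMPONENTS) else "partial"
```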
Voice Agent OR Composite - not both:
// ✅ Valid - Voice agent only
{
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "..."
    }
  }
}
// ✅ Valid - Composite only
{
  "agent": {
    "provider": {
      "stt": {...},
      "llm": {...},
      "tts": {...}
    }
  }
}
// ❌ Invalid - Cannot mix
{
  "agent": {
    "provider": {
      "name": "retell",
      "stt": {...} // ERROR: voice agent provides STT
    }
  }
}
- Reservations: "Book a table for four tomorrow at 7"
- Menu inquiries: "What vegetarian options do you have?"
- Takeout orders: "Order the usual for pickup"
- Product search: "Show me wireless headphones under $100"
- Order tracking: "Where's my order?"
- Shopping: "Add size medium to my cart"
- Appointments: "Schedule a checkup next Tuesday"
- Prescriptions: "Refill my blood pressure medication"
- Information: "When are you open?"
- Balance: "What's my checking balance?"
- Transfers: "Transfer $50 to savings"
- Bill pay: "Pay my electric bill"
- Booking: "Book a window seat on the morning flight"
- Hotel: "Find hotels near the conference"
- Information: "What's my confirmation number?"
Start with the basics:
{
  "name": "Your Site",
  "display": {
    "suggested_prompts": ["What can you help with?"]
  },
  "system_prompt": "You are a helpful assistant for [your site].",
  "functions": [...]
}
<link rel="voice-manifest" href="/voice-manifest.json" />
When voice clients call your functions, you need to handle them. This typically means:
- REST API endpoints that execute the functions
- MCP server that provides the tools
- Webhook handlers that process requests
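Whichever transport you choose, the handler ends up doing the same thing: look up the function by name, validate the arguments, and run it. Here is a transport-agnostic sketch for the `make_reservation` function from the earlier example. The `dispatch` and `HANDLERS` names, the error shape, and the stubbed booking step are assumptions for illustration; the 1 to 20 party-size range comes from the schema shown earlier.

```python
from datetime import date, time
from typing import Any, Callable, Dict


def make_reservation(args: Dict[str, Any]) -> Dict[str, Any]:
    """Validate arguments server-side, then pretend to store the booking."""
    missing = [k for k in ("date", "time", "party_size", "name", "phone") if k not in args]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    day = date.fromisoformat(args["date"])     # YYYY-MM-DD
    slot = time.fromisoformat(args["time"])    # HH:MM
    size = args["party_size"]
    if isinstance(size, bool) or not isinstance(size, int) or not 1 <= size <= 20:
        raise ValueError("party_size must be an integer from 1 to 20")
    # A real handler would write to the booking system here.
    return {"status": "confirmed", "date": day.isoformat(),
            "time": slot.strftime("%H:%M"), "party_size": size}


HANDLERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "make_reservation": make_reservation,
}


def dispatch(function_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Route a function call from a voice client to its handler."""
    handler = HANDLERS.get(function_name)
    if handler is None:
        return {"error": f"unknown function: {function_name}"}
    try:
        return handler(arguments)
    except (TypeError, ValueError) as exc:
        return {"error": str(exc)}
```

A REST endpoint, MCP tool, or webhook would each wrap `dispatch` and translate its result into that transport's response format.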
Use voice clients that support Voice Manifest:
- Browser extensions
- Voice agent platforms
- OS-level voice features
- Testing tools
{
  "privacy": {
    "data_retention": "Voice data retained for 30 days",
    "recording_consent": true,
    "pii_handling": "encrypt",
    "privacy_policy": "https://example.com/privacy"
  }
}
- Never expose API keys in the manifest
- Use authentication for sensitive functions
- Validate all inputs server-side
- Rate limit voice interactions
- Log security events
- HTTPS only for all endpoints
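Two of these rules, no API keys in the manifest and HTTPS only, can be checked mechanically before you publish. The sketch below walks a parsed manifest and reports likely problems. It is a heuristic, not part of the proposal: `lint_manifest`, the list of credential-like key names, and the list of keys treated as URLs are all assumptions, and a name-based check will produce both false positives and false negatives.

```python
from typing import Any, List
from urllib.parse import urlparse

# Assumption: key names that commonly hold credentials; extend as needed.
SECRET_HINTS = ("api_key", "apikey", "secret", "token", "password")
# Assumption: keys from this document's examples that hold endpoint URLs.
URL_KEYS = ("url", "endpoint", "privacy_policy")


def lint_manifest(node: Any, path: str = "$") -> List[str]:
    """Walk a parsed manifest and report likely secrets and non-HTTPS URLs."""
    problems: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if any(hint in key.lower() for hint in SECRET_HINTS):
                problems.append(f"{child}: looks like a credential")
            if key in URL_KEYS and isinstance(value, str):
                if urlparse(value).scheme != "https":
                    problems.append(f"{child}: not an HTTPS URL")
            problems.extend(lint_manifest(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(lint_manifest(item, f"{path}[{index}]"))
    return problems
```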
| Feature | manifest.json | voice-manifest.json |
|---|---|---|
| Purpose | Make site installable as PWA | Make site voice-enabled |
| Discovery | <link rel="manifest"> | <link rel="voice-manifest"> |
| Required fields | name, icons | name |
| Display config | icons, colors, display mode | activation phrase, suggested prompts |
| Functionality | Declares PWA capabilities | Declares voice capabilities |
| Provider config | N/A | Optional voice providers |
| Backend integration | Service workers | Functions + optional MCP |
The Voice Manifest is currently in early proposal stage (October 2025).
We're seeking feedback from:
- Voice platform providers
- Browser vendors
- Web developers
- Standards organizations
We welcome contributions and feedback! See the repository for:
- Examples: Complete working examples
- Schema: JSON Schema for validation
- Documentation: Detailed guides and references
This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 license.