Voice Manifest Explainer

Version 0.0.1 | Released October 15, 2025

Introduction

The Voice Manifest (voice-manifest.json) makes websites voice-enabled in the same way that the Web App Manifest (manifest.json) makes websites installable as Progressive Web Apps.

Just as manifest.json tells browsers and operating systems "this website can act like a native app," the Voice Manifest tells voice agents, browsers, and operating systems "this website can be interacted with through voice."

The Problem

Voice AI is everywhere—in our phones, computers, cars, and smart speakers. Yet websites remain primarily visual interfaces that voice assistants struggle to interact with meaningfully.

When a user says "book a table at that Italian restaurant," their voice assistant might find the restaurant's website, but has no standardized way to:

Understand what voice interactions are possible
Know how to execute actions on the user's behalf
Provide a consistent voice experience

The Solution

The Voice Manifest provides a declarative way for websites to describe their voice capabilities. It's a simple JSON file that any compatible voice client can read to enable voice interactions. A page advertises the file with a link element in its HTML:

<link rel="voice-manifest" href="/voice-manifest.json" />
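As a rough illustration of how a voice client might discover the manifest, the sketch below scans a page's HTML for the link element and resolves its href against the page URL. It uses only Python's standard library. The class and function names are hypothetical, and the explainer does not prescribe any particular discovery algorithm.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class VoiceManifestLinkFinder(HTMLParser):
    """Records the href of the first <link rel="voice-manifest"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so compare token by token
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "voice-manifest" in rel_tokens and attr_map.get("href"):
            self.href = attr_map["href"]


def find_voice_manifest_url(page_url, html_text):
    """Return the absolute manifest URL, or None if the page declares none."""
    finder = VoiceManifestLinkFinder()
    finder.feed(html_text)
    if finder.href is None:
        return None
    return urljoin(page_url, finder.href)


page = '<html><head><link rel="voice-manifest" href="/voice-manifest.json" /></head></html>'
print(find_voice_manifest_url("https://example.com/menu", page))
# prints https://example.com/voice-manifest.json
```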

Minimal Example

The simplest voice-enabled website needs only a name and optionally some display information:

{
  "name": "Pasta Paradise",
  "display": {
    "call_to_action": "Ask about our menu or make a reservation",
    "suggested_prompts": [
      "What pasta dishes do you have?",
      "Make a reservation for Friday"
    ]
  }
}

That's it! Any voice client (browser extension, OS feature, voice agent platform) can now:

Detect the site is voice-enabled
Show the call to action to users
Provide suggested prompts
Use its own STT/LLM/TTS providers to enable voice interaction
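To make the client side of the minimal example concrete, here is a small sketch of how a voice client could turn the manifest into the values it needs for an activation UI. The function name and the fallback label used when call_to_action is absent are assumptions for illustration, not part of the proposal.

```python
import json


def activation_ui(manifest_text):
    """Extract the display hints a client needs to render its voice entry point."""
    manifest = json.loads(manifest_text)
    name = manifest.get("name")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("voice manifest must have a non-empty 'name'")
    display = manifest.get("display") or {}
    return {
        "title": name,
        # Hypothetical default: the explainer does not define a fallback label
        "call_to_action": display.get("call_to_action", f"Talk to {name}"),
        "suggested_prompts": list(display.get("suggested_prompts", [])),
    }


minimal = """
{
  "name": "Pasta Paradise",
  "display": {
    "call_to_action": "Ask about our menu or make a reservation",
    "suggested_prompts": ["What pasta dishes do you have?", "Make a reservation for Friday"]
  }
}
"""
ui = activation_ui(minimal)
print(ui["title"], "-", ui["call_to_action"])
```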

Core Concepts

1. Declaration, Not Configuration

The Voice Manifest is about what your site can do, not how to configure voice providers.

This is NOT a configuration file for your voice pipeline. It's a public declaration of your site's voice capabilities, similar to how manifest.json declares PWA capabilities.

2. Progressive Enhancement

Start simple, add complexity as needed:

Minimal: Just metadata and display hints
+ Functions: Add function calling for actions
+ System Prompt: Customize the voice assistant's behavior
+ MCP: Connect to backend services
+ Agent Config: Specify preferred voice providers (optional)

3. Provider Flexibility

The manifest supports multiple approaches:

No providers specified (Browser/OS provides fallback):

{
  "name": "My Site",
  "functions": [...]
}

With specific voice agent (All-in-one solution):

{
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "https://api.retellai.com/v1",
      "agent_id": "agent_abc123"
    }
  }
}

With composite STT/LLM/TTS (Individual components):

{
  "agent": {
    "provider": {
      "stt": { "name": "deepgram" },
      "llm": { "name": "openai", "model": "gpt-4" },
      "tts": { "name": "elevenlabs" }
    }
  }
}

Key Features

Display Configuration

Control how your voice interface appears to users:

{
  "name": "Pasta Paradise",
  "short_name": "PP",
  "display": {
    "icon": "/icons/voice-icon.png",
    "background_color": "#8B0000",
    "theme_color": "#8B0000",
    "activation_phrase": "Talk to Pasta Paradise",
    "call_to_action": "Ask about our menu or make a reservation",
    "suggested_prompts": [
      "What pasta dishes do you have?",
      "Make a reservation for Friday at 7 PM",
      "Do you have gluten-free options?"
    ]
  }
}

These fields help voice clients present your site's capabilities in a user-friendly way.

System Prompt

Define how your voice assistant should behave:

{
  "system_prompt": "You are a helpful assistant for Pasta Paradise restaurant. Help customers with menu questions, reservations, and general information. Be warm, friendly, and knowledgeable about Italian cuisine."
}

Or reference an external file:

{
  "system_prompt": {
    "$ref": "./prompts/system-prompt.txt"
  }
}
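Because system_prompt can be either an inline string or a $ref object, a client has to handle both shapes. The sketch below is one possible approach. It assumes the $ref path is resolved relative to the manifest URL, which the explainer does not state, and it takes the fetching function as a parameter so no particular HTTP library is implied. All names are hypothetical.

```python
from urllib.parse import urljoin


def resolve_system_prompt(manifest, manifest_url, fetch_text):
    """Return the system prompt text, following a $ref if one is given.

    fetch_text is any callable that maps an absolute URL to its text body.
    """
    prompt = manifest.get("system_prompt")
    if prompt is None or isinstance(prompt, str):
        return prompt
    if isinstance(prompt, dict) and isinstance(prompt.get("$ref"), str):
        # Assumption: relative references resolve against the manifest's own URL
        return fetch_text(urljoin(manifest_url, prompt["$ref"]))
    raise ValueError("system_prompt must be a string or an object with '$ref'")


# Stand-in for a real HTTP fetch, used here only to keep the example runnable
fake_files = {"https://example.com/prompts/system-prompt.txt": "You are a helpful assistant."}

manifest = {"system_prompt": {"$ref": "./prompts/system-prompt.txt"}}
print(resolve_system_prompt(manifest, "https://example.com/voice-manifest.json", fake_files.__getitem__))
# prints You are a helpful assistant.
```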

Function Calling

Define actions using OpenAI's function calling format, in which each function's parameters are described with JSON Schema:

{
  "functions": [
    {
      "name": "make_reservation",
      "description": "Create a dining reservation",
      "parameters": {
        "type": "object",
        "properties": {
          "date": {
            "type": "string",
            "format": "date",
            "description": "Reservation date (YYYY-MM-DD)"
          },
          "time": {
            "type": "string",
            "format": "time",
            "description": "Reservation time (HH:MM)"
          },
          "party_size": {
            "type": "integer",
            "minimum": 1,
            "maximum": 20,
            "description": "Number of guests"
          },
          "name": {
            "type": "string",
            "description": "Name for the reservation"
          },
          "phone": {
            "type": "string",
            "description": "Contact phone number"
          }
        },
        "required": ["date", "time", "party_size", "name", "phone"]
      }
    }
  ]
}
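Since the parameters block is JSON Schema, arguments produced by the LLM can be checked before a function is executed. The following is a deliberately small sketch that covers only required, type, enum, minimum, and maximum. It is not a full JSON Schema validator, and it skips format checks such as date and time. A real deployment would more likely use a dedicated validation library. The function name is hypothetical.

```python
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_arguments(function_def, arguments):
    """Return a list of problems; an empty list means the call looks valid."""
    schema = function_def.get("parameters", {})
    properties = schema.get("properties", {})
    problems = [
        f"missing required argument: {key}"
        for key in schema.get("required", [])
        if key not in arguments
    ]
    for key, value in arguments.items():
        spec = properties.get(key)
        if spec is None:
            continue  # undeclared arguments are ignored in this sketch
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        numeric = type_name in ("integer", "number")
        # bool is a subclass of int in Python, so reject it for numeric types
        if expected and (not isinstance(value, expected) or (numeric and isinstance(value, bool))):
            problems.append(f"{key}: expected {type_name}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: not one of {spec['enum']}")
        if numeric and "minimum" in spec and value < spec["minimum"]:
            problems.append(f"{key}: below minimum {spec['minimum']}")
        if numeric and "maximum" in spec and value > spec["maximum"]:
            problems.append(f"{key}: above maximum {spec['maximum']}")
    return problems


# A trimmed copy of the make_reservation definition above
make_reservation = {
    "name": "make_reservation",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "format": "date"},
            "party_size": {"type": "integer", "minimum": 1, "maximum": 20},
        },
        "required": ["date", "party_size"],
    },
}
print(check_arguments(make_reservation, {"party_size": 25}))
# prints ['missing required argument: date', 'party_size: above maximum 20']
```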

MCP Integration (Optional)

Connect to Model Context Protocol servers for tool discovery:

{
  "mcp": {
    "servers": {
      "restaurant": {
        "url": "https://api.restaurant.com/mcp"
      }
    }
  }
}

Voice clients connect to this URL to discover available tools, resources, and prompts via MCP. Your MCP server must already be running and reachable at this endpoint.

Voice Agent Configuration (Optional)

Specify preferred voice providers if you have specific requirements:

All-in-one voice agent:

{
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "https://api.retellai.com/v1",
      "agent_id": "agent_abc123",
      "config": {
        "voice_id": "professional-female-us",
        "voice_speed": 1.0,
        "interruption_sensitivity": 0.5
      }
    }
  }
}

Composite STT/LLM/TTS:

{
  "agent": {
    "provider": {
      "stt": {
        "name": "deepgram",
        "model": "nova-2",
        "keywords": ["pasta", "reservation", "gluten-free"]
      },
      "llm": {
        "name": "openai",
        "model": "gpt-4",
        "temperature": 0.7
      },
      "tts": {
        "name": "elevenlabs",
        "voice_id": "clara-italian-warmth"
      }
    }
  }
}

Important: You can specify providers, but voice clients can use their own fallbacks if:

Providers aren't specified
Specified providers aren't available
Users prefer different providers

Complete Examples

1. Minimal Restaurant

{
  "$schema": "https://voicemanifest.org/voice-manifest/schema/0.0.1/voice-manifest.schema.json",
  "name": "Pasta Paradise",
  "description": "Authentic Italian dining in Boston",
  "display": {
    "activation_phrase": "Talk to Pasta Paradise",
    "call_to_action": "Ask about our menu or make a reservation",
    "suggested_prompts": [
      "What pasta dishes do you have?",
      "Make a reservation for Friday at 7 PM",
      "Do you have gluten-free options?"
    ]
  },
  "system_prompt": "You are a helpful assistant for Pasta Paradise restaurant. Help customers with menu questions, reservations, and general information.",
  "functions": [
    {
      "name": "get_menu",
      "description": "Get menu items with optional filters",
      "parameters": {
        "type": "object",
        "properties": {
          "category": {
            "type": "string",
            "enum": ["appetizers", "pasta", "mains", "desserts"]
          }
        },
        "required": []
      }
    },
    {
      "name": "make_reservation",
      "description": "Create a dining reservation",
      "parameters": {
        "type": "object",
        "properties": {
          "date": { "type": "string", "format": "date" },
          "time": { "type": "string", "format": "time" },
          "party_size": { "type": "integer" },
          "name": { "type": "string" },
          "phone": { "type": "string" }
        },
        "required": ["date", "time", "party_size", "name", "phone"]
      }
    }
  ]
}

2. E-Commerce with Voice Agent

{
  "$schema": "https://voicemanifest.org/voice-manifest/schema/0.0.1/voice-manifest.schema.json",
  "name": "Premium Store",
  "description": "Voice-enabled shopping experience",
  "display": {
    "activation_phrase": "Shop with voice",
    "suggested_prompts": [
      "Show me wireless headphones under $100",
      "Where's my order?",
      "Find blue running shoes"
    ]
  },
  "system_prompt": "You are a helpful shopping assistant. Help customers find products and track orders.",
  "functions": [
    {
      "name": "search_products",
      "description": "Search for products",
      "parameters": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "max_price": { "type": "number" }
        },
        "required": ["query"]
      }
    }
  ],
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "https://api.retellai.com/v1",
      "agent_id": "agent_ecommerce_abc123"
    }
  }
}

3. Healthcare with Composite Agents + MCP

{
  "$schema": "https://voicemanifest.org/voice-manifest/schema/0.0.1/voice-manifest.schema.json",
  "name": "Healthcare Portal",
  "description": "Voice-enabled patient portal",
  "display": {
    "suggested_prompts": [
      "Schedule a checkup",
      "Refill my prescription",
      "When is my next appointment?"
    ]
  },
  "system_prompt": "You are a HIPAA-compliant healthcare assistant. Help patients with appointments and prescriptions. Never provide medical advice.",
  "functions": [
    {
      "name": "schedule_appointment",
      "description": "Schedule a medical appointment",
      "parameters": {
        "type": "object",
        "properties": {
          "appointment_type": {
            "type": "string",
            "enum": ["checkup", "follow-up", "specialist"]
          },
          "preferred_date": { "type": "string", "format": "date" }
        },
        "required": ["appointment_type"]
      }
    }
  ],
  "agent": {
    "provider": {
      "stt": {
        "name": "deepgram",
        "model": "nova-2-medical",
        "keywords": ["prescription", "appointment", "medication"]
      },
      "llm": {
        "name": "openai",
        "model": "gpt-4",
        "temperature": 0.3
      },
      "tts": {
        "name": "elevenlabs",
        "voice_id": "professional-calm-female",
        "speaking_rate": 0.9
      }
    }
  },
  "mcp": {
    "servers": {
      "ehr": {
        "url": "https://api.healthcare.internal/mcp"
      }
    }
  },
  "privacy": {
    "data_retention": "Voice recordings deleted immediately. Transcripts retained 7 days per HIPAA.",
    "recording_consent": true,
    "pii_handling": "encrypt"
  }
}

How It Works: The Flow

User visits your website
Voice client (browser, extension, OS) discovers <link rel="voice-manifest">
Voice client reads the manifest
Voice client shows activation UI with your branding and suggested prompts
User activates voice interaction
Voice client uses:
- Your system prompt to guide behavior
- Your functions to understand available actions
- Your specified providers OR its own fallbacks
- Your MCP servers to execute actions
Actions are executed and responses provided to user

Provider Fallback Strategy

This is a key feature that makes the Voice Manifest flexible:

If you specify no providers:

Voice clients use their own (browser plugins, OS features, etc.)
Example: A browser extension provides STT/LLM/TTS

If you specify some providers:

Voice clients use what you specify
Fall back to their own for unspecified components
Example: You specify LLM, client provides STT/TTS

If you specify a voice agent:

Voice clients use your all-in-one solution
Voice agent provider handles STT/LLM/TTS
Example: Your Retell agent does everything

If you specify composite (STT/LLM/TTS):

Voice clients use your specified components
Can still fall back if any fail
Example: Your Deepgram + OpenAI + ElevenLabs stack
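The four cases above can be sketched as a single resolution function. This is one possible reading of the fallback rules, not a normative algorithm: the function name, the shape of client_defaults, and the is_available callback are all assumptions, and user preferences are left out for brevity.

```python
def resolve_pipeline(manifest, client_defaults, is_available=lambda name: True):
    """Pick providers for a session, falling back to the client's own where needed.

    client_defaults maps "stt", "llm", and "tts" to the client's built-in providers.
    is_available reports whether the client can actually use a named provider.
    """
    provider = (manifest.get("agent") or {}).get("provider") or {}

    # All-in-one voice agent: it handles STT, LLM, and TTS itself
    agent_name = provider.get("name")
    if agent_name and is_available(agent_name):
        return {"agent": agent_name}

    # Composite (or nothing specified): resolve each component independently
    pipeline = {}
    for component in ("stt", "llm", "tts"):
        declared = (provider.get(component) or {}).get("name")
        if declared and is_available(declared):
            pipeline[component] = declared
        else:
            pipeline[component] = client_defaults[component]
    return pipeline


defaults = {"stt": "client-stt", "llm": "client-llm", "tts": "client-tts"}
partial = {"agent": {"provider": {"llm": {"name": "openai", "model": "gpt-4"}}}}
print(resolve_pipeline(partial, defaults))
# prints {'stt': 'client-stt', 'llm': 'openai', 'tts': 'client-tts'}
```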

Mutual Exclusivity

Voice Agent OR Composite - not both:

// ✅ Valid - Voice agent only
{
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "..."
    }
  }
}

// ✅ Valid - Composite only
{
  "agent": {
    "provider": {
      "stt": {...},
      "llm": {...},
      "tts": {...}
    }
  }
}

// ❌ Invalid - Cannot mix
{
  "agent": {
    "provider": {
      "name": "retell",
      "stt": {...}  // ERROR: voice agent provides STT
    }
  }
}
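A validator could enforce this rule with a check along the following lines. This is a minimal sketch that assumes the presence of a top-level name key is what marks an all-in-one voice agent, as in the examples above. The function name is hypothetical.

```python
COMPOSITE_KEYS = {"stt", "llm", "tts"}


def provider_mode(provider):
    """Classify a provider block as 'agent', 'composite', or 'none'.

    Raises ValueError when voice agent and composite fields are mixed.
    """
    has_agent = "name" in provider
    composite = sorted(COMPOSITE_KEYS.intersection(provider))
    if has_agent and composite:
        raise ValueError(
            f"provider mixes a voice agent ({provider['name']}) "
            f"with composite components: {', '.join(composite)}"
        )
    if has_agent:
        return "agent"
    return "composite" if composite else "none"


print(provider_mode({"name": "retell", "endpoint": "https://api.retellai.com/v1"}))
# prints agent
print(provider_mode({"stt": {"name": "deepgram"}, "llm": {"name": "openai"}}))
# prints composite
```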

Real-World Use Cases

Restaurants & Hospitality

Reservations: "Book a table for four tomorrow at 7"
Menu inquiries: "What vegetarian options do you have?"
Takeout orders: "Order the usual for pickup"

E-Commerce

Product search: "Show me wireless headphones under $100"
Order tracking: "Where's my order?"
Shopping: "Add size medium to my cart"

Healthcare

Appointments: "Schedule a checkup next Tuesday"
Prescriptions: "Refill my blood pressure medication"
Information: "When are you open?"

Banking & Finance

Balance: "What's my checking balance?"
Transfers: "Transfer $50 to savings"
Bill pay: "Pay my electric bill"

Travel

Booking: "Book a window seat on the morning flight"
Hotel: "Find hotels near the conference"
Information: "What's my confirmation number?"

Implementation

Step 1: Create the Manifest

Start with the basics:

{
  "name": "Your Site",
  "display": {
    "suggested_prompts": ["What can you help with?"]
  },
  "system_prompt": "You are a helpful assistant for [your site].",
  "functions": [...]
}

Step 2: Link from HTML

<link rel="voice-manifest" href="/voice-manifest.json" />

Step 3: Implement Function Handlers

When voice clients call your functions, you need to handle them. This typically means:

REST API endpoints that execute the functions
MCP server that provides the tools
Webhook handlers that process requests
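Whichever transport you choose, the server side usually reduces to mapping a function name to a handler. The sketch below shows that dispatch step in isolation. It assumes a call arrives as JSON with name and arguments fields, which is a hypothetical wire format that the explainer does not define, and the menu data is a placeholder.

```python
import json

HANDLERS = {}


def handler(name):
    """Decorator that registers a function under its manifest name."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@handler("get_menu")
def get_menu(category=None):
    menu = {"pasta": ["Cacio e pepe"], "desserts": ["Tiramisu"]}  # placeholder data
    return menu if category is None else {category: menu.get(category, [])}


def dispatch(call_json):
    """Route one function call to its handler and wrap the outcome."""
    call = json.loads(call_json)
    func = HANDLERS.get(call.get("name"))
    if func is None:
        return {"error": f"unknown function: {call.get('name')}"}
    return {"result": func(**call.get("arguments", {}))}


print(dispatch('{"name": "get_menu", "arguments": {"category": "pasta"}}'))
# prints {'result': {'pasta': ['Cacio e pepe']}}
```

In a real handler, arguments should be validated against the function's declared parameters before the call, since they originate from an LLM.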

Step 4: Test

Use voice clients that support Voice Manifest:

Browser extensions
Voice agent platforms
OS-level voice features
Testing tools

Privacy & Security

Privacy Considerations

{
  "privacy": {
    "data_retention": "Voice data retained for 30 days",
    "recording_consent": true,
    "pii_handling": "encrypt",
    "privacy_policy": "https://example.com/privacy"
  }
}

Security Best Practices

Never expose API keys in the manifest
Use authentication for sensitive functions
Validate all inputs server-side
Rate limit voice interactions
Log security events
HTTPS only for all endpoints
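Two of these practices, keeping credentials out of the manifest and using HTTPS endpoints, can be checked mechanically before publishing. The sketch below is a heuristic lint, not an exhaustive audit: the list of suspicious key names and the choice to inspect only url and endpoint fields are assumptions made for illustration.

```python
from urllib.parse import urlparse

SECRET_HINTS = ("api_key", "apikey", "secret", "password")  # heuristic, not exhaustive
URL_KEYS = ("url", "endpoint")


def lint_manifest(node, path="$"):
    """Walk the manifest and report likely credentials and non-HTTPS endpoints."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if any(hint in key.lower() for hint in SECRET_HINTS):
                findings.append(f"{child}: looks like a credential")
            if key in URL_KEYS and isinstance(value, str) and urlparse(value).scheme != "https":
                findings.append(f"{child}: endpoint is not HTTPS")
            findings.extend(lint_manifest(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            findings.extend(lint_manifest(item, f"{path}[{index}]"))
    return findings


risky = {
    "name": "My Site",
    "agent": {"provider": {"name": "retell", "endpoint": "http://api.example.com/v1", "api_key": "sk-123"}},
}
for finding in lint_manifest(risky):
    print(finding)
```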

Comparison to manifest.json

| Feature | manifest.json | voice-manifest.json |
| --- | --- | --- |
| Purpose | Make site installable as PWA | Make site voice-enabled |
| Discovery | `<link rel="manifest">` | `<link rel="voice-manifest">` |
| Required fields | name, icons | name |
| Display config | icons, colors, display mode | activation phrase, suggested prompts |
| Functionality | Declares PWA capabilities | Declares voice capabilities |
| Provider config | N/A | Optional voice providers |
| Backend integration | Service workers | Functions + optional MCP |

Specification Status

The Voice Manifest is currently at an early proposal stage (October 2025).

We're seeking feedback from:

Voice platform providers
Browser vendors
Web developers
Standards organizations

Contributing

We welcome contributions and feedback! See the repository for:

Examples: Complete working examples
Schema: JSON Schema for validation
Documentation: Detailed guides and references

License

This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 license.
