A demonstration of an MCP-enabled (Model Context Protocol) realtime voice agent that combines OpenAI's Realtime API with dynamic MCP tool integration. Entirely AI coded (but with human guidance).
This project demonstrates how to build a voice-controlled AI assistant that can dynamically discover and use tools from multiple MCP servers. The agent uses natural speech input to interact with various tools through OpenAI's Realtime API, showcasing the power of MCP for extending AI capabilities. Multiple servers can be configured via a standard mcp.json file, similar to Claude Desktop.
Note: currently only the stdio transport is supported.
The project consists of two main components:
- Location: mcp-voice-agent/ directory
- Technology: Python 3.11+ with OpenAI Agents SDK
- Features:
- Realtime voice interaction via OpenAI's Realtime API
- Dynamic MCP tool discovery and integration from multiple servers
- Automatic function generation from MCP schemas
- Audio input/output handling
- Console character set verification for emoji support
- Multi-server configuration via mcp.json
- Location: CalculatorMcp/ directory
- Technology: .NET 8 with MCP SDK
- Features:
- 13 mathematical and utility tools
- MCP protocol implementation
- Stdio-based communication
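To make the stdio-based communication concrete: the MCP stdio transport exchanges JSON-RPC 2.0 messages as newline-delimited lines over the server's stdin and stdout. The sketch below only shows that framing for a `tools/list` request. The helper names `encode_message` and `decode_message` are hypothetical and are not taken from this project, which delegates this work to the MCP SDK.

```python
import json


def encode_message(method, params=None, msg_id=None):
    """Serialize one JSON-RPC 2.0 message as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        message["params"] = params
    if msg_id is not None:  # requests carry an id; notifications do not
        message["id"] = msg_id
    # json.dumps escapes newlines inside strings, so each message stays on one line
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_message(line):
    """Parse one line read from the peer back into a dict."""
    return json.loads(line.decode("utf-8"))


# Hypothetical request asking a server which tools it offers
request = encode_message("tools/list", msg_id=1)
```

In practice the SDK client handles the initialization handshake and framing; this is only meant to illustrate what travels over the pipe.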
The CalculatorMcp server provides these tools:
- Math: add(a, b), multiply(a, b), circle_area(radius)
- Numbers: random_between(min, max), is_even(number)
- Strings: reverse_string(text), count_letter(text, letter), string_contains(text, substring)
- Utilities: convert_temperature(temp, fromUnit, toUnit), delay(seconds), format_date(), days_until(date)
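The voice agent turns tool schemas like these into callable Python functions. The following is a minimal sketch of one way to do that, assuming each tool advertises a JSON Schema under `inputSchema` (the field name used by MCP tool listings). The schema literal, `make_tool_function`, and `fake_call_tool` are illustrative assumptions; the project's own logic lives in dynamic_tools.py and may differ.

```python
import inspect

# Hypothetical schema for the add(a, b) tool; a real one comes from the server.
ADD_SCHEMA = {
    "name": "add",
    "inputSchema": {
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    },
}

JSON_TO_PY = {"number": float, "integer": int, "string": str, "boolean": bool}


def make_tool_function(schema, call_tool):
    """Build a keyword-only Python function whose signature mirrors the schema."""
    props = schema["inputSchema"].get("properties", {})
    required = set(schema["inputSchema"].get("required", []))
    params = [
        inspect.Parameter(
            name,
            inspect.Parameter.KEYWORD_ONLY,
            default=inspect.Parameter.empty if name in required else None,
            annotation=JSON_TO_PY.get(spec.get("type"), object),
        )
        for name, spec in props.items()
    ]
    signature = inspect.Signature(params)

    def tool(**kwargs):
        # bind() raises TypeError for missing or unknown arguments
        bound = signature.bind(**kwargs)
        return call_tool(schema["name"], dict(bound.arguments))

    tool.__name__ = schema["name"]
    tool.__signature__ = signature
    return tool


def fake_call_tool(name, arguments):
    """Stand-in for the real MCP call; just echoes what would be sent."""
    return {"tool": name, "arguments": arguments}


add = make_tool_function(ADD_SCHEMA, fake_call_tool)
result = add(a=7, b=13)
```

Exposing a real signature lets frameworks that introspect functions see the parameter names and types without hand-written wrappers.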
Configure multiple MCP servers using a mcp.json file in the project root. The format follows Claude Desktop's standard:
{
"mcpServers": {
"calculator": {
"transport": "stdio",
"command": "dotnet",
"args": ["run", "--no-build", "--project", "CalculatorMcp/CalculatorMcp.csproj", "-v", "q"],
"env": {}
},
"my-custom-server": {
"transport": "stdio",
"command": "python",
"args": ["my_server.py"],
"env": {
"API_KEY": "your-key-here"
}
}
}
}
Supported Transports:
- stdio: Local process communication (currently supported)
Notes:
- Server names (keys) must be unique and contain only alphanumeric characters, hyphens, and underscores
- The voice agent automatically aggregates tools from all configured servers
- Tools are prefixed with their server name to avoid conflicts (e.g., calculator_add, my_server_custom_tool)
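The naming rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's implementation: `load_servers` and `prefixed_tool_name` are hypothetical helpers, and the hyphen-to-underscore conversion is a guess based on the `my_server_custom_tool` example.

```python
import json
import re

# Alphanumeric characters, hyphens, and underscores only
SERVER_NAME = re.compile(r"[A-Za-z0-9_-]+")


def _reject_duplicates(pairs):
    """JSON object hook that fails on repeated keys instead of keeping the last."""
    keys = [key for key, _ in pairs]
    duplicates = {key for key in keys if keys.count(key) > 1}
    if duplicates:
        raise ValueError(f"duplicate keys: {sorted(duplicates)}")
    return dict(pairs)


def load_servers(config_text):
    """Parse mcp.json text and validate server names and transports."""
    config = json.loads(config_text, object_pairs_hook=_reject_duplicates)
    servers = config.get("mcpServers", {})
    for name, spec in servers.items():
        if not SERVER_NAME.fullmatch(name):
            raise ValueError(f"invalid server name: {name!r}")
        if spec.get("transport", "stdio") != "stdio":
            raise ValueError(f"unsupported transport for {name!r}")
    return servers


def prefixed_tool_name(server, tool):
    # Assumption: hyphens become underscores so the result is a valid identifier
    return f"{server.replace('-', '_')}_{tool}"


servers = load_servers(
    '{"mcpServers": {"calculator": {"transport": "stdio", "command": "dotnet"}}}'
)
```

For example, `prefixed_tool_name("calculator", "add")` gives `calculator_add`.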
- .NET 8 SDK (for the C# MCP server)
- Python 3.11+
- OpenAI API Key in environment:
# Set permanently
setx OPENAI_API_KEY "your-api-key-here"
# Or set for current session
$env:OPENAI_API_KEY="your-api-key-here"
- Microphone and speakers (default Windows audio devices)
# Clone or extract the project
cd mcp-realtime-poc
# Create Python virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1
# Install Python dependencies
pip install -r .\mcp-voice-agent\requirements.txt
Create mcp.json in the project root (see configuration section above). A basic configuration for the included CalculatorMcp server:
{
"mcpServers": {
"calculator": {
"transport": "stdio",
"command": "dotnet",
"args": ["run", "--no-build", "--project", "CalculatorMcp/CalculatorMcp.csproj", "-v", "q"],
"env": {}
}
}
}
# Build the C# MCP server
dotnet build .\CalculatorMcp\CalculatorMcp.csproj -c Release
# Single command - launches all configured MCP servers automatically
python .\mcp-voice-agent\main.py
Once running, speak naturally to the agent:
- "Add 7 and 13" → Performs addition using calculator server
- "Give me a random number between 10 and 20" → Generates random number
- "Convert 50 Celsius to Fahrenheit" → Temperature conversion
- "Reverse the word hello" → String manipulation
- "What's the date in a nice format?" → Date formatting
- "Wait for 2 seconds" → Delay execution
With multiple servers configured, you can access tools from any server:
- "Use the calculator to multiply 5 and 8" → Explicitly calls calculator server
- "Run my custom analysis on this data" → Calls tool from custom server
Press Ctrl+C to exit gracefully.
mcp-realtime-poc/
├── mcp.json                      # MCP server configuration
├── README.md                     # This file
├── CalculatorMcp/                # C# MCP server
│   ├── CalculatorMcp.csproj
│   ├── CalculatorTools.cs
│   └── Program.cs
├── mcp-voice-agent/              # Main Python application
│   ├── main.py                   # Entry point
│   ├── requirements.txt          # Python dependencies
│   ├── mcp_voice_agent/          # MCP integration module
│   │   ├── mcp_client_sdk.py     # Official MCP SDK client + MultiMCPClient
│   │   ├── dynamic_tools.py      # Dynamic function generation
│   │   ├── audio.py              # Audio handling
│   │   └── settings.py           # Configuration + MCPServerConfig
│   └── tests/                    # Unit tests
└── artifacts/                    # Development files (ignored)
The voice agent automatically detects your console's character encoding and provides emoji feedback. If emojis don't display correctly, ensure your terminal is set to UTF-8:
chcp 65001 # Set console to UTF-8
cd mcp-voice-agent
python -m pytest tests/
- Create or obtain an MCP server that implements the MCP protocol
- Add server configuration to mcp.json:
{
"mcpServers": {
"my-server": {
"transport": "stdio",
"command": "your-command",
"args": ["arg1", "arg2"],
"env": {"KEY": "value"}
}
}
}
- Restart the voice agent - it will automatically discover and integrate tools from the new server
- Test the integration by asking the agent to use tools from your new server
- Audio: PCM16 at 24kHz (optimal for OpenAI Realtime API)
- Communication: Stdio-based MCP transport between Python and configured servers
- Tool Generation: Dynamic Python function creation from MCP schemas with server prefixing
- Multi-Server Support: Tools aggregated from all configured servers with automatic conflict resolution
- Error Handling: Comprehensive logging with emoji indicators
- Platform: Windows (PowerShell), with cross-platform potential
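As an illustration of the audio format listed above, the sketch below converts floating-point samples to 16-bit PCM bytes and computes how much playback time a chunk covers at 24 kHz. Mono audio and little-endian byte order are assumptions here, and `floats_to_pcm16` and `duration_ms` are hypothetical helpers rather than functions from audio.py.

```python
import math
import struct

SAMPLE_RATE = 24_000   # Hz, as stated in the Audio bullet
BYTES_PER_SAMPLE = 2   # 16-bit signed integers (assumed mono)


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]  # avoid integer overflow
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def duration_ms(pcm_bytes):
    """Playback time in milliseconds for a PCM16 chunk at SAMPLE_RATE."""
    return len(pcm_bytes) * 1000 / (BYTES_PER_SAMPLE * SAMPLE_RATE)


# 2400 samples of a 440 Hz sine wave
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(2400)]
chunk = floats_to_pcm16(tone)
```

At this rate each second of audio is 48,000 bytes, so small chunks keep latency low when streaming microphone input.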
- OpenAI Agents SDK Documentation
- OpenAI Realtime API Guide
- MCP Specification
- C# MCP SDK
- Python MCP SDK
- Claude Desktop MCP Configuration
This project is licensed under the MIT License - see the LICENSE file for details.