vbandi/mcp-realtime-poc

MCP Voice Agent Demo

A demonstration of a realtime voice agent that combines OpenAI's Realtime API with dynamic Model Context Protocol (MCP) tool integration. Entirely AI-coded, with human guidance.

Video (turn on sound)

mcp-realtime-poc.demo.mp4

🎯 Project Goal

This project demonstrates how to build a voice-controlled AI assistant that can dynamically discover and use tools from multiple MCP servers. The agent uses natural speech input to interact with various tools through OpenAI's Realtime API, showcasing the power of MCP for extending AI capabilities. Multiple servers can be configured via a standard mcp.json file, similar to Claude Desktop.

Note: currently only the stdio transport is supported.

🏗️ Architecture

The project consists of two main components:

MCP Voice Agent (Python)

  • Location: mcp-voice-agent/ directory
  • Technology: Python 3.11+ with OpenAI Agents SDK
  • Features:
    • Realtime voice interaction via OpenAI's Realtime API
    • Dynamic MCP tool discovery and integration from multiple servers
    • Automatic function generation from MCP schemas
    • Audio input/output handling
    • Console character set verification for emoji support
    • Multi-server configuration via mcp.json
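
One way to picture "automatic function generation from MCP schemas" is the sketch below. It is not the project's dynamic_tools.py: the `make_tool_function` helper, the `call_tool` callback, and the sample schema are hypothetical, and only the `name`, `description`, and `inputSchema` fields come from the MCP tool-listing format.

```python
from typing import Any, Callable


def make_tool_function(tool: dict[str, Any],
                       call_tool: Callable[[str, dict[str, Any]], Any]) -> Callable[..., Any]:
    """Build a Python callable from an MCP tool description (hypothetical helper)."""
    schema = tool.get("inputSchema", {})
    allowed = set(schema.get("properties", {}))
    required = set(schema.get("required", []))

    def tool_function(**kwargs: Any) -> Any:
        unknown = set(kwargs) - allowed
        missing = required - set(kwargs)
        if unknown or missing:
            raise TypeError(
                f"{tool['name']}: unknown={sorted(unknown)} missing={sorted(missing)}")
        # Forward the validated arguments to whatever actually talks to the server.
        return call_tool(tool["name"], kwargs)

    tool_function.__name__ = tool["name"]
    tool_function.__doc__ = tool.get("description", "")
    return tool_function


# Sample entry shaped like one item of an MCP tools/list result (values are illustrative).
ADD_TOOL = {
    "name": "add",
    "description": "Add two numbers.",
    "inputSchema": {
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    },
}


def fake_call(name: str, args: dict[str, Any]) -> Any:
    """Stand-in for a real MCP call so the sketch runs offline."""
    return args["a"] + args["b"]


add = make_tool_function(ADD_TOOL, fake_call)
```

Calling `add(a=7, b=13)` goes through the generated wrapper, which rejects missing or unknown arguments before anything is sent to a server.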

CalculatorMcp Server (C#)

  • Location: CalculatorMcp/ directory
  • Technology: .NET 8 with MCP SDK
  • Features:
    • 13 mathematical and utility tools
    • MCP protocol implementation
    • Stdio-based communication

🛠️ Available Tools

The CalculatorMcp server provides these tools:

  • Math: add(a, b), multiply(a, b), circle_area(radius)
  • Numbers: random_between(min, max), is_even(number)
  • Strings: reverse_string(text), count_letter(text, letter), string_contains(text, substring)
  • Utilities: convert_temperature(temp, fromUnit, toUnit), delay(seconds), format_date(), days_until(date)
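
On the wire, invoking one of these tools is a JSON-RPC 2.0 `tools/call` request sent to the server's stdin. The sketch below builds such a message for `add`, assuming the server registers the tool under exactly that name. The `tools_call_request` helper is hypothetical and is not part of the project or the MCP SDK.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON-RPC message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport delimits messages with newlines, so keep the JSON on one line.
    return json.dumps(message, separators=(",", ":")) + "\n"


line = tools_call_request(1, "add", {"a": 7, "b": 13})
print(line, end="")
```

In practice the MCP SDK client handles this framing; the sketch only shows what a single call amounts to.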

🔧 MCP Server Configuration

Configure multiple MCP servers using an mcp.json file in the project root. The format follows Claude Desktop's standard:

{
  "mcpServers": {
    "calculator": {
      "transport": "stdio",
      "command": "dotnet",
      "args": ["run", "--no-build", "--project", "CalculatorMcp/CalculatorMcp.csproj", "-v", "q"],
      "env": {}
    },
    "my-custom-server": {
      "transport": "stdio",
      "command": "python",
      "args": ["my_server.py"],
      "env": {
        "API_KEY": "your-key-here"
      }
    }
  }
}

Supported Transports:

  • stdio - Local process communication (currently supported)

Notes:

  • Server names (keys) must be unique and contain only alphanumeric characters, hyphens, and underscores
  • The voice agent automatically aggregates tools from all configured servers
  • Tools are prefixed with their server name to avoid conflicts (e.g., calculator_add, my_server_custom_tool)
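
The rules above can be checked before any server is launched. The following is a minimal sketch, not the project's settings.py: `validate_config` is a hypothetical helper, and treating a missing `transport` as `stdio` is an assumption.

```python
import json
import re

SERVER_NAME = re.compile(r"[A-Za-z0-9_-]+")
SUPPORTED_TRANSPORTS = {"stdio"}


def validate_config(config: dict) -> dict:
    """Return the mcpServers mapping, raising ValueError if a rule is violated."""
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        raise ValueError("mcp.json needs a non-empty 'mcpServers' object")
    for name, entry in servers.items():
        if not SERVER_NAME.fullmatch(name):
            raise ValueError(f"invalid server name: {name!r}")
        transport = entry.get("transport", "stdio")  # assumed default
        if transport not in SUPPORTED_TRANSPORTS:
            raise ValueError(f"{name}: unsupported transport {transport!r}")
        if not entry.get("command"):
            raise ValueError(f"{name}: missing 'command'")
    return servers


sample = json.loads(
    '{"mcpServers": {"calculator": {"transport": "stdio", "command": "dotnet", "args": []}}}'
)
servers = validate_config(sample)
```

Key uniqueness is not checked here, because `json.loads` silently keeps only the last of any duplicate keys.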

📋 Prerequisites

  • .NET 8 SDK (for the C# MCP server)
  • Python 3.11+
  • OpenAI API Key in environment:
    # Set permanently (takes effect in new terminal sessions, not the current one)
    setx OPENAI_API_KEY "your-api-key-here"
    
    # Or set for current session
    $env:OPENAI_API_KEY="your-api-key-here"
  • Microphone and speakers (default Windows audio devices)
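
A quick way to confirm the API key is visible to Python before starting the agent is sketched below. `missing_prerequisites` is a hypothetical helper, not something the project ships.

```python
import os


def missing_prerequisites(env=None) -> list[str]:
    """List required environment variables that are unset or blank."""
    env = os.environ if env is None else env
    required = ["OPENAI_API_KEY"]
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_prerequisites()
    print("Missing: " + ", ".join(missing) if missing else "Environment looks ready.")
```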

🚀 Quick Start

1. Setup Environment

# Clone or extract the project
cd mcp-realtime-poc

# Create Python virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1

# Install Python dependencies
pip install -r .\mcp-voice-agent\requirements.txt

2. Configure MCP Servers

Create mcp.json in the project root (see configuration section above). A basic configuration for the included CalculatorMcp server:

{
  "mcpServers": {
    "calculator": {
      "transport": "stdio",
      "command": "dotnet",
      "args": ["run", "--no-build", "--project", "CalculatorMcp/CalculatorMcp.csproj", "-v", "q"],
      "env": {}
    }
  }
}

3. Build MCP Server(s)

# Build the C# MCP server in the default Debug configuration,
# which is what `dotnet run --no-build` in mcp.json looks for
dotnet build .\CalculatorMcp\CalculatorMcp.csproj

4. Run the Voice Agent

# Single command - launches all configured MCP servers automatically
python .\mcp-voice-agent\main.py
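
Launching a configured server comes down to starting its `command` with `args`, merging its `env` into the inherited environment, and talking over stdin/stdout. The project uses the official MCP SDK client for this; the standard-library sketch below only illustrates the idea. `launch_stdio_server` is a hypothetical helper, and the "server" in the demo is a stand-in Python one-liner that echoes one line back.

```python
import os
import subprocess
import sys


def launch_stdio_server(entry: dict, base_env=None) -> subprocess.Popen:
    """Start one configured server with pipes on stdin/stdout (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(entry.get("env", {}))  # per-server variables override inherited ones
    return subprocess.Popen(
        [entry["command"], *entry.get("args", [])],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=env,
        text=True,
    )


# Stand-in server entry: reads one line from stdin and writes it back to stdout.
echo = {
    "command": sys.executable,
    "args": ["-c", "import sys; sys.stdout.write(sys.stdin.readline())"],
    "env": {},
}
proc = launch_stdio_server(echo)
reply, _ = proc.communicate('{"jsonrpc":"2.0","id":1,"method":"ping"}\n', timeout=10)
```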

🎀 Usage Examples

Once running, speak naturally to the agent:

  • "Add 7 and 13" → Performs addition using calculator server
  • "Give me a random number between 10 and 20" → Generates random number
  • "Convert 50 Celsius to Fahrenheit" → Temperature conversion
  • "Reverse the word hello" → String manipulation
  • "What's the date in a nice format?" → Date formatting
  • "Wait for 2 seconds" → Delay execution

With multiple servers configured, you can access tools from any server:

  • "Use the calculator to multiply 5 and 8" → Explicitly calls calculator server
  • "Run my custom analysis on this data" → Calls tool from custom server

Press Ctrl+C to exit gracefully.

📁 Project Structure

mcp-realtime-poc/
├── mcp.json                 # MCP server configuration
├── README.md               # This file
├── CalculatorMcp/          # C# MCP server
│   ├── CalculatorMcp.csproj
│   ├── CalculatorTools.cs
│   └── Program.cs
├── mcp-voice-agent/        # Main Python application
│   ├── main.py            # Entry point
│   ├── requirements.txt   # Python dependencies
│   ├── mcp_voice_agent/       # MCP integration module
│   │   ├── mcp_client_sdk.py # Official MCP SDK client + MultiMCPClient
│   │   ├── dynamic_tools.py # Dynamic function generation
│   │   ├── audio.py       # Audio handling
│   │   └── settings.py    # Configuration + MCPServerConfig
│   └── tests/             # Unit tests
└── artifacts/              # Development files (ignored)

⚙️ Configuration

The voice agent automatically detects your console's character encoding and provides emoji feedback. If emojis don't display correctly, ensure your terminal is set to UTF-8:

chcp 65001  # Set console to UTF-8
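
A console check of this kind can be approximated by testing whether the output stream's encoding can represent an emoji, then falling back to plain ASCII markers. This is a sketch under that assumption, not the agent's actual detection code. `supports_emoji` and `status` are hypothetical names.

```python
import sys


def supports_emoji(stream=None) -> bool:
    """Return True if the stream's encoding can represent a sample emoji."""
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        "🎤".encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False


def status(ok: bool, stream=None) -> str:
    """Pick an emoji or plain-ASCII status marker for the given stream."""
    if supports_emoji(stream):
        return "✅" if ok else "❌"
    return "[OK]" if ok else "[FAIL]"
```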

🔧 Development

Running Tests

cd mcp-voice-agent
python -m pytest tests/

Adding New MCP Servers

  1. Create or obtain a server that implements MCP over the stdio transport
  2. Add server configuration to mcp.json:
    {
      "mcpServers": {
        "my-server": {
          "transport": "stdio",
          "command": "your-command",
          "args": ["arg1", "arg2"],
          "env": {"KEY": "value"}
        }
      }
    }
  3. Restart the voice agent - it will automatically discover and integrate tools from the new server
  4. Test the integration by asking the agent to use tools from your new server
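
Step 2 can also be scripted. The sketch below adds an entry to an already-loaded configuration without overwriting an existing server. `add_server` is a hypothetical helper, and the command and arguments are the placeholders from the example above.

```python
import json


def add_server(config: dict, name: str, command: str, args=(), env=None) -> dict:
    """Return a copy of the config with one more stdio server entry."""
    servers = dict(config.get("mcpServers", {}))
    if name in servers:
        raise ValueError(f"server {name!r} is already configured")
    servers[name] = {
        "transport": "stdio",
        "command": command,
        "args": list(args),
        "env": dict(env or {}),
    }
    return {**config, "mcpServers": servers}


config = {"mcpServers": {"calculator": {"transport": "stdio", "command": "dotnet",
                                        "args": [], "env": {}}}}
updated = add_server(config, "my-server", "your-command", ["arg1", "arg2"], {"KEY": "value"})
print(json.dumps(updated, indent=2))
```

Returning a new dictionary leaves the original configuration untouched, so a failed edit cannot leave a half-written server entry behind.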

📚 Technical Details

  • Audio: PCM16 at 24 kHz (the sample rate the OpenAI Realtime API uses for PCM16 audio)
  • Communication: Stdio-based MCP transport between Python and configured servers
  • Tool Generation: Dynamic Python function creation from MCP schemas with server prefixing
  • Multi-Server Support: Tools aggregated from all configured servers; server-name prefixes keep tool names from colliding
  • Error Handling: Comprehensive logging with emoji indicators
  • Platform: Windows (PowerShell), with cross-platform potential
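
To make the audio format concrete: PCM16 at 24 kHz means 24,000 signed 16-bit samples per second, so a 100 ms mono chunk is 2,400 samples or 4,800 bytes. The sketch below works that out and converts float samples to little-endian PCM16. It assumes mono audio, and the helper names are hypothetical rather than taken from the project's audio.py.

```python
import math
import struct

SAMPLE_RATE = 24_000   # samples per second
BYTES_PER_SAMPLE = 2   # PCM16 = signed 16-bit integers


def chunk_bytes(milliseconds: int, channels: int = 1) -> int:
    """Size in bytes of a PCM16 chunk of the given duration."""
    return SAMPLE_RATE * milliseconds // 1000 * BYTES_PER_SAMPLE * channels


def floats_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian PCM16, clipping out-of-range values."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


# 100 ms of a 440 Hz sine tone, mono.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 10)]
pcm = floats_to_pcm16(tone)
```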

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
