@HareeshBahuleyan HareeshBahuleyan commented Jun 30, 2025

User description

Based on:
https://github.com/elevenlabs/elevenlabs-mcp/

Found that mcpm already has an unofficial implementation:
https://github.com/pathintegral-institute/mcpm.sh/blob/main/mcp-registry/servers/elevenlabs.json
Not sure if you would like to keep both.


PR Type

Enhancement


Description

  • Add official ElevenLabs MCP server configuration

  • Include 22 comprehensive audio processing tools

  • Support text-to-speech, speech-to-text, and AI agents

  • Provide voice cloning and sound effects generation


Changes diagram

flowchart LR
  A["MCP Registry"] --> B["ElevenLabs Server"]
  B --> C["Text-to-Speech Tools"]
  B --> D["Speech-to-Text Tools"]
  B --> E["Voice Management"]
  B --> F["AI Agent Tools"]
  C --> G["Audio Output"]
  D --> H["Text Transcripts"]
  E --> I["Voice Library"]
  F --> J["Conversational AI"]

Changes walkthrough 📝

Relevant files
Enhancement
elevenlabs-mcp.json
Complete ElevenLabs MCP server configuration

mcp-registry/servers/elevenlabs-mcp.json

  • Add complete ElevenLabs MCP server configuration with 22 tools
  • Include text-to-speech, speech-to-text, and voice cloning capabilities
  • Define AI agent creation and management tools
  • Provide comprehensive audio processing and manipulation features
  • +595/-0 

Copilot AI review requested due to automatic review settings June 30, 2025 11:01

    Copilot AI left a comment

    Pull Request Overview

    This PR adds an official ElevenLabs MCP server entry to the registry, providing metadata, installation details, a comprehensive set of ElevenLabs API tools, and usage examples.

    • Introduces elevenlabs-mcp.json with server metadata, arguments, and installation via UVX.
    • Defines thirty-plus API tools (TTS, STT, agent management, etc.) with full input schemas.
    • Includes two example use cases demonstrating text-to-speech and transcription.
    Comments suppressed due to low confidence (2)

    mcp-registry/servers/elevenlabs-mcp.json:56

    • Consider enforcing mutual exclusivity between voice_name and voice_id in the text_to_speech schema (e.g., using a JSON Schema oneOf) to prevent invalid combinations.
                "voice_name": {
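
A minimal sketch of the mutual-exclusivity check suggested above, written as plain Python rather than as a JSON Schema constraint. The function name `check_voice_selector` is hypothetical; only the `voice_name` and `voice_id` field names come from the schema in this PR.

```python
def check_voice_selector(arguments: dict) -> None:
    """Raise ValueError if both voice_name and voice_id are supplied.

    Supplying neither is allowed: the tool description says the default
    voice is used in that case.
    """
    has_name = arguments.get("voice_name") is not None
    has_id = arguments.get("voice_id") is not None
    if has_name and has_id:
        raise ValueError("Provide only one of voice_name or voice_id, not both.")
```

Because the tool also accepts neither field, a strict `oneOf` over the two fields would be too tight; a schema-level equivalent would be closer to `"not": {"required": ["voice_name", "voice_id"]}`.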
    

    mcp-registry/servers/elevenlabs-mcp.json:115

    • The speech_to_text tool uses language_code (ISO 639-3) while text_to_speech uses language (ISO 639-1). Consider standardizing these field names and formats for consistency.
                "language_code": {
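
One possible way to reconcile the two formats on the caller side, sketched under the assumption that a small lookup table is acceptable. The table `ISO_639_1_TO_3` and the function `to_iso639_3` are hypothetical and cover only a handful of languages for illustration.

```python
# Tiny illustrative subset; a real table would cover every supported language.
ISO_639_1_TO_3 = {"en": "eng", "es": "spa", "de": "deu", "fr": "fra"}


def to_iso639_3(code: str) -> str:
    """Convert a two-letter ISO 639-1 code to its three-letter ISO 639-3 form.

    Three-letter input is passed through unchanged (lower-cased) without
    being checked against the ISO 639-3 registry.
    """
    normalized = code.strip().lower()
    if len(normalized) == 3:
        return normalized
    try:
        return ISO_639_1_TO_3[normalized]
    except KeyError:
        raise ValueError(f"Unknown ISO 639-1 code: {code!r}") from None
```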
    

    @qodo-merge-pro

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 Security concerns

    API key exposure:
    The configuration requires an ELEVEN_API_KEY environment variable which contains sensitive authentication credentials. While the configuration properly uses environment variable substitution (${ELEVEN_API_KEY}), users need to ensure this API key is stored securely and not exposed in logs or configuration files. The example value "..." should remain as a placeholder to prevent accidental key exposure.
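
As an illustration of the handling described above, here is a small sketch that reads the key from the environment, rejects the `"..."` placeholder, and produces a log-safe form. The helper names `load_api_key` and `redact` are hypothetical and are not part of the ElevenLabs MCP server; the variable name `ELEVEN_API_KEY` is taken from this review.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = env.get("ELEVEN_API_KEY", "").strip()
    # Treat the "..." placeholder from the example config as unset.
    if not key or key == "...":
        raise RuntimeError("ELEVEN_API_KEY is not set; export it before starting the server.")
    return key


def redact(secret: str) -> str:
    """Return a log-safe form that reveals only the last four characters."""
    if len(secret) <= 4:
        return "****"
    return "*" * (len(secret) - 4) + secret[-4:]
```

Logging `redact(key)` instead of the key itself keeps the credential out of log files while still letting users confirm which key is in use.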

    ⚡ Recommended focus areas for review

    Validation Issue

    The JSON schema validation needs verification. Several tools have complex parameter combinations and constraints that should be validated, such as mutually exclusive voice_id/voice_name parameters in text_to_speech, and the duration constraints for text_to_sound_effects.

    {
      "name": "text_to_speech",
      "description": "Convert text to speech with a given voice and save the output audio file to a given directory.\nDirectory is optional, if not provided, the output file will be saved to $HOME/Desktop.\nOnly one of voice_id or voice_name can be provided. If none are provided, the default voice will be used.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "description": "The text to convert to speech."
          },
          "voice_name": {
            "type": "string",
            "description": "The name of the voice to use."
          },
          "output_directory": {
            "type": "string",
            "description": "Directory where files should be saved. Defaults to $HOME/Desktop if not provided."
          },
          "voice_id": {
            "type": "string",
            "description": "The ID of the voice to use."
          },
          "stability": {
            "type": "number",
            "description": "Stability of the generated audio. Lower values introduce broader emotional range. Range is 0 to 1."
          },
          "similarity_boost": {
            "type": "number",
            "description": "Similarity boost of the generated audio. Determines how closely the AI should adhere to the original voice. Range is 0 to 1."
          },
          "style": {
            "type": "number",
            "description": "Style exaggeration of the voice. Amplifies the style of the original speaker. Range is 0 to 1."
          },
          "use_speaker_boost": {
            "type": "boolean",
            "description": "Boosts the similarity to the original speaker."
          },
          "speed": {
            "type": "number",
            "description": "Controls the speed of the generated speech. Range is 0.7 to 1.2."
          },
          "language": {
            "type": "string",
            "description": "ISO 639-1 language code for the voice."
          },
          "output_format": {
            "type": "string",
            "description": "Output format of the generated audio (e.g., mp3_44100_128).",
            "enum": [
              "mp3_22050_32", "mp3_44100_32", "mp3_44100_64", "mp3_44100_96", "mp3_44100_128", "mp3_44100_192",
              "pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100",
              "ulaw_8000", "alaw_8000", "opus_48000_32", "opus_48000_64", "opus_48000_96", "opus_48000_128", "opus_48000_192"
            ]
          }
        },
        "required": ["text"]
      }
    },
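
The numeric constraints mentioned above live only in the free-text descriptions, so nothing enforces them. A rough sketch of a client-side check follows; the ranges are copied from the parameter descriptions in this PR, while the table `NUMERIC_RANGES` and the function `out_of_range` are hypothetical names.

```python
# Ranges copied from the parameter descriptions in the schema excerpts.
NUMERIC_RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
    "duration_seconds": (0.5, 5.0),
}


def out_of_range(arguments: dict) -> list[str]:
    """Return one message per numeric argument that falls outside its range."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = arguments.get(name)
        if value is None:
            continue  # Optional parameter not supplied.
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems
```

The same limits could instead be declared in the schema itself with the JSON Schema `minimum` and `maximum` keywords, which would let any schema-aware client reject bad values before the API call is made.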
    Cost Warnings

    Multiple tools include cost warnings but the implementation and user experience of these warnings should be verified. The warning format and placement in descriptions may need standardization across all cost-incurring tools.

    {
      "name": "speech_to_text",
      "description": "Transcribe speech from an audio file and either save the output text file or return the text directly.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "input_file_path": {
            "type": "string",
            "description": "Path to the audio file to transcribe."
          },
          "language_code": {
            "type": "string",
            "description": "ISO 639-3 language code for transcription (default: 'eng')."
          },
          "diarize": {
            "type": "boolean",
            "description": "Annotate which speaker is currently speaking in the transcription."
          },
          "save_transcript_to_file": {
            "type": "boolean",
            "description": "Whether to save the transcript to a file."
          },
          "return_transcript_to_client_directly": {
            "type": "boolean",
            "description": "Whether to return the transcript to the client directly."
          },
          "output_directory": {
            "type": "string",
            "description": "Directory where files should be saved. Defaults to $HOME/Desktop if not provided."
          }
        },
        "required": ["input_file_path"]
      }
    },
    {
      "name": "text_to_sound_effects",
      "description": "Convert a text description to a sound effect and save it to a file.\nDuration must be between 0.5 and 5 seconds.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "description": "Text description of the sound effect."
          },
          "duration_seconds": {
            "type": "number",
            "description": "Duration of the sound effect in seconds (0.5 to 5)."
          },
          "output_directory": {
            "type": "string",
            "description": "Directory where files should be saved. Defaults to $HOME/Desktop if not provided."
          },
          "output_format": {
            "type": "string",
            "description": "Output format of the generated audio.",
            "enum": [
              "mp3_22050_32", "mp3_44100_32", "mp3_44100_64", "mp3_44100_96", "mp3_44100_128", "mp3_44100_192",
              "pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100",
              "ulaw_8000", "alaw_8000", "opus_48000_32", "opus_48000_64", "opus_48000_96", "opus_48000_128", "opus_48000_192"
            ]
          }
        },
        "required": ["text"]
      }
    },
    {
      "name": "search_voices",
      "description": "Search for existing voices in the user's ElevenLabs voice library. Searches in name, description, labels and category.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "search": {
            "type": "string",
            "description": "Search term to filter voices by."
          },
          "sort": {
            "type": "string",
            "description": "Which field to sort by.",
            "enum": ["created_at_unix", "name"]
          },
          "sort_direction": {
            "type": "string",
            "description": "Sort order, either ascending or descending.",
            "enum": ["asc", "desc"]
          }
        },
        "required": []
      }
    },
    {
      "name": "list_models",
      "description": "List all available models.",
      "inputSchema": {
        "type": "object",
        "properties": {},
        "required": []
      }
    },
    {
      "name": "get_voice",
      "description": "Get details of a specific voice.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "voice_id": {
            "type": "string",
            "description": "The ID of the voice to retrieve."
          }
        },
        "required": ["voice_id"]
      }
    },
    {
      "name": "voice_clone",
      "description": "Create an instant voice clone of a voice using provided audio files.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string",
            "description": "The name for the new cloned voice."
          },
          "files": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "A list of file paths to the audio files for cloning."
          },
          "description": {
            "type": "string",
            "description": "A description for the new voice."
          }
        },
        "required": ["name", "files"]
      }
    },
    {
      "name": "isolate_audio",
      "description": "Isolate audio from a file and save the output audio file to a given directory.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "input_file_path": {
            "type": "string",
            "description": "Path to the audio file to process."
          },
          "output_directory": {
            "type": "string",
            "description": "Directory where files should be saved. Defaults to $HOME/Desktop if not provided."
          }
        },
        "required": ["input_file_path"]
      }
    },
    {
      "name": "check_subscription",
      "description": "Check the current subscription status. Could be used to measure the usage of the API.",
      "inputSchema": {
        "type": "object",
        "properties": {},
        "required": []
      }
    },
    {
      "name": "create_agent",
      "description": "Create a conversational AI agent with custom configuration.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string",
            "description": "Name of the agent."
          },
          "first_message": {
            "type": "string",
            "description": "First message the agent will say."
          },
          "system_prompt": {
            "type": "string",
            "description": "System prompt for the agent."
          },
          "voice_id": {
            "type": "string",
            "description": "ID of the voice to use for the agent."
          },
          "language": {
            "type": "string",
            "description": "ISO 639-1 language code for the agent."
          },
          "llm": {
            "type": "string",
            "description": "LLM to use for the agent."
          },
          "temperature": {
            "type": "number",
            "description": "Temperature for the agent's responses (0 to 1)."
          },
          "max_tokens": {
            "type": "integer",
            "description": "Maximum number of tokens to generate."
          },
          "asr_quality": {
            "type": "string",
            "description": "Quality of the ASR ('high' or 'low')."
          },
          "model_id": {
            "type": "string",
            "description": "ID of the ElevenLabs model to use for the agent."
          },
          "optimize_streaming_latency": {
            "type": "integer",
            "description": "Optimize streaming latency (0 to 4)."
          },
          "stability": {
            "type": "number",
            "description": "Stability for the agent's voice (0 to 1)."
          },
          "similarity_boost": {
            "type": "number",
            "description": "Similarity boost for the agent's voice (0 to 1)."
          },
          "turn_timeout": {
            "type": "integer",
            "description": "Timeout for the agent to respond in seconds."
          },
          "max_duration_seconds": {
            "type": "integer",
            "description": "Maximum duration of a conversation in seconds."
          },
          "record_voice": {
            "type": "boolean",
            "description": "Whether to record the agent's voice."
          },
          "retention_days": {
            "type": "integer",
            "description": "Number of days to retain the agent's data."
          }
        },
        "required": ["name", "first_message", "system_prompt"]
      }
    },
    {
      "name": "add_knowledge_base_to_agent",
      "description": "Add a knowledge base to an ElevenLabs agent from a URL, file, or text. Allowed file types are epub, pdf, docx, txt, html.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "agent_id": {
            "type": "string",
            "description": "ID of the agent to add the knowledge base to."
          },
          "knowledge_base_name": {
            "type": "string",
            "description": "Name of the knowledge base."
          },
          "url": {
            "type": "string",
            "description": "URL of the knowledge base."
          },
          "input_file_path": {
            "type": "string",
            "description": "Path to the file to add to the knowledge base."
          },
          "text": {
            "type": "string",
            "description": "Text to add to the knowledge base."
          }
        },
        "required": ["agent_id", "knowledge_base_name"]
      }
    },
    {
      "name": "list_agents",
      "description": "List all available conversational AI agents.",
      "inputSchema": {
        "type": "object",
        "properties": {},
        "required": []
      }
    },
    {
      "name": "get_agent",
      "description": "Get details about a specific conversational AI agent.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "agent_id": {
            "type": "string",
            "description": "The ID of the agent to retrieve."
          }
        },
        "required": ["agent_id"]
      }
    },
    {
      "name": "get_conversation",
      "description": "Gets conversation with transcript. Returns conversation details and full transcript. Use when analyzing completed agent conversations.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "conversation_id": {
            "type": "string",
            "description": "The unique identifier of the conversation to retrieve."
          }
        },
        "required": ["conversation_id"]
      }
    },
    {
      "name": "list_conversations",
      "description": "Lists agent conversations with metadata. Use when asked about conversation history.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "agent_id": {
            "type": "string",
            "description": "Filter conversations by specific agent ID."
          },
          "cursor": {
            "type": "string",
            "description": "Pagination cursor for retrieving the next page of results."
          },
          "call_start_before_unix": {
            "type": "integer",
            "description": "Filter conversations that started before this Unix timestamp."
          },
          "call_start_after_unix": {
            "type": "integer",
            "description": "Filter conversations that started after this Unix timestamp."
          },
          "page_size": {
            "type": "integer",
            "description": "Number of conversations to return per page (1-100, defaults to 30)."
          },
          "max_length": {
            "type": "integer",
            "description": "Maximum length of the response text."
          }
        },
        "required": []
      }
    },
    {
      "name": "speech_to_speech",
      "description": "Transform audio from one voice to another using a provided audio file.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "input_file_path": {
            "type": "string",
            "description": "Path to the source audio file."
          },
          "voice_name": {
            "type": "string",
            "description": "The name of the target voice."
          },
          "output_directory": {
            "type": "string",
            "description": "Directory where the output file should be saved. Defaults to $HOME/Desktop."
          }
        },
        "required": ["input_file_path"]
      }
    },
    {
      "name": "text_to_voice",
      "description": "Create voice previews from a text prompt. Creates three previews with slight variations and saves them to a directory.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "voice_description": {
            "type": "string",
            "description": "A detailed description of the voice to be generated."
          },
          "text": {
            "type": "string",
            "description": "The text to be spoken in the preview. If not provided, it will be auto-generated."
          },
          "output_directory": {
            "type": "string",
            "description": "Directory where the preview files should be saved. Defaults to $HOME/Desktop."
          }
        },
        "required": ["voice_description"]
      }
    },
    {
      "name": "create_voice_from_preview",
      "description": "Add a generated voice to the voice library using the voice ID from the `text_to_voice` tool.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "generated_voice_id": {
            "type": "string",
            "description": "The ID of the generated voice preview."
          },
          "voice_name": {
            "type": "string",
            "description": "The name for the new voice."
          },
          "voice_description": {
            "type": "string",
            "description": "A description for the new voice."
          }
        },
        "required": ["generated_voice_id", "voice_name", "voice_description"]
      }
    },
    {
      "name": "make_outbound_call",
      "description": "Make an outbound call using an ElevenLabs agent.\n\n\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.",
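
To verify that the warning is applied consistently, a reviewer could list the tools whose descriptions lack it and confirm by hand that each one really is free to call. The sketch below assumes the tool list has already been parsed from the JSON file into a list of dicts; `COST_MARKER` and `tools_without_cost_warning` are hypothetical names.

```python
COST_MARKER = "COST WARNING:"


def tools_without_cost_warning(tools: list[dict]) -> list[str]:
    """Return the names of tools whose description has no cost warning."""
    return [
        tool["name"]
        for tool in tools
        if COST_MARKER not in tool.get("description", "")
    ]


# Two abbreviated entries modeled on the excerpt above.
sample_tools = [
    {
        "name": "speech_to_text",
        "description": "Transcribe speech from an audio file.\n\n"
        "\u26a0\ufe0f COST WARNING: This tool makes an API call to ElevenLabs "
        "which may incur costs.",
    },
    {"name": "list_models", "description": "List all available models."},
]
print(tools_without_cost_warning(sample_tools))  # ['list_models']
```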

    qodo-merge-pro bot commented Jun 30, 2025

    PR Code Suggestions ✨

    No code suggestions found for the PR.

    niechen commented Jul 1, 2025

    We prefer keeping the official mcp in the registry. Happy to remove the unofficial one if the official one covers all functionality.

    @HareeshBahuleyan

    Thanks @niechen - Feel free to update the PR (to remove the unofficial mcp) or merge it as is.

    @HareeshBahuleyan

    Hi @niechen
    I have now dropped the unofficial elevenlabs MCP json. If it's okay with you, could you check and merge this PR? Thanks

    @niechen niechen merged commit e81082a into pathintegral-institute:main Jul 4, 2025
    3 checks passed

    niechen commented Jul 4, 2025

    thank you. merged

    @mcpm-semantic-release

    🎉 This PR is included in version 2.0.0 🎉

    The release is available on GitHub release

    Your semantic-release bot 📦🚀

    @HareeshBahuleyan

    Hi @niechen, after the latest mcpm 2.0 release, the server list at https://getmcp.io/api/servers.json still shows the old elevenlabs MCP. Could you share some info on how or when the servers at that link get updated? Thanks!

    @HareeshBahuleyan

    Hi @niechen, could you please help with this query?
