After you understand the capabilities of the Azure Speech MCP server, the next step is to connect it to an agent and start using it. This involves setting up storage, creating an agent in Microsoft Foundry, connecting the Speech MCP tool, testing it in the agent playground, and optionally building a client application.

## Set up Azure Blob Storage

The Azure Speech MCP server requires an Azure Storage account to store audio files. You need to create a storage account and a blob container before connecting the tool.

1. In the [Azure portal](https://portal.azure.com), create a new **Azure Storage account** (or use an existing one).
1. In the storage account, expand **Data storage** and select **Containers**.
1. Create a new container (for example, named **files**) to store the audio files your agent generates and reads.
1. Generate a **SAS token** for the container with the following permissions: Read, Add, Create, Write, and List. Set the expiry time to the shortest practical duration.

> [!IMPORTANT]
> Copy the generated SAS URL and store it securely — you need it when connecting the Speech MCP server.
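
Before you connect the tool, you can sanity-check the SAS URL: the permissions you granted appear in the URL's `sp` query parameter (Read, Add, Create, Write, and List show up as the letters `racwl`). Here's a minimal Python sketch using only the standard library; the account name, container, and token values are hypothetical placeholders:

```python
from urllib.parse import urlparse, parse_qs

def sas_permissions(sas_url: str) -> set:
    """Return the set of permission letters from the SAS URL's sp parameter."""
    query = parse_qs(urlparse(sas_url).query)
    return set(query.get("sp", [""])[0])

# Hypothetical SAS URL for illustration only; the signature isn't real
url = ("https://mystorageacct.blob.core.windows.net/files"
       "?sv=2024-11-04&sp=racwl&se=2026-01-01T00%3A00%3A00Z&sig=REDACTED")

required = set("racwl")  # Read, Add, Create, Write, List
missing = required - sas_permissions(url)
print("Missing permissions:", "".join(sorted(missing)) or "none")
# prints: Missing permissions: none
```

If any letter is missing, regenerate the token before you connect the tool; the MCP server can't read and write audio files without those permissions.
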
## Create a Foundry project and agent

To use the Azure Speech MCP server, you need a Microsoft Foundry project with a deployed model.

1. In the [Microsoft Foundry portal](https://ai.azure.com), create a new project (or use an existing one).
1. Deploy a model (such as **gpt-4.1**) that your agent will use for reasoning and generating responses.
1. Create an agent and give it instructions that describe its purpose. For example:

    ```
    You are an AI agent that uses the Azure AI Speech tool to transcribe and generate speech.
    ```

The agent is now ready to receive tool connections.

## Connect the Azure Speech MCP server

You connect the Azure Speech MCP server to your agent through the **Tools** page in the Foundry portal.

1. In the navigation pane, select the **Tools** page.
1. Select **Connect a tool** and choose **Azure Speech in Foundry Tools** from the catalog.
1. Configure the connection with the following settings:
    - **Foundry resource name**: The name of your Foundry resource (for example, `myproject-resource`).
    - **Bearer** (`Ocp-Apim-Subscription-Key`): The key for your Foundry project.
    - **X-Blob-Container-Url**: The SAS URL for your blob container.
1. Wait for the connection to be created, then select **Use in an agent** and choose your agent.

:::image type="content" source="../media/azure-speech-tool-config.png" alt-text="Screenshot of the Tools catalog in the Foundry portal showing the Azure Speech in Foundry Tools connection configuration.":::

The agent now has access to the speech-to-text and text-to-speech tools exposed by the Azure Speech MCP server.

> [!TIP]
> You can find the project key on the project home page in the Foundry portal.

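
To see how these settings fit together, here's the same configuration expressed as plain Python values. The resource name and both header values are placeholders; the endpoint pattern (`https://{resource}.cognitiveservices.azure.com/speech/mcp`) is the Speech MCP server URL that the connection targets:

```python
# Placeholder values for illustration; substitute your own.
foundry_resource_name = "myproject-resource"

# Endpoint of the Azure Speech MCP server for this Foundry resource
server_url = f"https://{foundry_resource_name}.cognitiveservices.azure.com/speech/mcp"

# Values the connection sends as request headers
headers = {
    "Ocp-Apim-Subscription-Key": "<your-foundry-project-key>",  # Bearer key
    "X-Blob-Container-Url": "<your-container-sas-url>",         # container SAS URL
}

print(server_url)
```

These are the same three values the portal prompts you for when you configure the connection.
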
## Test in the agent playground

The agent playground in the Foundry portal provides an interactive environment for testing your agent.

### Test text-to-speech

Enter a prompt that asks the agent to generate speech:

```
Generate "To be or not to be, that is the question." as speech
```

The first time the agent uses the Speech MCP tool, you're prompted to **approve** the tool usage. You can select **Always approve all Azure Speech MCP Server tools** to skip future approval prompts.

The response includes a link to the generated audio file saved in your blob container. Select the link to listen to the synthesized speech.

### Test speech-to-text

Enter a prompt that asks the agent to transcribe an audio file. You can use a publicly accessible URL or a SAS URL pointing to a file in your blob container:

```
Transcribe the file at https://example.com/audio/meeting-recording.wav
```

The agent calls the speech-to-text tool and returns the transcribed text.

### Customize speech output

The Speech MCP tools support several options you can specify in your prompts:

- **Voice selection**: Specify a neural voice, such as `en-GB-SoniaNeural` or `en-US-JennyNeural`.
- **Language**: Specify the language for recognition or synthesis (for example, `es-ES` for Spanish).
- **Phrase hints**: Provide domain-specific terms to improve transcription accuracy (for example, "Azure, OpenAI, Cognitive Services").
- **Profanity filtering**: Request `masked`, `removed`, or `raw` profanity handling during transcription.

For example:

```
Synthesize "Better a witty fool, than a foolish wit!" as speech using the voice "en-GB-SoniaNeural".
```

## Build a client application

While the agent playground is useful for testing, you typically want to build a client application that uses the agent programmatically. The Microsoft Foundry SDK supports this through the OpenAI Responses API.

To build a client application, you use the `azure-ai-projects` and `azure-identity` packages. The general pattern is:

1. Create an `AIProjectClient` using your Foundry project endpoint and `DefaultAzureCredential` (which uses your Azure CLI credentials in development).
1. Get an OpenAI client from the project client by calling `get_openai_client()`.
1. Call `responses.create()` to send a user prompt to the agent.

The key part is how you reference the agent. You specify it by name in the `extra_body` parameter:

```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Create the project client with your Foundry project endpoint
# (placeholder shown), then get an OpenAI client from it
project_client = AIProjectClient(
    endpoint="https://<your-resource>.services.ai.azure.com/api/projects/<your-project>",
    credential=DefaultAzureCredential(),
)
openai_client = project_client.get_openai_client()

user_prompt = 'Generate "To be or not to be, that is the question." as speech'

# Reference the agent by name through extra_body
response = openai_client.responses.create(
    input=[{"role": "user", "content": user_prompt}],
    extra_body={
        "agent_reference": {
            "name": "Speech-Agent",
            "type": "agent_reference"
        }
    },
)

print(response.output_text)
```

The agent processes the prompt, calls the appropriate Speech MCP tool, and returns the result in `output_text`. For text-to-speech requests, the output includes a link to the generated audio file in your blob container.

### Connect the MCP server in code

Instead of connecting the Azure Speech MCP server through the Foundry portal, you can define the MCP tool connection directly in code when you create an agent. Use the `MCPTool` class from the `azure-ai-projects` SDK:

```python
from azure.ai.projects.models import MCPTool

mcp_tool = MCPTool(
    server_label="azure-speech",
    server_url="https://{foundry-resource-name}.cognitiveservices.azure.com/speech/mcp",
    require_approval="always",
)
```

You then pass the `mcp_tool` when creating the agent through the SDK. This approach is useful when you want to manage tool connections as part of your application code rather than configuring them manually in the portal.