Multi-container setup for agent -> MCP server networking #11
Conversation
Pull request overview
This PR implements a multi-container architecture for the MCP (Model Context Protocol) demo application by splitting the previous single container into two separate services: an MCP server and an agent. The infrastructure is modified to support private networking with VNet integration while loosening some network restrictions to enable deployment and debugging without a VPN.
Key changes:
- Split the monolithic container app into two services: server (MCP server) and agent (client that consumes the MCP server)
- Added a health check endpoint to the MCP server with corresponding health probes in the infrastructure
- Modified network security settings to allow public access to container registry, Log Analytics, and Azure Monitor for operational convenience
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| servers/deployed_mcp.py | Added /health endpoint for container health probes |
| servers/Dockerfile | Removed BuildKit cache mounts for Azure Container Apps compatibility |
| agents/Dockerfile | New Dockerfile for agent container with same multi-stage Alpine pattern |
| infra/server.bicep | New infrastructure module for MCP server with health probes configuration |
| infra/agent.bicep | Renamed and reconfigured from aca.bicep; now agent-specific with no ingress |
| infra/main.bicep | Split single container into server and agent modules; loosened network access for debugging |
| infra/main.parameters.json | Updated parameter names from acaExists to serverExists and agentExists |
| infra/core/host/container-registry.bicep | Enabled public network access unconditionally for image pushing |
| infra/core/host/container-apps-environment.bicep | Added workload profiles for private ingress scenarios |
| infra/core/host/container-app.bicep | Added probes parameter support; fixed hostName output for non-ingress containers |
| infra/core/host/container-app-upsert.bicep | Added probes parameter pass-through |
| azure.yaml | Split single service into server and agent services with remote build enabled |
| .dockerignore | New file to exclude development artifacts from Docker build context |
Comments suppressed due to low confidence (1)
infra/agent.bicep:55
- The agent's Python code (agents/agentframework_http.py) references AZURE_OPENAI_VERSION via os.environ.get("AZURE_OPENAI_VERSION") when using the Azure API host, but this environment variable is not set in the agent's container configuration. This will result in the Azure OpenAI client using a default or None value for api_version, which may cause API calls to fail. Add an AZURE_OPENAI_VERSION environment variable to the agent's env array, similar to how it's done for other Azure OpenAI settings.
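A minimal sketch of that fix in infra/agent.bicep, assuming the API version is exposed as a new parameter (the parameter name, default value, and helper variable below are illustrative, not from the PR):

```bicep
// Sketch only: expose the API version as a parameter and include it in the agent's env array,
// alongside the other AZURE_OPENAI_* settings. Parameter name and default are assumptions.
param openAiApiVersion string = '2024-06-01'

var openAiVersionEnv = [
  {
    name: 'AZURE_OPENAI_VERSION'
    value: openAiApiVersion
  }
]
// Then pass e.g. env: concat(existingEnv, openAiVersionEnv) to the container-app-upsert module.
```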
servers/deployed_mcp.py
Outdated
| @mcp.custom_route("/health", methods=["GET"]) | ||
| async def health_check(request): |
Copilot AI · Dec 4, 2025
The health_check function parameter request is defined but never used in the function body. Consider using _request or _ to indicate this is intentionally unused, following Python naming conventions for unused parameters.
Suggested change:
| async def health_check(request): |
| async def health_check(_request): |
Addressed
agents/Dockerfile
Outdated
| WORKDIR /code | ||
| # Copy dependency files and install dependencies (for layer caching) | ||
| # Note: We avoid --mount=type=cache since Azure Container Apps remote build doesn't support BuildKit |
Copilot AI · Dec 4, 2025
[nitpick] The comment states "Azure Container Apps remote build doesn't support BuildKit" but this may be outdated. As of 2024, Azure Container Apps supports BuildKit features in remote builds. Consider verifying this limitation with current Azure documentation, as using --mount=type=cache can significantly improve build performance by caching package downloads between builds.
Suggested change:
| # Note: We avoid --mount=type=cache since Azure Container Apps remote build doesn't support BuildKit |
| # Note: As of 2024, Azure Container Apps remote build supports BuildKit. Consider using --mount=type=cache to cache package downloads and improve build performance. |
Added a link to the open issue about it.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
infra/server.bicep:90
- [nitpick] The startup probe has a very aggressive timeout configuration: initialDelaySeconds: 10, periodSeconds: 3, and failureThreshold: 60, allowing just over 3 minutes (10s + 60 * 3s = 190s) for the service to start. However, the readiness probe with initialDelaySeconds: 5 may start checking before the startup probe completes its first successful check, potentially causing confusion in probe state transitions.
Consider setting the readiness probe's initialDelaySeconds to at least match or exceed the startup probe's initial delay (e.g., 15-20 seconds) to ensure cleaner state transitions.
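A rough sketch of the adjusted timings, using the probes parameter this PR adds (the port and exact values are illustrative assumptions, not the PR's code):

```bicep
// Illustrative probe settings only; the port and delay values are assumptions.
var serverProbes = [
  {
    type: 'Startup'
    httpGet: {
      path: '/health'
      port: 8000 // assumed container port
    }
    initialDelaySeconds: 10
    periodSeconds: 3
    failureThreshold: 60
  }
  {
    type: 'Readiness'
    httpGet: {
      path: '/health'
      port: 8000
    }
    initialDelaySeconds: 15 // at or above the startup probe's initial delay, per the suggestion
    periodSeconds: 10
  }
]
```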
| module app 'core/host/container-app-upsert.bicep' = { | ||
| name: '${serviceName}-container-app-module' | ||
| params: { | ||
| name: name | ||
| location: location | ||
| tags: union(tags, { 'azd-service-name': serviceName }) | ||
| identityName: agentIdentity.name | ||
| exists: exists | ||
| containerAppsEnvironmentName: containerAppsEnvironmentName | ||
| containerRegistryName: containerRegistryName | ||
| ingressEnabled: false | ||
| env: [ | ||
| { | ||
| name: 'AZURE_OPENAI_CHAT_DEPLOYMENT' | ||
| value: openAiDeploymentName | ||
| } | ||
| { | ||
| name: 'AZURE_OPENAI_ENDPOINT' | ||
| value: openAiEndpoint | ||
| } | ||
| { | ||
| name: 'API_HOST' | ||
| value: 'azure' | ||
| } | ||
| { | ||
| name: 'AZURE_CLIENT_ID' | ||
| value: agentIdentity.properties.clientId | ||
| } | ||
| { | ||
| name: 'MCP_SERVER_URL' | ||
| value: mcpServerUrl | ||
| } | ||
| { | ||
| name: 'RUNNING_IN_PRODUCTION' | ||
| value: 'true' | ||
| } | ||
| ] | ||
| } |
Copilot AI · Dec 4, 2025
The agent container runs agentframework_http.py, which executes a single query and then exits. With the default containerMinReplicas: 1, this will cause the container to continuously restart after completing its task, wasting resources and potentially causing cost issues.
Consider either:
- Setting containerMinReplicas: 0 and containerMaxReplicas: 0 for the agent if it's meant to be triggered manually (see the sketch after this list)
- Modifying agentframework_http.py to run as a long-lived service with a loop or server endpoint
- Using Azure Container Apps Jobs instead of a Container App for one-off task execution
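For the first option, a minimal sketch of the scale settings, assuming the container-app-upsert module exposes the containerMinReplicas/containerMaxReplicas parameters mentioned above (note that Azure Container Apps requires maxReplicas to be at least 1, so 0/1 is shown here rather than 0/0):

```bicep
// Sketch of option 1: scale the agent to zero so it doesn't restart-loop after its one-off run.
// Parameter names are assumed from the default mentioned in the comment; verify them against
// core/host/container-app-upsert.bicep before relying on this.
module app 'core/host/container-app-upsert.bicep' = {
  name: '${serviceName}-container-app-module'
  params: {
    // ...same params as in the hunk above...
    ingressEnabled: false
    containerMinReplicas: 0 // no replica kept alive between runs
    containerMaxReplicas: 1 // at most one agent run at a time
  }
}
```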
README.md
Outdated
| ### Deploy to Azure with private networking | ||
| To demonstrate enhanced security for production deployments, this project supports deploying with a virtual network (VNet) configuration that restricts public access to Azure resources. | ||
| 1. Set these azd environment variables to set up a virtual network and private endpoints for the Container App, Cosmos DB, and OpenAI resources: | ||
| ```bash | ||
| azd env set USE_VNET true | ||
| azd env set USE_PRIVATE_INGRESS true | ||
| ``` | ||
| The Log Analytics and ACR resources will still have public access enabled, so that you can deploy and monitor the app without needing a VPN. In production, you would typically restrict these as well. | ||
| 2. Provision and deploy: | ||
| ```bash | ||
| azd up | ||
| ``` | ||
| ### Additional costs for private networking |
Copilot AI · Dec 4, 2025
The heading level is inconsistent. This section "Deploy to Azure with private networking" is using h3 (###) but should be h2 (##) to match the same level as "Deploy to Azure" (line 191). Similarly, "Additional costs for private networking" should also be h3 (###) to match "Costs" (line 240).
| env: [ | ||
| { | ||
| name: 'AZURE_OPENAI_CHAT_DEPLOYMENT' | ||
| value: openAiDeploymentName | ||
| } | ||
| { | ||
| name: 'AZURE_OPENAI_ENDPOINT' | ||
| value: openAiEndpoint | ||
| } | ||
| { | ||
| name: 'API_HOST' | ||
| value: 'azure' | ||
| } | ||
| { | ||
| name: 'AZURE_CLIENT_ID' | ||
| value: agentIdentity.properties.clientId | ||
| } | ||
| { | ||
| name: 'MCP_SERVER_URL' | ||
| value: mcpServerUrl | ||
| } | ||
| { | ||
| name: 'RUNNING_IN_PRODUCTION' | ||
| value: 'true' | ||
| } | ||
| ] | ||
| } |
Copilot AI · Dec 4, 2025
The agent container is missing the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable that the server has (line 695 in main.bicep). This inconsistency means the agent won't have access to Application Insights for monitoring and telemetry, making it harder to debug issues when deployed.
Consider adding an applicationInsightsConnectionString parameter to agent.bicep and passing it as an environment variable, similar to how it's done in server.bicep.
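A minimal sketch of that change in agent.bicep, mirroring how server.bicep handles it (the helper variable name is illustrative; main.bicep would pass useMonitoring ? applicationInsights!.outputs.connectionString : '' as it already does for the server module):

```bicep
// Sketch: accept the connection string in agent.bicep and expose it to the container,
// mirroring server.bicep. The variable name is an assumption, not the PR's code.
param applicationInsightsConnectionString string = ''

var monitoringEnv = [
  {
    name: 'APPLICATIONINSIGHTS_CONNECTION_STRING'
    value: applicationInsightsConnectionString
  }
]
// Then include monitoringEnv in the env array passed to the container-app-upsert module.
```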
| name: replace('${take(prefix,15)}-server', '--', '-') | ||
| location: location | ||
| tags: tags | ||
| identityName: '${prefix}-id-aca' | ||
| identityName: '${prefix}-id-server' | ||
| containerAppsEnvironmentName: containerApps.outputs.environmentName | ||
| containerRegistryName: containerApps.outputs.registryName | ||
| openAiDeploymentName: openAiDeploymentName | ||
| openAiEndpoint: openAi.outputs.endpoint | ||
| cosmosDbAccount: cosmosDb.outputs.name | ||
| cosmosDbDatabase: cosmosDbDatabaseName | ||
| cosmosDbContainer: cosmosDbContainerName | ||
| applicationInsightsConnectionString: useMonitoring ? applicationInsights.outputs.connectionString : '' | ||
| exists: acaExists | ||
| applicationInsightsConnectionString: useMonitoring ? applicationInsights!.outputs.connectionString : '' | ||
| exists: serverExists | ||
| } | ||
| } | ||
|
|
||
| // Container app for agent | ||
| module agent 'agent.bicep' = { | ||
| name: 'agent' | ||
| scope: resourceGroup | ||
| params: { | ||
| name: replace('${take(prefix,15)}-agent', '--', '-') |
Copilot AI · Dec 4, 2025
[nitpick] The container app names use take(prefix,15) which could lead to naming collisions if the first 15 characters of the prefix are identical for both "server" and "agent". For example, if prefix is "myapp-abc123456", both would start with "myapp-abc123456" and only differ by the suffix. While the suffixes "-server" and "-agent" make them unique in this case, the truncation to 15 characters is very conservative and could cause issues with very short environment names.
Consider using take(prefix,19) for both (to match the original pattern) or at least verify that 15 characters + suffix won't exceed Azure Container App name limits (which allow up to 32 characters).
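A small sketch of the suggested naming, which keeps more of the prefix while staying under the 32-character Container App name limit (19 + length('-server') = 26; 19 + length('-agent') = 25):

```bicep
// Sketch of the reviewer's suggestion: truncate the prefix to 19 characters instead of 15.
var serverAppName = replace('${take(prefix, 19)}-server', '--', '-')
var agentAppName = replace('${take(prefix, 19)}-agent', '--', '-')
```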
This PR adds the second container for the agent. I also modified the private networking rules, loosening them slightly so that we can run "azd deploy" without a VPN and also see the Log stream from the Portal.
Now these azd env vars can be set, and the containers will communicate within the subnet:
USE_PRIVATE_INGRESS="true"
USE_VNET="true"
After provisioning, the container apps environment network is all closed off, and the environment has two containers. Then we can consult the Log stream to see that the agent was successful.
The only gotcha is that the agent may deploy and start before the server is ready. In that case, restarting the agent from the Portal should work, as it re-runs the ENTRYPOINT. We may address that in the future with an initContainer that waits for the health endpoint to return a 200.
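A rough sketch of that initContainer idea, assuming the container-app module were extended to accept init containers and that mcpServerUrl is the server's base URL (the image choice and health path are assumptions, not part of this PR):

```bicep
// Hypothetical initContainers entry for the agent's Container App template; not part of this PR.
initContainers: [
  {
    name: 'wait-for-mcp-server'
    image: 'curlimages/curl:8.8.0' // any small image with curl and a shell would work
    command: [ '/bin/sh', '-c' ]
    args: [
      // Loop until the server's health endpoint returns a 2xx response.
      'until curl -fsS "$MCP_SERVER_HEALTH_URL"; do echo "waiting for MCP server"; sleep 5; done'
    ]
    env: [
      {
        name: 'MCP_SERVER_HEALTH_URL'
        value: '${mcpServerUrl}/health' // assumes mcpServerUrl is the server's base URL
      }
    ]
    resources: {
      cpu: json('0.25')
      memory: '0.5Gi'
    }
  }
]
```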