This guide walks through deploying On-Call Copilot as a Hosted Agent on Microsoft Foundry Agent Service — from prerequisites through local testing to production verification.
Official quickstart: Deploy your first hosted agent using Azure Developer CLI
How-to guide: Deploy a hosted agent
Note: Hosted agents are currently in preview.
- What Is a Hosted Agent?
- Prerequisites
- Key Configuration Files
- Step 1: Authenticate and Prepare
- Step 2: Provision Azure Resources
- Step 3: Test the Agent Locally
- Step 4: Deploy to Foundry Agent Service
- Step 5: Verify and Test the Deployed Agent
- Option B: Deploy with Python SDK (CI/CD)
- Environment Variables
- Container Configuration
- Authentication
- Scaling and Resources
- Updating the Agent
- Using the Foundry Portal Playground
- Deploy via VS Code Extension
- Troubleshooting
- Cleanup
A Hosted Agent is a containerised application deployed to Microsoft Foundry Agent Service. Foundry manages the container lifecycle (scaling, health checks, networking) while your code handles the agent logic. The agent is exposed via the Responses API protocol at port 8088.
On-Call Copilot runs as a single container that hosts four specialist agents (Triage, Summary, Comms, PIR) concurrently using the Microsoft Agent Framework ConcurrentBuilder. All four agents share a single Model Router deployment — Foundry routes each request to the best model automatically.
```
┌─────────────────────────────────────────────────────────┐
│                 Foundry Agent Service                   │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │       Hosted Agent Container (port 8088)          │  │
│  │                                                   │  │
│  │  main.py → ConcurrentBuilder                      │  │
│  │    ├── triage-agent                               │  │
│  │    ├── summary-agent                              │  │
│  │    ├── comms-agent                                │  │
│  │    └── pir-agent                                  │  │
│  │                                                   │  │
│  │  Protocol: Responses API                          │  │
│  └───────────────────────────────────────────────────┘  │
│                        │                                │
│                        ▼                                │
│          Microsoft Foundry Model Router                 │
│                (single deployment)                      │
└─────────────────────────────────────────────────────────┘
```
| Requirement | Details |
|---|---|
| Azure subscription | With Contributor access for resource provisioning (create a free account) |
| Microsoft Foundry project | With a capability host that has enablePublicHostingEnvironment=true (quickstart) |
| Model Router deployment | Deployed in your Azure OpenAI resource — or use gpt-4.1/gpt-5 if Model Router is unavailable in your region (model catalog) |
| Azure Developer CLI | v1.23.0+ — azd version (install) |
| Azure CLI | v2.80+ (optional, for verification and SDK deploy) — az --version (install) |
| Docker Desktop | Running — verify with docker info (install) |
| Python 3.10+ | python --version (download) |
| Authenticated sessions | az login and azd auth login |
Register the Cognitive Services provider if you hit `SubscriptionNotRegistered`:

```bash
az provider register --namespace Microsoft.CognitiveServices
```

The deployment uses three files in the repo root:
`agent.yaml` declares the agent name, protocols, and environment variables for Foundry:
```yaml
kind: hosted
name: oncall-copilot
protocols:
  - protocol: responses
environment_variables:
  - name: AZURE_OPENAI_ENDPOINT
    value: ${AZURE_OPENAI_ENDPOINT}
  - name: AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
    value: model-router
  - name: AZURE_AI_PROJECT_ENDPOINT
    value: ${AZURE_AI_PROJECT_ENDPOINT}
  - name: MODEL_ROUTER_DEPLOYMENT
    value: ${MODEL_ROUTER_DEPLOYMENT}
  - name: LOG_LEVEL
    value: INFO
```

Key fields:

- `kind: hosted` — tells Foundry this is a containerised agent
- `protocol: responses` — exposes the Responses API on port 8088
- `environment_variables` — injected into the container at runtime
Note: If you are not using an MCP server, remove or comment out any `AZURE_AI_PROJECT_TOOL_CONNECTION_ID` lines from `agent.yaml` before deploying.
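If you script this step in CI, a small helper can comment those lines out rather than editing by hand. The sketch below is an illustration, not part of the repo, and it assumes each `AZURE_AI_PROJECT_TOOL_CONNECTION_ID` entry is a `name:`/`value:` pair as in the YAML above:

```python
def comment_out_mcp_lines(yaml_text: str) -> str:
    """Comment out AZURE_AI_PROJECT_TOOL_CONNECTION_ID entries (and their
    paired `value:` lines) so the YAML stays well-formed."""
    out = []
    skip_value = False
    for line in yaml_text.splitlines():
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        if "AZURE_AI_PROJECT_TOOL_CONNECTION_ID" in line and not stripped.startswith("#"):
            out.append(f"{indent}# {stripped}")
            skip_value = True  # also comment the value line that follows
        elif skip_value and stripped.startswith("value:"):
            out.append(f"{indent}# {stripped}")
            skip_value = False
        else:
            out.append(line)
            skip_value = False
    return "\n".join(out) + "\n"

# Hypothetical usage, assuming agent.yaml sits in the repo root:
# from pathlib import Path
# p = Path("agent.yaml"); p.write_text(comment_out_mcp_lines(p.read_text()))
```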
`azure.yaml` controls infrastructure provisioning via azd:
```yaml
name: oncall-copilot
services:
  oncall-copilot:
    project: .
    host: azure.ai.agent
    language: docker
    docker:
      remoteBuild: true
    config:
      container:
        resources:
          cpu: "1"
          memory: 2Gi
        scale:
          maxReplicas: 3
          minReplicas: 1
      deployments:
        - model:
            format: OpenAI
            name: model-router
            version: "2025-11-18"
          name: model-router
          sku:
            capacity: 10
            name: GlobalStandard
```

azd will prompt you for the following during `azd provision` if not already set:
- Azure subscription — select the subscription for Foundry resources
- Location — choose a region that supports Model Router (e.g. `eastus2`, `swedencentral`)
- Model SKU — the SKU available for your region and subscription
- Deployment name — name for the model deployment
- Container memory / CPU — resource allocation (or accept defaults)
- Minimum / Maximum replicas — scaling configuration
The `Dockerfile` is a standard Python 3.12 slim image exposing port 8088:
```dockerfile
FROM python:3.12-slim
ENV PYTHONUNBUFFERED=1
WORKDIR /app
COPY . user_agent/
WORKDIR /app/user_agent
RUN if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
EXPOSE 8088
CMD ["python", "main.py"]
```

Important: The Foundry runtime expects the agent to listen on port 8088. Do not change this.
Sign in to both CLIs before any provisioning or deployment:
```bash
az login
azd auth login
```

Verify Docker Desktop is running — this is required for the build step:

```bash
docker info
```

If this command fails, start Docker Desktop and wait for it to initialise before continuing.
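If you want a single preflight check before provisioning, a short script can confirm every required CLI is on `PATH`. This is an optional convenience sketch, not part of the repo:

```python
import shutil

# CLIs the guide's prerequisites table calls for
REQUIRED_TOOLS = ["az", "azd", "docker", "python"]

def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the subset of required CLI tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    gone = missing_tools()
    if gone:
        print(f"Missing prerequisites: {', '.join(gone)}; install them before continuing.")
    else:
        print("All required CLIs found on PATH.")
```

Note that this only checks the binaries exist; `docker info` is still the authoritative test that the Docker daemon is actually running.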
Requires: Contributor access on your Azure subscription.
Run `azd provision` to create all required Azure resources (~5 minutes):

```bash
azd provision
```

This creates the following resources:
| Resource | Purpose | Cost |
|---|---|---|
| Resource group | Organises all related resources | No cost |
| Model deployment | Model used by the agent | See Foundry pricing |
| Foundry project | Hosts your agent and provides AI capabilities | Consumption-based |
| Azure Container Registry | Stores agent container images | Basic tier |
| Log Analytics Workspace | Centralises log data | No direct cost |
| Application Insights | Monitors agent performance and logs | Pay-as-you-go |
| Managed identity | Authenticates the agent to Azure services | No cost |
Tip: Run `azd down` when you finish to delete resources and stop charges.
If the resource group name already exists, azd provision reuses the existing group. To avoid conflicts, choose a unique environment name or delete the existing resource group first.
Before investing in a full cloud deployment, verify the agent works on your machine.
Bash:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

PowerShell:

```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

Export the provisioned environment values. Bash:

```bash
azd env get-values > .env
```

PowerShell:

```powershell
azd env get-values > .env
```

Then add the model deployment name to `.env`:

```
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="model-router"
```

Start the agent:

```bash
python main.py
```

The agent binds to port 8088. If it fails to start, check:
| Error | Cause | Fix |
|---|---|---|
| `AuthenticationError` or `DefaultAzureCredential` failure | Stale login session | Run `azd auth login` again |
| `ResourceNotFound` | Wrong endpoint URL | Verify endpoint URLs in the Foundry portal |
| `DeploymentNotFound` | Wrong deployment name | Check Build → Deployments in the portal |
| `Connection refused` | Port 8088 in use | Stop any other process using that port |
The local server accepts the raw incident JSON body directly at /responses — no wrapping required.
Bash:

```bash
curl -X POST http://localhost:8088/responses \
  -H "Content-Type: application/json" \
  -d '{
    "incident_id": "INC-TEST-001",
    "title": "SEV2: Redis master down",
    "severity": "SEV2",
    "timeframe": {"start": "2026-01-01T10:00:00Z", "end": null},
    "alerts": [{"name": "RedisDown", "description": "Redis master unreachable", "timestamp": "2026-01-01T10:00:00Z"}],
    "logs": [],
    "metrics": []
  }'
```

PowerShell:

```powershell
$body = Get-Content -Raw scripts/demos/demo_1_simple_alert.json
Invoke-RestMethod -Method Post `
  -Uri "http://localhost:8088/responses" `
  -ContentType "application/json" `
  -Body $body
```

Or use the existing test script directly:

```powershell
.\scripts\test_local.ps1 -Demo 1
```

You should see a structured JSON response with `suspected_root_causes`, `summary`, `comms`, and `post_incident_report` keys.
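If you are scripting a smoke test, the expected-keys check above is easy to automate. A minimal sketch (the helper name is mine, not part of the repo):

```python
# Top-level keys the guide says a successful response contains
EXPECTED_KEYS = {"suspected_root_causes", "summary", "comms", "post_incident_report"}

def missing_response_keys(response: dict) -> set[str]:
    """Return any expected top-level keys absent from the agent's JSON response."""
    return EXPECTED_KEYS - response.keys()

# Example with a minimal stand-in response:
sample = {
    "suspected_root_causes": [],
    "summary": "...",
    "comms": {},
    "post_incident_report": {},
}
assert missing_response_keys(sample) == set()
```

Feed it the parsed JSON body from the `curl` or `Invoke-RestMethod` call above; an empty set means the response has the expected shape.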
Stop the local server with Ctrl+C.
azd up combines provisioning, packaging, and deployment into one command — equivalent to running azd provision, azd package, and azd deploy separately. If you already ran azd provision in Step 2, azd up will skip re-provisioning unchanged infrastructure.
Verify Docker is running before this step:

```bash
docker info
```

Then run:

```bash
azd up
```

`azd up` will:
- Provision infrastructure (if not already done)
- Build the Docker image (remote build — no local Docker build required by default)
- Push the image to Azure Container Registry
- Register the hosted agent with Foundry Agent Service
- Start the container
The first deployment takes longer because Docker needs to pull base layers. Subsequent deployments are faster.
When finished, azd up outputs:
```
Deploying services (azd deploy)

  (✓) Done: Deploying service oncall-copilot
  - Agent playground (portal): https://ai.azure.com/nextgen/.../build/agents/oncall-copilot/build?version=1
  - Agent endpoint: https://ai-account-<name>.services.ai.azure.com/api/projects/<project>/agents/oncall-copilot/versions/1
```
Save the Agent endpoint URL — you need it to call the agent programmatically.
Warning: Your hosted agent incurs charges while deployed. Run `azd down` when finished testing to stop charges.
Find your resource names first:
| Value | Where to find it |
|---|---|
| Account name | Foundry portal → your project → Overview → first part of the project endpoint URL (before .services.ai.azure.com) |
| Project name | Foundry portal → your project → Overview → project name |
| Agent name | Foundry portal → Build → Agents → agent name in the list |
Then run:
```bash
az cognitiveservices agent show \
  --account-name <your-account-name> \
  --project-name <your-project-name> \
  --name oncall-copilot
```

Look for `status: Started` in the output.
| Status | Meaning | Action |
|---|---|---|
| `Provisioning` | Agent is still starting | Wait 2–3 minutes and check again |
| `Started` | Agent is running | Ready to use |
| `Failed` | Deployment error | Run `azd deploy` to retry; check portal logs |
| `Stopped` | Manually stopped | Run `az cognitiveservices agent start` |
| `Unhealthy` | Container crashing | Check deployment logs in the Foundry portal |
The deployed agent uses the Responses API. The body must include an agent reference and the incident JSON as the user message content.
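Before looking at the raw `curl` and PowerShell examples below, here is a hedged Python sketch of the same body shape (the helper name is mine; the `agent_reference` structure and prompt text match the examples in this guide):

```python
import json

def build_responses_body(agent_name: str, incident: dict) -> str:
    """Serialise an incident dict into the Responses API request body
    used in the examples below."""
    prompt = (
        "Analyze the following incident data and provide triage, summary, "
        "communications, and a post-incident report:\n\n"
        + json.dumps(incident, separators=(",", ":"))
    )
    body = {
        "agent": {"type": "agent_reference", "name": agent_name},
        "input": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

# Example: body = build_responses_body("oncall-copilot", {"incident_id": "INC-TEST-001"})
```

POST the result to `<project-endpoint>/openai/responses?api-version=2025-05-15-preview` with a bearer token, as shown next.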
Bash:

```bash
curl -X POST "<project-endpoint>/openai/responses?api-version=2025-05-15-preview" \
  -H "Authorization: Bearer $(az account get-access-token --resource https://ai.azure.com --query accessToken -o tsv)" \
  -H "Content-Type: application/json" \
  -d '{
    "agent": {"type": "agent_reference", "name": "oncall-copilot"},
    "input": [
      {
        "role": "user",
        "content": "Analyze the following incident data and provide triage, summary, communications, and a post-incident report:\n\n{\"incident_id\":\"INC-TEST-001\",\"title\":\"Test incident\",\"severity\":\"SEV2\",\"alerts\":[{\"name\":\"HighCPU\",\"description\":\"CPU at 95%\",\"timestamp\":\"2026-01-01T10:00:00Z\"}],\"logs\":[{\"source\":\"app\",\"lines\":[\"ERROR: timeout\"]}],\"metrics\":[{\"name\":\"cpu_percent\",\"window\":\"5m\",\"values_summary\":\"95%\"}]}"
      }
    ]
  }'
```
PowerShell:

```powershell
$token = (az account get-access-token --resource https://ai.azure.com --query accessToken -o tsv)
$body = @{
    agent = @{ type = "agent_reference"; name = "oncall-copilot" }
    input = @(@{
        role    = "user"
        content = 'Analyze the following incident data and provide triage, summary, communications, and a post-incident report:\n\n{"incident_id":"INC-TEST-001","title":"Test incident","severity":"SEV2"}'
    })
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Method Post `
  -Uri "<project-endpoint>/openai/responses?api-version=2025-05-15-preview" `
  -Headers @{ Authorization = "Bearer $token" } `
  -ContentType "application/json" `
  -Body $body
```

Or use the provided script (handles authentication and body format automatically):
```bash
python scripts/invoke.py --demo 1
```

Run the full scenario suite, or the verification script:

```bash
python scripts/run_scenarios.py
python scripts/verify_agent.py
```

`verify_agent.py` queries agent info (name, version, image, kind), runs a smoke test, and prints the first 500 chars of the response.

You can also run the validation script in mock mode:

```bash
MOCK_MODE=true python scripts/validate.py
```

Use this for automated pipelines or when you need fine-grained control over the image tag and deployment.
```bash
# Build for linux/amd64 (required by Foundry)
docker build --platform linux/amd64 -t oncall-copilot:v1 .

# Push to your Azure Container Registry
az acr login --name <your-registry>
docker tag oncall-copilot:v1 <your-registry>.azurecr.io/oncall-copilot:v1
docker push <your-registry>.azurecr.io/oncall-copilot:v1
```

Note (Apple Silicon / ARM): Foundry requires `linux/amd64` images. Always pass `--platform linux/amd64`. Windows x64 users do not need this flag.
The Foundry project's managed identity needs Container Registry Repository Reader on your ACR:
```bash
# Get the project managed identity principal ID
PRINCIPAL_ID=$(az cognitiveservices account show \
  --name <your-account-name> \
  --resource-group <your-rg> \
  --query identity.principalId -o tsv)

# Get the ACR resource ID
ACR_ID=$(az acr show --name <your-registry> --query id -o tsv)

# Assign Container Registry Repository Reader role
az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role "Container Registry Repository Reader" \
  --scope "$ACR_ID"
```

Install the SDK:

```bash
pip install --pre "azure-ai-projects>=2.0.0b3" azure-identity
```

Bash:
```bash
export AZURE_AI_PROJECT_ENDPOINT="https://<account>.services.ai.azure.com/api/projects/<project>"
export AZURE_OPENAI_ENDPOINT="https://<account>.openai.azure.com/"
export ACR_IMAGE="<your-registry>.azurecr.io/oncall-copilot:v1"
export MODEL_ROUTER_DEPLOYMENT="model-router"
```

PowerShell:

```powershell
$env:AZURE_AI_PROJECT_ENDPOINT = "https://<account>.services.ai.azure.com/api/projects/<project>"
$env:AZURE_OPENAI_ENDPOINT = "https://<account>.openai.azure.com/"
$env:ACR_IMAGE = "<your-registry>.azurecr.io/oncall-copilot:v1"
$env:MODEL_ROUTER_DEPLOYMENT = "model-router"
```

Then run the deploy script:

```bash
python scripts/deploy_sdk.py
```

The script uses the azure-ai-projects SDK (`ImageBasedHostedAgentDefinition` with `ProtocolVersionRecord`) to create a hosted agent version, injects the environment variables, and registers the agent.
Verify the deployment:

```bash
python scripts/verify_agent.py
```

| Variable | Required | Description |
|---|---|---|
| `AZURE_OPENAI_ENDPOINT` | Yes | Your Azure OpenAI resource endpoint |
| `AZURE_OPENAI_CHAT_DEPLOYMENT_NAME` | Yes | Model deployment name (typically `model-router`) |
| `AZURE_AI_PROJECT_ENDPOINT` | Yes | Full project endpoint URL |
| `MODEL_ROUTER_DEPLOYMENT` | Yes | Model Router deployment name |
| `LOG_LEVEL` | No | Logging level (default: `INFO`) |
These are set in agent.yaml and injected automatically by Foundry at container startup. For local development, export them from your provisioned environment using azd env get-values > .env.
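For local runs it can be useful to verify the `.env` file exported by `azd env get-values` actually contains the required variables. A hedged sketch, assuming `KEY="value"` lines as azd emits them (the helper names are mine):

```python
# Required variables per the table above
REQUIRED = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
    "AZURE_AI_PROJECT_ENDPOINT",
    "MODEL_ROUTER_DEPLOYMENT",
]

def parse_env(text: str) -> dict[str, str]:
    """Parse KEY="value" lines as emitted by `azd env get-values`."""
    env = {}
    for line in text.splitlines():
        if "=" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip().strip('"')
    return env

def missing_required(env: dict[str, str]) -> list[str]:
    """Return required variables that are absent or empty."""
    return [k for k in REQUIRED if not env.get(k)]
```

Run it against the contents of `.env` before starting `python main.py`; an empty list means all required variables are present.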
Finding your values:
| Variable | Where to find it |
|---|---|
| `AZURE_OPENAI_ENDPOINT` | Foundry portal → your project → Overview → Endpoint |
| `AZURE_AI_PROJECT_ENDPOINT` | Foundry portal → your project → Overview → Project endpoint |
| `AZURE_OPENAI_CHAT_DEPLOYMENT_NAME` | Foundry portal → Build → Deployments → your Model Router deployment name |
Defined in azure.yaml:
```yaml
container:
  resources:
    cpu: "1"
    memory: 2Gi
```

These are minimum recommended values. The agent runs four concurrent LLM calls per request, so memory usage scales with concurrent requests.
The Foundry runtime expects the agent to listen on port 8088. This is set in the Dockerfile (EXPOSE 8088) and handled by from_agent_framework() in main.py.
Foundry automatically monitors container health. If the container fails to start or crashes, Foundry will restart it based on the scaling configuration.
In production, the container uses DefaultAzureCredential with the project's managed identity — no API keys or secrets are needed:
```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

_credential = DefaultAzureCredential()
_token_provider = get_bearer_token_provider(
    _credential, "https://cognitiveservices.azure.com/.default"
)
```

Foundry automatically provides the managed identity to the container.
| Role | Scope | Purpose |
|---|---|---|
| Cognitive Services OpenAI User | Azure OpenAI resource | Call Model Router completions |
| Container Registry Repository Reader | ACR (SDK deploy only) | Pull container image |
For local development, DefaultAzureCredential picks up your az login session:
```bash
az login
azd auth login
python main.py
```

Scaling is configured in `azure.yaml`:
```yaml
scale:
  maxReplicas: 3
  minReplicas: 1
```

- `minReplicas: 1` — always keep one instance warm (avoids cold starts)
- `maxReplicas: 3` — scale out under load
Adjust these based on expected traffic:
| Scenario | minReplicas | maxReplicas |
|---|---|---|
| Dev/test | 0 | 1 |
| Production (low traffic) | 1 | 3 |
| Production (high traffic) | 2 | 10 |
The Model Router deployment quota is set in azure.yaml:
```yaml
deployments:
  - name: model-router
    sku:
      capacity: 10
      name: GlobalStandard
```

Each On-Call Copilot request triggers 4 concurrent Model Router calls. If you expect N concurrent users, set capacity to at least N × 4.
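The N × 4 sizing rule above can be captured as a one-liner if you want it in a provisioning script (the function is illustrative, not part of the repo):

```python
# Each request fans out to the four specialist agents concurrently
CALLS_PER_REQUEST = 4  # triage, summary, comms, pir

def required_capacity(concurrent_users: int, calls_per_request: int = CALLS_PER_REQUEST) -> int:
    """Minimum Model Router SKU capacity for N concurrent users, per the rule above."""
    return concurrent_users * calls_per_request

# e.g. 5 concurrent users need capacity >= 20
```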
- Make your changes (edit agent instructions, add new agents, update schemas)
- Rebuild and redeploy:
With azd:

```bash
azd up
```

With SDK:

```bash
docker build --platform linux/amd64 -t oncall-copilot:v2 .
docker tag oncall-copilot:v2 <your-registry>.azurecr.io/oncall-copilot:v2
docker push <your-registry>.azurecr.io/oncall-copilot:v2
export ACR_IMAGE="<your-registry>.azurecr.io/oncall-copilot:v2"
python scripts/deploy_sdk.py
```

Agent instructions are plain text strings in `app/agents/*.py`. To change behaviour:
- Edit the `*_INSTRUCTIONS` constant in the relevant file
- Rebuild the container and redeploy (instructions are baked into the image)
- No infrastructure changes needed
See AGENTS.md for details on each agent's instruction format.
- Create `app/agents/<name>.py` with a `*_INSTRUCTIONS` constant
- Add output keys to `app/schemas.py`
- Register in `main.py`:

  ```python
  new_agent = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
      instructions=NEW_INSTRUCTIONS,
      name="new-agent",
  )
  workflow = ConcurrentBuilder().participants([triage, summary, comms, pir, new_agent])
  ```
- Rebuild and redeploy the container
The fastest way to interact with a deployed agent is through the Foundry portal:
- Open the Foundry portal and sign in with your Azure account
- Select your project from the Recent projects list (or All projects)
- In the left navigation, select Build → Agents
- Find `oncall-copilot` in the agents list and select it
- Select Open in playground in the top toolbar
- Paste an incident JSON payload in the chat input and press Enter
You can also use the direct playground link printed by azd up after a successful deployment:
Agent playground (portal): https://ai.azure.com/nextgen/.../build/agents/oncall-copilot/build?version=1
Tip: If the playground doesn't load or the agent doesn't respond, verify the agent status is `Started` using the CLI command in Step 5.
You can deploy directly from the IDE using the Microsoft Foundry for Visual Studio Code extension:
- Install the extension: Extensions (`Ctrl+Shift+X`) → search Microsoft Foundry → Install
- Open Command Palette (`Ctrl+Shift+P`) → Microsoft Foundry: Set Default Project
- Sign in and select your subscription, resource group, and Foundry project
- Right-click the project in the Foundry Explorer and select Deploy Hosted Agent
- Once deployed, select Open in Playground to test
See the VS Code extension documentation for the full workflow.
| Problem | Cause | Fix |
|---|---|---|
| `azd init` fails | Outdated `azd` version | Run `winget upgrade Microsoft.Azd` (Windows) or `brew upgrade azd` (macOS). Verify version 1.23.0+. |
| Docker build errors | Docker Desktop not running | Run `docker info` to verify. Start Docker Desktop if needed. |
| `SubscriptionNotRegistered` | Resource provider not registered | `az provider register --namespace Microsoft.CognitiveServices` |
| `AuthorizationFailed` during `azd provision` | Missing Contributor role | Request Contributor on your subscription or resource group |
| `AuthenticationError` / `DefaultAzureCredential` failure | Stale login | Run `az login` and `azd auth login` again |
| Agent not found after deployment | Propagation delay | Wait 2–3 minutes, then re-run `az cognitiveservices agent show` |
| Container fails to start | Missing env vars or dependency conflict | Run `python scripts/get_logs.py`; check `agent.yaml` env vars; rebuild the image |
| `UnauthorizedAcrPull` (403) / `InvalidAcrPullCredentials` (401) | Managed identity missing registry role | Grant Container Registry Repository Reader to the project's managed identity on the ACR |
| `401 Unauthorized` | Missing RBAC role | Grant Cognitive Services OpenAI User to the managed identity on the Azure OpenAI resource |
| `403 Forbidden` | Hosted agent capability not enabled | Ensure `enablePublicHostingEnvironment=true` on the capability host |
| Timeout / slow first response | Cold start (no warm replicas) | Set `minReplicas: 1` in `azure.yaml`; redeploy |
| Port 8088 already in use (local) | Another process | Stop conflicting process; verify with `netstat -an \| findstr 8088` (Windows) |
| Model not found in catalog | Model unavailable in your region | Edit `agent.yaml` to use `gpt-4.1` or another available model deployment |
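As a cross-platform alternative to `netstat` for the port-8088 conflict, a short Python check works on any OS (illustrative helper, not a repo script):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on the given local port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

# If port_in_use(8088) is True, the local agent cannot bind;
# stop the conflicting process before running `python main.py`.
```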
Warning: The commands below permanently delete all Azure resources created for this deployment, including the Foundry project, Container Registry, Application Insights, and your hosted agent. This action cannot be undone.
Preview what will be deleted before confirming:

```bash
azd down --preview
```

When ready, delete everything:

```bash
azd down
```

The cleanup process takes approximately 2–5 minutes. To verify, open the Azure portal, go to your resource group, and confirm the resources no longer appear.
If you deployed with the SDK:

```bash
python scripts/deploy_sdk.py --delete
```

Or delete the pieces manually:

```bash
# Delete the agent registration
az cognitiveservices agent delete \
  --account-name <account> \
  --project-name <project> \
  --name oncall-copilot

# Delete the ACR image (SDK deploy only)
az acr repository delete \
  --name <your-registry> \
  --image oncall-copilot:v1 \
  --yes
```

- Quickstart: Deploy your first hosted agent (Azure Developer CLI)
- Deploy a hosted agent (how-to guide)
- What are hosted agents?
- Manage hosted agent lifecycle
- Agent development lifecycle
- Python hosted agent samples
- Microsoft Agent Framework documentation
- Model Router overview
- AGENTS.md — Agent architecture and customisation guide
- docs/CONFIGURATION.md — Agent instruction configuration reference