- Kiro IDE installed and configured
- AWS CLI configured with appropriate permissions
- Opik API Key configured
- AgentCore deployment completed
- Copy the MCP configuration to Kiro:
cp kiro-integration/mcp.json ~/.kiro/settings/
Restart Kiro IDE to load the MCP server
Verify MCP server is loaded in Kiro's MCP panel
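The file copied into `~/.kiro/settings/` is what Kiro reads to launch the MCP server. A minimal sketch of what such a configuration might look like (the server name, launcher command, and environment key here are assumptions, not the shipped `mcp.json`):

```python
import json

# Hypothetical MCP server entry -- the actual kiro-integration/mcp.json
# may use different keys, commands, and environment variables.
config = {
    "mcpServers": {
        "opik-mcp-server": {                       # assumed server name
            "command": "uvx",                      # assumed launcher
            "args": ["opik-mcp-server"],
            "env": {"OPIK_API_KEY": "<your-key>"}  # key name is an assumption
        }
    }
}

# Serialize exactly as it would appear in ~/.kiro/settings/mcp.json
print(json.dumps(config, indent=2))
```

If the server does not appear in Kiro's MCP panel after a restart, malformed JSON in this file is a common cause.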
In Kiro chat, run:
Evaluate my customer service agent with accuracy and relevance metrics using these test cases:
Test Case 1:
- Input: "I need help with my order #12345. It hasn't arrived yet."
- Expected: "I understand your concern about order #12345. Let me check the status for you."
- Context: Customer service scenario - order inquiry
Test Case 2:
- Input: "Can you help me return a defective product?"
- Expected: "I'd be happy to help you with the return process."
- Context: Customer service scenario - product return
In Kiro chat, run:
Evaluate this Agent2Agent workflow with conversation quality and coordination metrics:
Agents: customer-intake-agent, technical-support-agent, escalation-manager-agent
Conversation:
1. customer-intake-agent → technical-support-agent: "Customer reports login issues with premium account PREM-12345"
2. technical-support-agent → customer-intake-agent: "Server outage identified, ETA 30 minutes"
3. customer-intake-agent → escalation-manager-agent: "Customer requesting compensation for 2+ hour outage"
# View AgentCore runtime logs
aws logs tail /aws/bedrock-agentcore/runtimes/opik_mcp_server-SKTEQX3Omg-DEFAULT --follow
# View evaluation logs
aws logs tail /aws/bedrock-agentcore/opik-evaluations --follow
- Open AWS Console → CloudWatch → Metrics
- Navigate to "OpikMCP" namespace
- View metrics:
  - OpikMCP/Evaluations/EvaluationScore
  - OpikMCP/Evaluations/EvaluationCount
  - OpikMCP/BatchEvaluations/BatchSuccessRate
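These custom metrics follow the standard CloudWatch `PutMetricData` shape. A sketch of one datum as the server might publish it (the exact split between namespace and metric name, the dimension name, and the unit are assumptions about the server's emitter):

```python
# Illustrative CloudWatch metric datum for the OpikMCP namespace.
# Dimension names and units are assumptions, not the server's actual code.
datum = {
    "Namespace": "OpikMCP",
    "MetricData": [
        {
            "MetricName": "Evaluations/EvaluationScore",
            "Dimensions": [
                {"Name": "Agent", "Value": "customer-service-agent"}
            ],
            "Value": 0.82,
            "Unit": "None",
        }
    ],
}

# With boto3 this would be published as:
#   boto3.client("cloudwatch").put_metric_data(**datum)
print(datum["MetricData"][0]["MetricName"])
```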
- Open AWS Console → X-Ray → Traces
- Filter by service: opik-mcp-server
- View trace details for evaluation requests
Visit: https://console.aws.amazon.com/cloudwatch/home?region=eu-west-1#gen-ai-observability/agent-core
- Login to Opik: https://www.comet.com/opik
- Navigate to your workspace
- Check projects:
  - ai-agents-evaluation
  - customer-service-agent-test
- View trace details, metrics, and evaluation results
- Accuracy: 15-25% (indicates need for improvement)
- Relevance: 90-100% (good contextual understanding)
- Helpfulness: 50-70% (moderate helpfulness)
- Conversation Quality: 90-100% (excellent communication)
- Agent Coordination: 90-100% (proper handoffs)
- Workflow Efficiency: 60-80% (room for optimization)
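The expected ranges above can be encoded as a quick sanity check over returned scores (ranges copied from this guide; the 0-100 percent scale and metric key names are assumptions):

```python
# Expected score ranges (percent) taken from the results discussed above.
EXPECTED_RANGES = {
    "accuracy": (15, 25),
    "relevance": (90, 100),
    "helpfulness": (50, 70),
    "conversation_quality": (90, 100),
    "agent_coordination": (90, 100),
    "workflow_efficiency": (60, 80),
}

def out_of_range(scores: dict) -> dict:
    """Return metrics whose scores fall outside the expected band."""
    return {
        name: score
        for name, score in scores.items()
        if name in EXPECTED_RANGES
        and not (EXPECTED_RANGES[name][0] <= score <= EXPECTED_RANGES[name][1])
    }

print(out_of_range({"accuracy": 22, "relevance": 40}))  # {'relevance': 40}
```

A check like this makes regressions obvious when re-running the same evaluation after an agent change.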
- Check IAM permissions for the execution role
- Verify log group exists: /aws/bedrock-agentcore/opik-evaluations
- Check AgentCore deployment status
- Verify Opik API key is correct
- Check network connectivity to Opik platform
- Review evaluation logs for API errors
- Verify MCP configuration in ~/.kiro/settings/mcp.json
- Restart Kiro IDE
- Check MCP server status in Kiro's MCP panel
Run batch evaluation on multiple agents:
- Agent 1: customer-service-agent
- Agent 2: technical-support-agent
- Agent 3: sales-agent
Use comprehensive metrics: accuracy, relevance, helpfulness, coherence
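A batch run like this is essentially a loop over agents and metrics. A sketch with a stubbed evaluator (the `evaluate` function is a placeholder for the real MCP call, and the random scores are dummies):

```python
import random

AGENTS = ["customer-service-agent", "technical-support-agent", "sales-agent"]
METRICS = ["accuracy", "relevance", "helpfulness", "coherence"]

def evaluate(agent: str, metric: str) -> float:
    """Placeholder for the real MCP evaluation call; returns a dummy score."""
    return round(random.uniform(0.0, 1.0), 2)

def batch_evaluate(agents, metrics):
    """Score every agent on every metric; returns {agent: {metric: score}}."""
    return {a: {m: evaluate(a, m) for m in metrics} for a in agents}

results = batch_evaluate(AGENTS, METRICS)
print(len(results), len(results["sales-agent"]))  # 3 4
```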
Evaluate agent performance under load:
- 50 concurrent evaluations
- Mixed single and multi-agent scenarios
- Monitor CloudWatch metrics for performance
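A load test of this shape can be sketched with a thread pool driving a stubbed evaluation call. The concurrency target of 50 comes from the plan above; `run_evaluation` is a placeholder, and the pool size and sleep are arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_evaluation(i: int) -> float:
    """Placeholder; the real call would go through the MCP server."""
    time.sleep(0.01)   # simulate network latency
    return 0.9         # dummy score

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:
    scores = list(pool.map(run_evaluation, range(50)))  # 50 evaluations
elapsed = time.monotonic() - start

print(f"{len(scores)} evaluations in {elapsed:.2f}s")
```

While this runs, the CloudWatch metrics and X-Ray traces described earlier show how the deployed server behaves under the same concurrency.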
- Evaluation failure rate > 10%
- Average evaluation time > 5 seconds
- Memory usage > 80%
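The alert conditions above translate directly into threshold checks, whether implemented as CloudWatch alarms or a quick script (the metric key names here are assumptions):

```python
# Alert thresholds from the monitoring plan above; key names are assumed.
THRESHOLDS = {
    "failure_rate_pct": 10.0,  # evaluation failure rate > 10%
    "avg_eval_time_s": 5.0,    # average evaluation time > 5 seconds
    "memory_usage_pct": 80.0,  # memory usage > 80%
}

def triggered_alerts(sample: dict) -> list:
    """Return the names of any metrics breaching their threshold."""
    return [k for k, limit in THRESHOLDS.items() if sample.get(k, 0) > limit]

print(triggered_alerts({"failure_rate_pct": 12.5, "avg_eval_time_s": 3.1}))
# ['failure_rate_pct']
```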
- Agent accuracy trends
- Multi-agent coordination scores
- Evaluation throughput
This testing approach validates both the MCP server functionality and the observability integration across the AWS and Opik platforms.