Pet Clinic Agents are recommending invalid nutritional products. #222

@liustve

Hi @awsapm, could you look into this?

CRITICAL FINDINGS

1. Service Integration Failure

Root Cause: The nutrition agent (nutrition_agent.DEFAULT) logged 17,824 errors over 24 hours (roughly 743 per hour) in which the nutrition service could not find information for specific pet types
Error Pattern: "Nutrition service could not find information for pet: [rabbit/guinea pig/etc.]"
Impact: When the nutrition service fails, the AI agent fabricates product recommendations instead of gracefully handling the error

2. Data Inconsistency Issue

What's Happening: The nutrition service returns legitimate products like "PurrfectChoice Premium Feline, WhiskerWell Grain-Free Delight, MeowMaster Senior Formula"
The Problem: The AI agent is inventing additional products like:
• "RabbitRich Premium Alfalfa Hay"
• "FeatherFeast Young Bunny Alfalfa Blend"
• "PurrfectChoice Premium Feline Kitten Formula" (variant not in database)

3. Service Architecture Problems

Nutrition Service: Running on nutrition-service-nodejs in EKS, connected to MongoDB
AI Agent: Running on Amazon Bedrock AgentCore, calling nutrition service tools
Failure Mode: When nutrition service returns errors, the AI agent hallucinates products instead of saying "product not available"

OPERATIONAL RECOMMENDATIONS

Immediate Actions (Priority 1)

  1. Fix Error Handling in AI Agent
    • Update the agent's system prompt to explicitly handle tool failures
    • Configure agent to respond with "Please consult our veterinarian for specific recommendations" when nutrition service fails
    • Restrict the agent to product names returned by the nutrition service tools so it cannot fabricate names
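
One way to make the tool-failure path explicit is to wrap the nutrition tool so the agent always receives either verified products or a fixed fallback message. This is a minimal sketch, not the agent's actual code: `safe_nutrition_tool`, the `fetch_nutrition` callable, and the result fields are hypothetical names chosen for illustration.

```python
from typing import Callable, List

FALLBACK_MESSAGE = "Please consult our veterinarian for specific recommendations"


def safe_nutrition_tool(pet_type: str, fetch_nutrition: Callable[[str], List[str]]) -> dict:
    """Call the nutrition service and return a structured result the agent can relay.

    fetch_nutrition is a hypothetical callable that returns product names
    or raises when the service has no data for the pet type.
    """
    try:
        products = fetch_nutrition(pet_type)
    except Exception:
        # Any failure (unknown pet type, timeout, HTTP error) becomes an explicit fallback.
        return {"status": "unavailable", "products": [], "message": FALLBACK_MESSAGE}
    if not products:
        # An empty result is treated the same as a failure: nothing to recommend.
        return {"status": "unavailable", "products": [], "message": FALLBACK_MESSAGE}
    return {"status": "ok", "products": list(products), "message": ""}
```

With a result shaped like this, the system prompt can instruct the agent to repeat `message` verbatim whenever `status` is not "ok" and to name only entries from `products` otherwise.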

  2. Database Validation
    • Audit the MongoDB database in nutrition-service-nodejs
    • Ensure all pet types (rabbit, guinea pig, etc.) have proper nutrition data
    • Add missing pet nutrition information
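
The audit described above could be sketched as a coverage check over the nutrition documents. The field names `pet_type` and `products` are assumptions about the MongoDB schema, and the documents are passed in as plain dictionaries so the sketch does not depend on any particular database driver.

```python
from typing import Iterable, List, Mapping, Set


def find_missing_pet_types(required: Iterable[str], documents: Iterable[Mapping]) -> List[str]:
    """Return required pet types that have no document with a non-empty product list."""
    covered: Set[str] = set()
    for doc in documents:
        # 'pet_type' and 'products' are hypothetical field names.
        pet_type = str(doc.get("pet_type", "")).strip().lower()
        if pet_type and doc.get("products"):
            covered.add(pet_type)
    wanted = {name.strip().lower() for name in required}
    return sorted(name for name in wanted if name not in covered)
```

For example, given a document for "cat" with products and a document for "rabbit" with an empty product list, checking `["cat", "rabbit", "guinea pig"]` would report "guinea pig" and "rabbit" as missing.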

  3. Service Monitoring Enhancement
    • Set up CloudWatch alarms for nutrition service errors (currently 700+ errors/hour)
    • Monitor the GET /nutrition/:pet_type endpoint specifically
    • Alert when error rate exceeds 5%
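
To relate the reported volume to the 5% threshold, the arithmetic can be written out as below. The error count (17,824 over 24 hours) comes from the findings above; the request counts in the usage note are hypothetical, since the issue does not report total traffic.

```python
def errors_per_hour(total_errors: int, window_hours: float) -> float:
    """Average number of errors per hour over the observation window."""
    return total_errors / window_hours


def error_rate_exceeded(errors: int, requests: int, threshold: float = 0.05) -> bool:
    """True when errors / requests is strictly above the threshold (5% by default)."""
    if requests <= 0:
        # No traffic means no meaningful rate; do not alert.
        return False
    return errors / requests > threshold
```

`errors_per_hour(17824, 24)` is about 742.7, consistent with the "700+ errors/hour" figure. Whether that breaches the 5% threshold depends on request volume: at a hypothetical 10,000 requests per hour it would, while at 20,000 it would not.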

Medium-term Fixes (Priority 2)

  1. Improve Service Reliability
    • The nutrition service shows latency spikes (up to 217ms for database queries)
    • Consider implementing caching for frequently requested pet nutrition data
    • Add retry logic for database connection failures
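
The caching and retry suggestions above could be sketched as follows. The actual service is Node.js, so this Python version only illustrates the logic; `with_retry`, `TTLCache`, and the use of `ConnectionError` to signal a database connection failure are assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple


def with_retry(func: Callable[[], List[str]], attempts: int = 3, base_delay: float = 0.1,
               sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call func, retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


class TTLCache:
    """Cache lookup results per pet type for ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get_or_load(self, pet_type: str, loader: Callable[[str], List[str]]) -> List[str]:
        now = self._clock()
        hit = self._entries.get(pet_type)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the database query
        value = loader(pet_type)
        self._entries[pet_type] = (now, value)
        return value
```

The `sleep` and `clock` parameters exist only so the behavior can be tested without real delays. Note that a cache like this would also cache nothing for failed lookups, so it does not mask the missing-data errors.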

  2. Data Governance
    • Implement product catalog validation
    • Create approved product list that the AI agent can reference
    • Add product availability checks before recommendations
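
A catalog check of the kind listed above might look like the sketch below, which partitions the agent's recommendations into approved and rejected names before anything is shown to the user. The function name and the exact-match rule are assumptions; the product names in the usage note are taken from the findings in this issue.

```python
from typing import Iterable, List, Set, Tuple


def split_by_catalog(recommended: Iterable[str], approved: Set[str]) -> Tuple[List[str], List[str]]:
    """Partition names into (approved, rejected) using exact, case-insensitive matching."""
    approved_keys = {name.strip().casefold() for name in approved}
    valid: List[str] = []
    rejected: List[str] = []
    for name in recommended:
        if name.strip().casefold() in approved_keys:
            valid.append(name)
        else:
            rejected.append(name)
    return valid, rejected
```

With the three legitimate products as the approved set, "RabbitRich Premium Alfalfa Hay" would be rejected, and so would "PurrfectChoice Premium Feline Kitten Formula", because matching is exact rather than by prefix.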

  3. Enhanced Logging
    • Add structured logging to track which products are being recommended
    • Log when the AI agent falls back to generic recommendations
    • Monitor recommendation accuracy
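
Structured logging for recommendations could be as simple as one JSON line per response, as sketched here. The logger name, event name, and field names are hypothetical and would need to match whatever the log pipeline expects.

```python
import json
import logging
from typing import List

# Hypothetical logger name; configure handlers and levels elsewhere.
logger = logging.getLogger("nutrition_agent.recommendations")


def build_recommendation_event(pet_type: str, products: List[str], fallback: bool) -> str:
    """Serialize one recommendation as a single JSON line."""
    return json.dumps(
        {
            "event": "nutrition_recommendation",
            "pet_type": pet_type,
            "products": products,
            "fallback": fallback,  # True when the agent gave the generic veterinarian message
        },
        sort_keys=True,
    )


def log_recommendation(pet_type: str, products: List[str], fallback: bool = False) -> None:
    """Emit the event at INFO level so it can be filtered and counted downstream."""
    logger.info(build_recommendation_event(pet_type, products, fallback))
```

Keeping `fallback` as an explicit boolean makes it straightforward to count how often the agent falls back to the generic message.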

Long-term Improvements (Priority 3)

  1. Service Architecture
    • Consider implementing a product catalog service separate from nutrition service
    • Add circuit breaker pattern for external service calls
    • Implement graceful degradation when services are unavailable
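
The circuit breaker and graceful degradation items above can be illustrated together. This is a minimal sketch of the general pattern, not a description of any existing component; the class name, thresholds, and fallback handling are assumptions.

```python
import time
from typing import Callable, List, Optional


class CircuitBreaker:
    """Open after max_failures consecutive errors; allow a trial call after reset_seconds."""

    def __init__(self, max_failures: int = 3, reset_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._max_failures = max_failures
        self._reset_seconds = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], List[str]], fallback: List[str]) -> List[str]:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_seconds:
                return fallback  # open: skip the call and degrade gracefully
            self._opened_at = None  # half-open: let one trial call through
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            return fallback
        self._failures = 0  # a success closes the circuit again
        return result
```

Returning an empty list as the fallback pairs naturally with an agent rule that an empty product list means "product not available" rather than an invitation to improvise.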

  2. Quality Assurance
    • Implement automated testing for AI agent responses
    • Add validation rules for product recommendations
    • Create approval workflow for new product additions

Monitoring Setup

I recommend setting up these CloudWatch alarms:
• Nutrition service error rate > 5%
• AI agent tool failure rate > 10%
• Database query latency > 500ms
• Invalid product recommendation detection
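
The four alarms above could be expressed as parameter sets for the CloudWatch `PutMetricAlarm` API (for example via boto3's `put_metric_alarm(**params)`). The sketch below only builds the parameters and makes no AWS calls. The namespace, metric names, alarm names, period, and statistics are all hypothetical; the thresholds are the ones listed above, and the rate metrics are assumed to be published as percentages.

```python
from typing import Dict, List

NAMESPACE = "PetClinic/Nutrition"  # hypothetical custom namespace


def build_alarm(name: str, metric: str, threshold: float, statistic: str = "Average") -> Dict:
    """Build keyword arguments for a single-metric CloudWatch alarm."""
    return {
        "AlarmName": name,
        "Namespace": NAMESPACE,
        "MetricName": metric,  # hypothetical metric name
        "Statistic": statistic,
        "Period": 300,  # seconds; assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


ALARMS: List[Dict] = [
    build_alarm("nutrition-service-error-rate", "ErrorRatePercent", 5.0),
    build_alarm("agent-tool-failure-rate", "ToolFailureRatePercent", 10.0),
    build_alarm("db-query-latency", "DbQueryLatencyMs", 500.0, statistic="Maximum"),
    build_alarm("invalid-product-recommendations", "InvalidRecommendationCount", 0.0, statistic="Sum"),
]
```

The last alarm assumes some component (such as a catalog validation step) publishes a count of rejected product names, so any value above zero triggers it.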
