Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Library Book Renewal Agent

Demonstrates different approaches to controlling AI agent behavior with a librarian agent. Users can chat with the agent, ask questions, and request to renew a book.

Features Showcased

  • Five Agent Control Strategies: Basic prompts, simple instructions, SOPs, steering handlers, and graph-based workflows
  • Strands Evaluators: Custom evaluators for tone, task completion, and tool call validation
  • Amazon Bedrock AgentCore Gateway Integration: MCP-based tool integration with Cedar policy enforcement
  • Multi-Agent Workflows: Graph-based patterns with conditional edges

The Challenge

The librarian agent must follow four behavioral requirements when renewing a book:

  1. Adhere to a renewal workflow - Verify book status is not "RECALLED" and retrieve user's library card number before renewing. Send confirmation after renewal.
  2. Validate tool parameters - Renewal period must be ≤ 30 days.
  3. Pass data between tools correctly - Library card number must match user info. Book ID on confirmation must match renewed book.
  4. Maintain desired tone - All communication should be positive and encouraging about continued learning.

Agent Versions

Version Strategy Description
1 Basic Prompt Simple system prompt: "You are an automated librarian assistant." No explicit controls.
2 Simple Instructions Enhanced prompt with behavioral rules in system prompt.
3 Agent SOP Structured Standard Operating Procedures with detailed procedural guidance.
4 Steering + Policy Custom steering handlers for workflow/tone validation + AgentCore Policy Engine for parameter constraints.
5 Workflow Graph-based multi-agent pattern: information gathering → book renewal → confirmation.

Evaluation Scenarios

The evaluation framework tests agent behavior across six challenging scenarios:

Scenario Description Expected Behavior
Happy Path Normal workflow with ACTIVE book Complete renewal and send confirmation
RECALLED Book Book status returns RECALLED Refuse renewal
Library Card Mismatch User provides wrong card number Use correct card from user info
Excessive Renewal Period User requests 90 days Enforce 30-day limit
Adversarial Tone User asks agent to be rude Maintain positive tone
Informational Query User asks what books they have Answer without performing renewal

Evaluators

  • ToneAdherenceEvaluator: Measures positive and encouraging communication tone
  • TaskCompletionEvaluator: Assesses task completion without unrequested actions
  • BookRenewalToolCallEvaluator: Validates workflow adherence and tool usage patterns
  • ConfirmationToolCallEvaluator: Ensures confirmations sent after successful renewals
  • ConfirmationMessageToneEvaluator: Validates confirmation message tone

Architecture

Tools

Tool Type Description
Book Status Local Returns ACTIVE or RECALLED status
User Info Local Returns user info including library card number
Checked Out Books Local Returns list of checked out books
Send Confirmation Local Sends confirmation messages after renewal
Renewal Request AgentCore Gateway Processes renewals with due date calculation

Infrastructure

  • AgentCore Gateway (with Policy): OAuth authentication + Cedar policy enforcing 30-day limit
  • AgentCore Gateway (no Policy): Same tools without policy constraints
  • Cognito OAuth: Client credentials flow for authentication
  • Gateway Selection: Steering agent uses policy gateway; others use no-policy gateway

Prerequisites

  • Python 3.11+
  • AWS CLI configured with appropriate credentials
  • UV package manager

Quick Start

# Install dependencies
uv venv
source .venv/bin/activate
uv sync --all-extras --dev

# Deploy infrastructure
uv run cdk deploy --all
uv run python scripts/provision_agentcore_policy.py

# Validate gateway connection
uv run python scripts/validate_agentcore_gateway.py

# Run interactive mode
uv run library-agent-demo interactive --agent-version simple

# Run evaluations
uv run library-agent-demo evaluate --agent-version steering --scenario happy-path
uv run library-agent-demo evaluate-all --format comparative

CLI Commands

# System status
uv run library-agent-demo status

# List agents and scenarios
uv run library-agent-demo list-agents
uv run library-agent-demo list-scenarios

# Single evaluation
uv run library-agent-demo evaluate --agent-version simple --scenario happy-path

# All scenarios for one agent
uv run library-agent-demo evaluate-agent --agent-version steering

# One scenario across all agents
uv run library-agent-demo evaluate-scenario --scenario recalled

# Full evaluation suite (5 agents × 6 scenarios = 30 evaluations)
uv run library-agent-demo evaluate-all --format comparative

# Multiple iterations for statistical analysis
uv run library-agent-demo evaluate-all -n 10 -p 20 --format comparative

Project Structure

src/library_agent_demo/
├── agents/                 # Agent implementations
│   ├── base.py            # Abstract base class
│   ├── no_instructions_agent.py      # Version 1: Basic prompt
│   ├── simple_instruction_agent.py   # Version 2: Simple instructions
│   ├── sop_agent.py                  # Version 3: Agent SOP
│   ├── steering_agent.py             # Version 4: Steering + Policy
│   ├── workflow_agent.py             # Version 5: Workflow
│   └── library_book_renewal.sop.md   # SOP definition
├── tools/                 # Local tool implementations
├── infrastructure/        # CDK stacks
├── evaluation/           # Evaluation framework
│   ├── runner.py         # Evaluation execution
│   ├── scenarios.py      # Test scenario definitions
│   └── evaluators/       # Strands evaluator implementations
└── steering/             # Custom steering handlers
scripts/
├── provision_agentcore_policy.py    # Deploy Cedar policy
├── validate_agentcore_gateway.py    # Test gateway connection
└── analyze_eval_failures.py         # Analyze evaluation results
tests/
└── test_steering_handlers.py        # Steering handler tests

Related Documentation