🤖 Comprehensive testing framework for agricultural AI agents using OpenAI SDK with MCP integration. Batch testing, intent classification, multilingual support, and Excel reporting.

eagleisbatman/gap-sdk-testing

🤖 GAP SDK Testing - OpenAI

License: MIT Node.js OpenAI MCP

Comprehensive testing framework for agricultural AI agents using OpenAI SDK with Model Context Protocol (MCP) integration. Part of the TomorrowNow Global Access Platform (GAP) ecosystem.


🎯 Overview

This testing framework replicates OpenAI Agent Builder workflows using the OpenAI SDK directly, enabling batch testing, performance analysis, and quality assurance for agricultural AI agents. It integrates with MCP servers (GAP Agriculture, AccuWeather, Decision Trees) to provide real-world testing scenarios.

Key Features

  • OpenAI SDK Integration: Direct use of @openai/agents for Agent Builder compatibility
  • MCP Server Support: Seamless integration with multiple MCP servers (GAP, AccuWeather)
  • Batch Testing: Run hundreds of queries with automated result collection
  • Intent Classification: Automatic intent detection for all queries
  • Multilingual Support: English and Swahili query testing
  • Guardrails: Moderation, jailbreak detection, PII detection
  • Conversation History: Maintains context across multi-turn conversations
  • Excel Export: Detailed results with metrics and analysis

✨ Features

Testing Capabilities

| Feature | Description |
| --- | --- |
| Batch Testing | Run 5, 10, 100+ queries per language |
| Intent Classification | Automatic detection of query intent |
| Language Detection | English/Swahili detection and handling |
| Tool Call Tracking | Monitor MCP server call success rates |
| Guardrail Testing | Test moderation and safety filters |
| Performance Metrics | Response time, success rate tracking |
| Excel Reports | Comprehensive test results export |

MCP Server Integration

  • GAP Agriculture MCP Server (default)
    • Tool: get_gap_weather_forecast
    • Coverage: Kenya and East Africa
    • Features: 50-member ensemble forecasts, up to 14 days
  • AccuWeather MCP Server (optional)
    • Tools: get_accuweather_weather_forecast, get_accuweather_current_conditions
    • Coverage: Global weather data
    • Features: Current conditions + 5-day forecasts

🚀 Quick Start

Prerequisites

  • Node.js >= 18.0.0
  • OpenAI API key
  • MCP server deployed (GAP or AccuWeather)

Installation

# Clone the repository
git clone https://github.com/eagleisbatman/gap-sdk-testing.git
cd gap-sdk-testing

# Install dependencies
npm install

Configuration

Create a .env file:

# Required: OpenAI API Key
OPENAI_API_KEY=sk-your-openai-api-key-here

# MCP Server Configuration (choose one)
# Option 1: Use GAP MCP Server (default)
MCP_SERVER_TYPE=GAP
GAP_MCP_URL=https://gap-agriculture-mcp-server.up.railway.app/mcp

# Option 2: Use AccuWeather MCP Server
# MCP_SERVER_TYPE=ACCUWEATHER
# ACCUWEATHER_MCP_URL=https://accuweather-mcp-server.up.railway.app/mcp
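The variables above select one MCP server at a time. As a rough sketch of how a script might resolve them (the function name `resolveMcpServer` and the fall-back to GAP when `MCP_SERVER_TYPE` is unset are assumptions for illustration, not taken from the repository's code):

```javascript
// Hypothetical helper: maps MCP_SERVER_TYPE to the matching URL variable.
// Only the variable names come from the .env example; the rest is assumed.
function resolveMcpServer(env) {
  const type = (env.MCP_SERVER_TYPE || "GAP").toUpperCase();
  const urlKeys = { GAP: "GAP_MCP_URL", ACCUWEATHER: "ACCUWEATHER_MCP_URL" };

  if (!Object.hasOwn(urlKeys, type)) {
    throw new Error(`Unsupported MCP_SERVER_TYPE: ${type}`);
  }
  const key = urlKeys[type];
  const url = env[key];
  if (!url) {
    throw new Error(`${key} must be set when MCP_SERVER_TYPE=${type}`);
  }
  return { type, url };
}

// Typical call site (left commented so the sketch has no side effects):
// const { type, url } = resolveMcpServer(process.env);
```

Failing fast on a missing URL keeps a misconfigured run from producing a report full of failed tool calls.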

Running Tests

# Test with 5 queries per language (default)
npm test

# Test with 10 queries per language
npm run test:10

# Test with 100 queries per language
npm run test:100

# Custom number of queries
node batch-test-sdk.js --queries=50
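For readers wondering how a flag like `--queries=50` might be read, here is a minimal sketch. The helper name `parseQueryCount` is hypothetical, and the actual parsing in batch-test-sdk.js may differ; only the flag name and the default of 5 come from the commands above.

```javascript
// Hypothetical parser for the --queries=N flag.
function parseQueryCount(argv, fallback = 5) {
  const prefix = "--queries=";
  const flag = argv.find((arg) => arg.startsWith(prefix));
  if (!flag) return fallback; // no flag: use the default of 5 per language

  const count = Number(flag.slice(prefix.length));
  if (!Number.isInteger(count) || count <= 0) {
    throw new Error(`Invalid ${prefix} value: ${flag}`);
  }
  return count;
}

// Typical call site (commented out to keep the sketch side-effect free):
// const queriesPerLanguage = parseQueryCount(process.argv.slice(2));
```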

📊 Test Results

Results are saved automatically to the test-results/ folder under a timestamped filename:

test-results/gap-sdk-test-results-YYYY-MM-DD-HHMMSS.xlsx
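A filename in that pattern could be built as follows. This is a sketch only: `resultsFileName` is a hypothetical name, and the use of local time (rather than UTC) is an assumption, since the pattern does not say which clock is used.

```javascript
// Hypothetical helper producing test-results/gap-sdk-test-results-YYYY-MM-DD-HHMMSS.xlsx
// Assumes the timestamp is in local time.
function resultsFileName(date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const day = [date.getFullYear(), pad(date.getMonth() + 1), pad(date.getDate())].join("-");
  const time = [date.getHours(), date.getMinutes(), date.getSeconds()].map(pad).join("");
  return `test-results/gap-sdk-test-results-${day}-${time}.xlsx`;
}
```

Zero-padding every field keeps the files sorted chronologically when listed by name.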

Excel Report Contents

  • Query Details: Original query, detected language, intent classification
  • Response Analysis: Agent response, tool calls made, MCP data retrieved
  • Performance Metrics: Processing time, success/failure status
  • Guardrail Results: Moderation flags, PII detection
  • Summary Statistics: Success rates, tool call rates, language distribution

🔧 Architecture

User Query
    ↓
Intent Classifier (intent-classifier.js)
    ↓
Guardrails (utils/guardrails.js)
    ↓
OpenAI Agent (utils/prompts.js)
    ↓
MCP Tool Call (hostedMcpTool)
    ↓
MCP Server (GAP/AccuWeather)
    ↓
Response Processing
    ↓
Excel Export (utils/excel-utils.js)
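The first stages of the flow above can be sketched as a single async function. Everything here is illustrative: `runPipeline`, the shape of the `steps` object, and the choice to skip the agent when a guardrail blocks the query are assumptions, not the repository's actual interfaces.

```javascript
// Illustrative pipeline: classify, check guardrails, then run the agent.
// `steps` is a hypothetical bundle of async functions supplied by the caller.
async function runPipeline(query, steps) {
  const startedAt = Date.now();
  const intent = await steps.classifyIntent(query);
  const guardrail = await steps.checkGuardrails(query);

  if (guardrail.blocked) {
    // Assumed behavior: a blocked query never reaches the agent or the MCP server.
    return {
      query,
      intent,
      blocked: true,
      response: guardrail.safeResponse,
      toolCalls: [],
      durationMs: Date.now() - startedAt,
    };
  }

  const { response, toolCalls } = await steps.runAgent(query, intent);
  return { query, intent, blocked: false, response, toolCalls, durationMs: Date.now() - startedAt };
}
```

Each returned record carries the fields a report row would need (intent, tool calls, timing), so the export step can stay a simple mapping over the results.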

Key Components

  • batch-test-sdk.js - Main test orchestration script
  • intent-classifier.js - Intent detection and language classification
  • utils/process-query.js - Individual query processing logic
  • utils/guardrails.js - Safety and moderation checks
  • utils/prompts.js - Agent prompt generation
  • utils/excel-utils.js - Results export and analysis

📈 Usage Examples

Basic Testing

# Run default test (5 queries per language)
npm test

Large-Scale Testing

# Run 100 queries per language (200 total)
npm run test:100

Custom Configuration

// Modify utils/config.js to change:
// - Default coordinates
// - Farmer context
// - MCP server URLs
// - Guardrail settings

🧪 Test Scenarios

Weather Forecast Queries

  • "What's the weather forecast for the next 5 days?"
  • "Will it rain tomorrow?"
  • "What's the temperature going to be?"

Agricultural Advice

  • "Should I plant maize now?"
  • "When should I irrigate my crops?"
  • "Is it a good time to apply fertilizer?"

Multilingual Support

  • English: "What's the weather forecast?"
  • Swahili: "Hali ya hewa itakuaje kesho?"

📊 Performance Metrics

The framework tracks:

  • Success Rate: Percentage of successful queries
  • Tool Call Rate: Percentage of queries that triggered MCP tools
  • Data Retrieval Rate: Percentage of successful MCP data retrievals
  • Average Response Time: Mean processing time per query
  • Intent Classification Accuracy: Correct intent detection rate
  • Language Detection Accuracy: Correct language identification
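As a rough illustration of how the first four metrics could be computed from per-query records, consider the sketch below. The record fields (`success`, `toolCalled`, `dataRetrieved`, `durationMs`) and the function name `summarize` are hypothetical, and the Data Retrieval Rate is assumed to be measured against queries that actually called a tool, which the list above does not specify.

```javascript
// Hypothetical summary over per-query result records.
function summarize(results) {
  const percent = (count, total) => (total === 0 ? 0 : (count / total) * 100);
  const total = results.length;
  const withTool = results.filter((r) => r.toolCalled);

  return {
    successRate: percent(results.filter((r) => r.success).length, total),
    toolCallRate: percent(withTool.length, total),
    // Assumption: denominator is queries that triggered a tool call.
    dataRetrievalRate: percent(withTool.filter((r) => r.dataRetrieved).length, withTool.length),
    averageResponseMs:
      total === 0 ? 0 : results.reduce((sum, r) => sum + r.durationMs, 0) / total,
  };
}
```

The two accuracy metrics are left out because they need labeled expected intents and languages for comparison.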

🔒 Guardrails

Moderation

  • Content moderation via OpenAI Moderation API
  • Automatic blocking of inappropriate content
  • Safe response generation for blocked queries

Jailbreak Detection

  • Detection of prompt injection attempts
  • Protection against system prompt manipulation
  • Safe handling of adversarial inputs

PII Detection

  • Personal information detection
  • Anonymization of sensitive data
  • Privacy-preserving responses
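To make the detect-then-anonymize idea concrete, here is a deliberately simple regex-based sketch. It is not how utils/guardrails.js works (that file's approach is not described here); the patterns, the placeholder format, and the name `redactPii` are all assumptions, and real PII detection needs far broader coverage.

```javascript
// Illustrative patterns only: emails and phone-like digit runs.
const PII_PATTERNS = {
  email: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g,
  phone: /\+?\d[\d\s-]{7,}\d/g,
};

// Replaces each match with a placeholder and reports which kinds were found.
function redactPii(text) {
  const found = [];
  let redacted = text;
  for (const [kind, pattern] of Object.entries(PII_PATTERNS)) {
    redacted = redacted.replace(pattern, () => {
      found.push(kind);
      return `[${kind.toUpperCase()}]`;
    });
  }
  return { redacted, found };
}
```

Returning the list of detected kinds alongside the redacted text lets a test report flag the query without storing the sensitive value itself.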

🔌 MCP Server Integration

GAP Agriculture MCP Server

Default Configuration:

MCP_SERVER_TYPE=GAP
GAP_MCP_URL=https://gap-agriculture-mcp-server.up.railway.app/mcp

Tool: get_gap_weather_forecast

Coverage: Kenya and East Africa

AccuWeather MCP Server

Configuration:

MCP_SERVER_TYPE=ACCUWEATHER
ACCUWEATHER_MCP_URL=https://accuweather-mcp-server.up.railway.app/mcp

Tools:

  • get_accuweather_weather_forecast - 5-day forecast
  • get_accuweather_current_conditions - Current weather

Coverage: Global


📝 Query Database

The framework includes a built-in database of test queries:

  • English Queries: Weather, planting, irrigation, pest management
  • Swahili Queries: Translated agricultural questions
  • Intent Categories: Weather forecast, agricultural advice, general questions
  • Variety: Different phrasings and complexity levels

🛠️ Development

Project Structure

gap-sdk-testing/
├── batch-test-sdk.js          # Main test script
├── intent-classifier.js       # Intent detection
├── intents.json              # Intent definitions
├── utils/
│   ├── process-query.js      # Query processing
│   ├── guardrails.js         # Safety checks
│   ├── prompts.js            # Agent prompts
│   ├── excel-utils.js        # Excel export
│   └── config.js             # Configuration
└── test-results/             # Generated reports

Adding New Queries

Edit utils/query-loader.js to add new test queries.

Customizing Prompts

Modify prompts/farmerchat-template.md for agent behavior changes.


📊 Comparison with Gemini Version

This OpenAI version is part of a dual-SDK testing approach:

  • OpenAI SDK (gap-sdk-testing) - Uses GPT-4o with Agent Builder
  • Gemini SDK (gap-sdk-testing-gemini) - Uses Gemini 2.5 Flash with structured data

Both versions can run in parallel for comparison testing. See compare-sdks.js in the parent directory.


🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.


Built with ❤️ for the TomorrowNow Global Access Platform
