🤖 GAP SDK Testing - OpenAI

A testing framework for agricultural AI agents, built on the OpenAI SDK with Model Context Protocol (MCP) integration. Part of the TomorrowNow Global Access Platform (GAP) ecosystem.

🎯 Overview

This testing framework replicates OpenAI Agent Builder workflows using the OpenAI SDK directly, enabling batch testing, performance analysis, and quality assurance for agricultural AI agents. It integrates with MCP servers (GAP Agriculture, AccuWeather, Decision Trees) to provide real-world testing scenarios.

Key Features

✅ OpenAI SDK Integration: Direct use of @openai/agents for Agent Builder compatibility
✅ MCP Server Support: Seamless integration with multiple MCP servers (GAP, AccuWeather)
✅ Batch Testing: Run hundreds of queries with automated result collection
✅ Intent Classification: Automatic intent detection for all queries
✅ Multilingual Support: English and Swahili query testing
✅ Guardrails: Moderation, jailbreak detection, PII detection
✅ Conversation History: Maintains context across multi-turn conversations
✅ Excel Export: Detailed results with metrics and analysis

✨ Features

Testing Capabilities

Feature	Description
Batch Testing	Run 5, 10, or 100+ queries per language
Intent Classification	Automatic detection of query intent
Language Detection	English/Swahili detection and handling
Tool Call Tracking	Monitor MCP server call success rates
Guardrail Testing	Test moderation and safety filters
Performance Metrics	Response time, success rate tracking
Excel Reports	Comprehensive test results export

MCP Server Integration

GAP Agriculture MCP Server (default)
- Tool: get_gap_weather_forecast
- Coverage: Kenya and East Africa
- Features: 50-member ensemble forecasts, up to 14 days

AccuWeather MCP Server (optional)
- Tools: get_accuweather_weather_forecast, get_accuweather_current_conditions
- Coverage: Global weather data
- Features: Current conditions + 5-day forecasts

🚀 Quick Start

Prerequisites

Node.js >= 18.0.0
OpenAI API key
A deployed MCP server (GAP or AccuWeather)

Installation

# Clone the repository
git clone https://github.com/eagleisbatman/gap-sdk-testing.git
cd gap-sdk-testing

# Install dependencies
npm install

Configuration

Create a .env file:

# Required: OpenAI API Key
OPENAI_API_KEY=sk-your-openai-api-key-here

# MCP Server Configuration (choose one)
# Option 1: Use GAP MCP Server (default)
MCP_SERVER_TYPE=GAP
GAP_MCP_URL=https://gap-agriculture-mcp-server.up.railway.app/mcp

# Option 2: Use AccuWeather MCP Server
# MCP_SERVER_TYPE=ACCUWEATHER
# ACCUWEATHER_MCP_URL=https://accuweather-mcp-server.up.railway.app/mcp
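The two options above imply that the test script picks one server URL based on MCP_SERVER_TYPE. A minimal sketch of that selection, assuming a helper named resolveMcpServer (hypothetical, not taken from the repository) and assuming GAP is used when the type is unset:

```javascript
// Hypothetical helper: choose the MCP server URL from MCP_SERVER_TYPE.
// The variable names come from the .env example; the function itself is an assumption.
function resolveMcpServer(env) {
  // Assumption: fall back to GAP when MCP_SERVER_TYPE is unset (GAP is the documented default).
  const type = (env.MCP_SERVER_TYPE || 'GAP').toUpperCase();
  const urlVars = { GAP: 'GAP_MCP_URL', ACCUWEATHER: 'ACCUWEATHER_MCP_URL' };

  const urlVar = urlVars[type];
  if (!urlVar) {
    throw new Error(`Unsupported MCP_SERVER_TYPE: ${type}`);
  }
  const url = env[urlVar];
  if (!url) {
    throw new Error(`${urlVar} must be set when MCP_SERVER_TYPE=${type}`);
  }
  return { type, url };
}

// Example: resolve from the current process environment.
// const server = resolveMcpServer(process.env);
```

Failing fast on a missing URL keeps a misconfigured run from producing a report full of tool call errors.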

Running Tests

# Test with 5 queries per language (default)
npm test

# Test with 10 queries per language
npm run test:10

# Test with 100 queries per language
npm run test:100

# Custom number of queries
node batch-test-sdk.js --queries=50
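
The --queries flag could be read along these lines. This is a sketch only: parseQueryCount is a hypothetical name, and the default of 5 is taken from the `npm test` comment above rather than from the script itself.

```javascript
// Hypothetical parser for the --queries=N flag.
function parseQueryCount(argv, defaultCount = 5) {
  const prefix = '--queries=';
  const flag = argv.find((arg) => arg.startsWith(prefix));
  if (!flag) return defaultCount;

  const value = Number(flag.slice(prefix.length));
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`--queries expects a positive integer, got "${flag}"`);
  }
  return value;
}

// Usage: node batch-test-sdk.js --queries=50
// const queryCount = parseQueryCount(process.argv.slice(2));
```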

📊 Test Results

Results are saved automatically to the test-results/ folder under a timestamped filename:

test-results/gap-sdk-test-results-YYYY-MM-DD-HHMMSS.xlsx
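
A filename in this pattern can be produced as sketched below. buildResultsFilename is a hypothetical helper, and the use of UTC is an assumption made so the output is reproducible; the actual script may format local time instead.

```javascript
// Hypothetical helper that formats the YYYY-MM-DD-HHMMSS pattern shown above.
// Assumption: UTC is used here; the real script may use local time.
function buildResultsFilename(date = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  const day = [
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad(date.getUTCDate()),
  ].join('-');
  const time = [date.getUTCHours(), date.getUTCMinutes(), date.getUTCSeconds()]
    .map(pad)
    .join('');
  return `test-results/gap-sdk-test-results-${day}-${time}.xlsx`;
}
```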

Excel Report Contents

Query Details: Original query, detected language, intent classification
Response Analysis: Agent response, tool calls made, MCP data retrieved
Performance Metrics: Processing time, success/failure status
Guardrail Results: Moderation flags, PII detection
Summary Statistics: Success rates, tool call rates, language distribution

🔧 Architecture

User Query
    ↓
Intent Classifier (intent-classifier.js)
    ↓
Guardrails (utils/guardrails.js)
    ↓
OpenAI Agent (utils/prompts.js)
    ↓
MCP Tool Call (hostedMcpTool)
    ↓
MCP Server (GAP/AccuWeather)
    ↓
Response Processing
    ↓
Excel Export (utils/excel-utils.js)
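
The flow above can be read as a sequence of async stages. The sketch below shows only the ordering and takes each stage as an injected function, so classifyIntent, checkGuardrails, and runAgent are placeholder names rather than the repository's actual exports. Skipping the agent for a blocked query is also an assumption.

```javascript
// Sketch of the stage ordering in the diagram. Stages are injected, so this
// runs with stubs and makes no claim about the real module interfaces.
async function runPipeline(query, stages) {
  const { classifyIntent, checkGuardrails, runAgent } = stages;
  const startedAt = Date.now();

  const intent = await classifyIntent(query);
  const guardrails = await checkGuardrails(query);

  // Assumption: a blocked query never reaches the agent or the MCP server.
  if (guardrails.blocked) {
    return {
      query, intent, guardrails,
      response: null, toolCalls: [], success: false,
      processingMs: Date.now() - startedAt,
    };
  }

  // In the real flow the agent decides whether to call an MCP tool.
  const { response, toolCalls } = await runAgent(query, intent);
  return {
    query, intent, guardrails,
    response, toolCalls, success: true,
    processingMs: Date.now() - startedAt,
  };
}
```

Each returned record carries the fields a report row would need (intent, tool calls, timing), so an export step can consume the array of records directly.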

Key Components

batch-test-sdk.js - Main test orchestration script
intent-classifier.js - Intent detection and language classification
utils/process-query.js - Individual query processing logic
utils/guardrails.js - Safety and moderation checks
utils/prompts.js - Agent prompt generation
utils/excel-utils.js - Results export and analysis

📈 Usage Examples

Basic Testing

# Run default test (5 queries per language)
npm test

Large-Scale Testing

# Run 100 queries per language (200 total)
npm run test:100

Custom Configuration

// Modify utils/config.js to change:
// - Default coordinates
// - Farmer context
// - MCP server URLs
// - Guardrail settings

🧪 Test Scenarios

Weather Forecast Queries

"What's the weather forecast for the next 5 days?"
"Will it rain tomorrow?"
"What's the temperature going to be?"

Agricultural Advice

"Should I plant maize now?"
"When should I irrigate my crops?"
"Is it a good time to apply fertilizer?"

Multilingual Support

English: "What's the weather forecast?"
Swahili: "Hali ya hewa itakuaje kesho?"

📊 Performance Metrics

The framework tracks:

Success Rate: Percentage of successful queries
Tool Call Rate: Percentage of queries that triggered MCP tools
Data Retrieval Rate: Percentage of successful MCP data retrievals
Average Response Time: Mean processing time per query
Intent Classification Accuracy: Correct intent detection rate
Language Detection Accuracy: Correct language identification
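
A minimal sketch of how the first four metrics could be computed from per-query records. The record shape ({ success, toolCalls, dataRetrieved, processingMs }) is hypothetical, and measuring the data retrieval rate against tool-calling queries only is an assumption about the definition.

```javascript
// Hypothetical result shape: { success, toolCalls, dataRetrieved, processingMs }.
function summarize(results) {
  const total = results.length;
  if (total === 0) {
    return { total: 0, successRate: 0, toolCallRate: 0, dataRetrievalRate: 0, avgResponseMs: 0 };
  }
  // Percentage helper that avoids dividing by zero.
  const pct = (count, denom) => (denom === 0 ? 0 : (count / denom) * 100);
  const withTools = results.filter((r) => r.toolCalls.length > 0);

  return {
    total,
    successRate: pct(results.filter((r) => r.success).length, total),
    toolCallRate: pct(withTools.length, total),
    // Assumption: retrieval rate is relative to queries that called a tool.
    dataRetrievalRate: pct(withTools.filter((r) => r.dataRetrieved).length, withTools.length),
    avgResponseMs: results.reduce((sum, r) => sum + r.processingMs, 0) / total,
  };
}
```

The two accuracy metrics are left out because they need labeled expected intents and languages for each query, which this record shape does not carry.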

🔒 Guardrails

Moderation

Content moderation via OpenAI Moderation API
Automatic blocking of inappropriate content
Safe response generation for blocked queries

Jailbreak Detection

Detection of prompt injection attempts
Protection against system prompt manipulation
Safe handling of adversarial inputs

PII Detection

Personal information detection
Anonymization of sensitive data
Privacy-preserving responses
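
To make the PII step concrete, here is a deliberately simple regex-based redaction sketch. It is not the repository's implementation, which is not shown here and may rely on a dedicated guardrails library. The patterns cover only email addresses and phone-like digit runs and are for illustration.

```javascript
// Illustrative patterns only; real PII detection needs far broader coverage.
const PII_PATTERNS = {
  email: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g,
  phone: /\+?\d[\d\s-]{7,}\d/g,
};

// Replace each match with a placeholder and report which kinds were found.
function redactPii(text) {
  const found = [];
  let redacted = text;
  for (const [kind, pattern] of Object.entries(PII_PATTERNS)) {
    redacted = redacted.replace(pattern, () => {
      found.push(kind);
      return `[${kind.toUpperCase()}]`;
    });
  }
  return { redacted, found };
}

// redactPii('Mail jane@example.com').redacted → 'Mail [EMAIL]'
```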

🔌 MCP Server Integration

GAP Agriculture MCP Server

Default Configuration:

MCP_SERVER_TYPE=GAP
GAP_MCP_URL=https://gap-agriculture-mcp-server.up.railway.app/mcp

Tool: get_gap_weather_forecast

Coverage: Kenya and East Africa

AccuWeather MCP Server

Configuration:

MCP_SERVER_TYPE=ACCUWEATHER
ACCUWEATHER_MCP_URL=https://accuweather-mcp-server.up.railway.app/mcp

Tools:

get_accuweather_weather_forecast - 5-day forecast
get_accuweather_current_conditions - Current weather

Coverage: Global

📝 Query Database

The framework includes a comprehensive query database:

English Queries: Weather, planting, irrigation, pest management
Swahili Queries: Translated agricultural questions
Intent Categories: Weather forecast, agricultural advice, general questions
Variety: Different phrasings and complexity levels
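
As an illustration of how such a database might feed a batch run, the sketch below uses a hypothetical record shape ({ language, intent, text }) and hypothetical intent labels; the repository's query loader may structure its data differently. The query texts are taken from the test scenarios listed earlier.

```javascript
// Hypothetical record shape and intent labels; the real query loader may differ.
const QUERY_DB = [
  { language: 'en', intent: 'weather_forecast', text: "What's the weather forecast for the next 5 days?" },
  { language: 'en', intent: 'agricultural_advice', text: 'Should I plant maize now?' },
  { language: 'sw', intent: 'weather_forecast', text: 'Hali ya hewa itakuaje kesho?' },
];

// Take up to `perLanguage` queries for each language, preserving database order.
function selectQueries(db, perLanguage) {
  const counts = new Map();
  return db.filter((q) => {
    const seen = counts.get(q.language) || 0;
    if (seen >= perLanguage) return false;
    counts.set(q.language, seen + 1);
    return true;
  });
}

// Example: one query per language for a quick smoke test.
// const batch = selectQueries(QUERY_DB, 1);
```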

🛠️ Development

Project Structure

gap-sdk-testing/
├── batch-test-sdk.js          # Main test script
├── intent-classifier.js       # Intent detection
├── intents.json              # Intent definitions
├── utils/
│   ├── process-query.js      # Query processing
│   ├── guardrails.js         # Safety checks
│   ├── prompts.js            # Agent prompts
│   ├── excel-utils.js        # Excel export
│   └── config.js             # Configuration
└── test-results/             # Generated reports

Adding New Queries

Edit utils/query-loader.js to add new test queries.

Customizing Prompts

Modify prompts/farmerchat-template.md for agent behavior changes.

📊 Comparison with Gemini Version

This OpenAI version is part of a dual-SDK testing approach:

OpenAI SDK (gap-sdk-testing) - Uses GPT-4o with Agent Builder
Gemini SDK (gap-sdk-testing-gemini) - Uses Gemini 2.5 Flash with structured data

Both versions can run in parallel for comparison testing. See compare-sdks.js in the parent directory.

🤝 Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Add tests for new features
Ensure all tests pass
Submit a pull request

📄 License

MIT License - see LICENSE file for details.

🔗 Related Projects

GAP Agriculture MCP Server - Weather forecast MCP server
GAP SDK Testing (Gemini) - Google Gemini SDK version
TomorrowNow Decision Tree MCP Server - Crop advisory MCP server
TomorrowNow GAP Platform - Global Access Platform

📞 Support

Issues: GitHub Issues
Documentation: See code comments and inline documentation
MCP Protocol: Model Context Protocol

Built with ❤️ for the TomorrowNow Global Access Platform
