Comprehensive testing framework for agricultural AI agents using the OpenAI SDK with Model Context Protocol (MCP) integration. Part of the TomorrowNow Global Access Platform (GAP) ecosystem.
This testing framework replicates OpenAI Agent Builder workflows using the OpenAI SDK directly, enabling batch testing, performance analysis, and quality assurance for agricultural AI agents. It integrates with MCP servers (GAP Agriculture, AccuWeather, Decision Trees) to provide real-world testing scenarios.
- ✅ OpenAI SDK Integration: Direct use of @openai/agents for Agent Builder compatibility
- ✅ MCP Server Support: Seamless integration with multiple MCP servers (GAP, AccuWeather)
- ✅ Batch Testing: Run hundreds of queries with automated result collection
- ✅ Intent Classification: Automatic intent detection for all queries
- ✅ Multilingual Support: English and Swahili query testing
- ✅ Guardrails: Moderation, jailbreak detection, PII detection
- ✅ Conversation History: Maintains context across multi-turn conversations
- ✅ Excel Export: Detailed results with metrics and analysis
| Feature | Description |
|---|---|
| Batch Testing | Run 5, 10, 100+ queries per language |
| Intent Classification | Automatic detection of query intent |
| Language Detection | English/Swahili detection and handling |
| Tool Call Tracking | Monitor MCP server call success rates |
| Guardrail Testing | Test moderation and safety filters |
| Performance Metrics | Response time, success rate tracking |
| Excel Reports | Comprehensive test results export |
- GAP Agriculture MCP Server (default)
  - Tool: get_gap_weather_forecast
  - Coverage: Kenya and East Africa
  - Features: 50-member ensemble forecasts, up to 14 days
- AccuWeather MCP Server (optional)
  - Tools: get_accuweather_weather_forecast, get_accuweather_current_conditions
  - Coverage: Global weather data
  - Features: Current conditions + 5-day forecasts
- Node.js >= 18.0.0
- OpenAI API key
- MCP server deployed (GAP or AccuWeather)
# Clone the repository
git clone https://github.com/eagleisbatman/gap-sdk-testing.git
cd gap-sdk-testing
# Install dependencies
npm install

Create a .env file:
# Required: OpenAI API Key
OPENAI_API_KEY=sk-your-openai-api-key-here
# MCP Server Configuration (choose one)
# Option 1: Use GAP MCP Server (default)
MCP_SERVER_TYPE=GAP
GAP_MCP_URL=https://gap-agriculture-mcp-server.up.railway.app/mcp
# Option 2: Use AccuWeather MCP Server
# MCP_SERVER_TYPE=ACCUWEATHER
# ACCUWEATHER_MCP_URL=https://accuweather-mcp-server.up.railway.app/mcp

# Test with 5 queries per language (default)
npm test
# Test with 10 queries per language
npm run test:10
# Test with 100 queries per language
npm run test:100
# Custom number of queries
node batch-test-sdk.js --queries=50

Results are automatically saved to the test-results/ folder with a timestamped filename:
test-results/gap-sdk-test-results-YYYY-MM-DD-HHMMSS.xlsx
- Query Details: Original query, detected language, intent classification
- Response Analysis: Agent response, tool calls made, MCP data retrieved
- Performance Metrics: Processing time, success/failure status
- Guardrail Results: Moderation flags, PII detection
- Summary Statistics: Success rates, tool call rates, language distribution
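The timestamped filename pattern above can be produced with a small helper. This is a minimal sketch, not the framework's own code: `buildResultsPath` is a hypothetical name, and the use of local time (rather than UTC) is an assumption.

```javascript
// Hypothetical helper: builds a report path matching the pattern
// test-results/gap-sdk-test-results-YYYY-MM-DD-HHMMSS.xlsx
// Assumption: the timestamp uses local time, not UTC.
const path = require("node:path");

function buildResultsPath(date = new Date(), dir = "test-results") {
  const pad = (n) => String(n).padStart(2, "0");
  const day = [
    date.getFullYear(),
    pad(date.getMonth() + 1), // getMonth() is zero-based
    pad(date.getDate()),
  ].join("-");
  const time = [date.getHours(), date.getMinutes(), date.getSeconds()]
    .map(pad)
    .join("");
  return path.posix.join(dir, `gap-sdk-test-results-${day}-${time}.xlsx`);
}
```

Zero-padding every field keeps the filenames sortable, so the newest report is always last in a directory listing.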
User Query
↓
Intent Classifier (intent-classifier.js)
↓
Guardrails (utils/guardrails.js)
↓
OpenAI Agent (utils/prompts.js)
↓
MCP Tool Call (hostedMcpTool)
↓
MCP Server (GAP/AccuWeather)
↓
Response Processing
↓
Excel Export (utils/excel-utils.js)
- batch-test-sdk.js - Main test orchestration script
- intent-classifier.js - Intent detection and language classification
- utils/process-query.js - Individual query processing logic
- utils/guardrails.js - Safety and moderation checks
- utils/prompts.js - Agent prompt generation
- utils/excel-utils.js - Results export and analysis
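The flow above (classify, check guardrails, run the agent, record the result) can be sketched as a single function with injected stages. This is an illustration under assumptions, not the contents of utils/process-query.js: the `processQuery` signature, the `stages` object, and the result fields (`status`, `toolCalls`, `ms`) are all hypothetical.

```javascript
// Sketch of the per-query flow: classify -> guardrails -> agent -> result row.
// All stage functions are injected, so stubs can stand in for the real
// classifier, guardrails, and agent. Names and fields are hypothetical.
async function processQuery(query, stages) {
  const started = Date.now();
  const { intent, language } = await stages.classify(query);
  const base = { query, intent, language, toolCalls: [] };
  const finish = (fields) => ({ ...base, ...fields, ms: Date.now() - started });

  // Blocked queries never reach the agent or the MCP server.
  const guard = await stages.checkGuardrails(query);
  if (guard.blocked) return finish({ status: "blocked", reason: guard.reason });

  try {
    const { text, toolCalls } = await stages.runAgent(query, { intent, language });
    return finish({ status: "success", response: text, toolCalls });
  } catch (err) {
    // A failed agent or MCP call becomes a result row instead of aborting the batch.
    return finish({ status: "failed", error: err.message });
  }
}
```

Catching errors per query means a batch run can continue after a single failure and still report it in the export.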
# Run default test (5 queries per language)
npm test

# Run 100 queries per language (200 total)
npm run test:100

// Modify utils/config.js to change:
// - Default coordinates
// - Farmer context
// - MCP server URLs
// - Guardrail settings

- "What's the weather forecast for the next 5 days?"
- "Will it rain tomorrow?"
- "What's the temperature going to be?"
- "Should I plant maize now?"
- "When should I irrigate my crops?"
- "Is it a good time to apply fertilizer?"
- English: "What's the weather forecast?"
- Swahili: "Hali ya hewa itakuaje kesho?"
The framework tracks:
- Success Rate: Percentage of successful queries
- Tool Call Rate: Percentage of queries that triggered MCP tools
- Data Retrieval Rate: Percentage of successful MCP data retrievals
- Average Response Time: Mean processing time per query
- Intent Classification Accuracy: Correct intent detection rate
- Language Detection Accuracy: Correct language identification
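As a rough illustration of how the rate metrics listed above could be derived from per-query results, here is a minimal sketch. The record shape (`status`, `toolCalls` with an `ok` flag, `ms`) and the `summarize` name are assumptions, as is the choice to compute the data retrieval rate over only those queries that triggered a tool call. The two accuracy metrics are omitted because they need labeled expected values.

```javascript
// Hypothetical aggregation over per-query result records.
// Assumed record shape: { status, toolCalls: [{ ok }], ms }
function summarize(results) {
  const total = results.length;
  // Percentage rounded to one decimal place; 0 when the denominator is 0.
  const pct = (n, d) => (d === 0 ? 0 : Math.round((n / d) * 1000) / 10);

  const succeeded = results.filter((r) => r.status === "success").length;
  const withTools = results.filter((r) => r.toolCalls.length > 0);
  const retrieved = withTools.filter((r) => r.toolCalls.some((c) => c.ok)).length;
  const totalMs = results.reduce((sum, r) => sum + r.ms, 0);

  return {
    total,
    successRate: pct(succeeded, total),
    toolCallRate: pct(withTools.length, total),
    // Assumption: measured against queries that actually called a tool.
    dataRetrievalRate: pct(retrieved, withTools.length),
    avgResponseMs: total === 0 ? 0 : Math.round(totalMs / total),
  };
}
```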
- Content moderation via OpenAI Moderation API
- Automatic blocking of inappropriate content
- Safe response generation for blocked queries
- Detection of prompt injection attempts
- Protection against system prompt manipulation
- Safe handling of adversarial inputs
- Personal information detection
- Anonymization of sensitive data
- Privacy-preserving responses
Default Configuration:
MCP_SERVER_TYPE=GAP
GAP_MCP_URL=https://gap-agriculture-mcp-server.up.railway.app/mcp

Tool: get_gap_weather_forecast
Coverage: Kenya and East Africa
Configuration:
MCP_SERVER_TYPE=ACCUWEATHER
ACCUWEATHER_MCP_URL=https://accuweather-mcp-server.up.railway.app/mcp

Tools:
- get_accuweather_weather_forecast - 5-day forecast
- get_accuweather_current_conditions - Current weather
Coverage: Global
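One way the two configurations above could be resolved at startup is sketched here. The environment variable and tool names come from this README; the `resolveMcpServer` function, the `MCP_SERVERS` table, and the error messages are hypothetical, and the actual logic in utils/config.js may differ.

```javascript
// Hypothetical lookup table: maps MCP_SERVER_TYPE to its URL variable and tools.
const MCP_SERVERS = {
  GAP: { urlVar: "GAP_MCP_URL", tools: ["get_gap_weather_forecast"] },
  ACCUWEATHER: {
    urlVar: "ACCUWEATHER_MCP_URL",
    tools: ["get_accuweather_weather_forecast", "get_accuweather_current_conditions"],
  },
};

// Resolves the active MCP server from environment variables.
// GAP is the default when MCP_SERVER_TYPE is unset, per the README.
function resolveMcpServer(env = process.env) {
  const type = (env.MCP_SERVER_TYPE || "GAP").toUpperCase();
  const server = Object.hasOwn(MCP_SERVERS, type) ? MCP_SERVERS[type] : undefined;
  if (!server) throw new Error(`Unknown MCP_SERVER_TYPE: ${type}`);
  const url = env[server.urlVar];
  if (!url) throw new Error(`${server.urlVar} is not set`);
  return { type, url, tools: server.tools };
}
```

Failing fast on a missing URL surfaces a misconfigured .env file before any queries are sent.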
The framework includes a comprehensive query database:
- English Queries: Weather, planting, irrigation, pest management
- Swahili Queries: Translated agricultural questions
- Intent Categories: Weather forecast, agricultural advice, general questions
- Variety: Different phrasings and complexity levels
gap-sdk-testing/
├── batch-test-sdk.js # Main test script
├── intent-classifier.js # Intent detection
├── intents.json # Intent definitions
├── utils/
│ ├── process-query.js # Query processing
│ ├── guardrails.js # Safety checks
│ ├── prompts.js # Agent prompts
│ ├── excel-utils.js # Excel export
│ └── config.js # Configuration
└── test-results/ # Generated reports
Edit utils/query-loader.js to add new test queries.
Modify prompts/farmerchat-template.md to change agent behavior.
This OpenAI version is part of a dual-SDK testing approach:
- OpenAI SDK (gap-sdk-testing) - Uses GPT-4o with Agent Builder
- Gemini SDK (gap-sdk-testing-gemini) - Uses Gemini 2.5 Flash with structured data
Both versions can run in parallel for comparison testing. See compare-sdks.js in the parent directory.
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass
- Submit a pull request
MIT License - see LICENSE file for details.
- GAP Agriculture MCP Server - Weather forecast MCP server
- GAP SDK Testing (Gemini) - Google Gemini SDK version
- TomorrowNow Decision Tree MCP Server - Crop advisory MCP server
- TomorrowNow GAP Platform - Global Access Platform
- Issues: GitHub Issues
- Documentation: See code comments and inline documentation
- MCP Protocol: Model Context Protocol
Built with ❤️ for the TomorrowNow Global Access Platform