Releases: webup/langgraph-up-react
🔬 v0.2.0: Agent Evaluation System Online!
🎉 Major Release: Comprehensive Agent Evaluation Framework
This release introduces a production-grade evaluation framework for testing and benchmarking ReAct agents.
🔬 Dual Evaluation Framework
Graph Trajectory Evaluation
- LLM-as-Judge methodology with scenario-specific custom rubrics
- Tests agent reasoning patterns and tool usage decisions across multiple scenarios
- Automated scoring and ranking systems for objective performance measurement
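As a rough illustration, an LLM-as-judge trajectory check can be as small as the sketch below. This is a hypothetical minimal example: the rubric wording, judge model, and `judge_trajectory` helper are assumptions for illustration, not the repo's actual evaluation API (see tests/evaluations/README.md for that).

```python
# Minimal LLM-as-judge sketch; rubric wording and judge model are assumptions.
from langchain.chat_models import init_chat_model

RUBRIC = """You are grading a ReAct agent's trajectory.
Score 1-5 on: (1) reasoning coherence, (2) tool selection, (3) efficiency.
Return only JSON like {"score": 4, "rationale": "..."}."""

def judge_trajectory(trajectory: list[dict], scenario: str) -> str:
    """Ask a judge model to score a serialized agent trajectory."""
    judge = init_chat_model("openai:gpt-4o-mini", temperature=0)  # any judge model works
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in trajectory)
    reply = judge.invoke(f"{RUBRIC}\n\nScenario: {scenario}\n\nTrajectory:\n{transcript}")
    return reply.content  # JSON string with score and rationale
```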
Multi-turn Chat Simulation
- Role-persona interaction testing with adversarial scenarios
- Comprehensive conversational capability assessment
- Professional user persona testing including polite and challenging user types
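Conceptually, the simulation drives the agent with a second LLM that plays a user persona. A hedged sketch follows; the persona prompts, turn budget, and `simulate` helper are all illustrative assumptions:

```python
# Hypothetical multi-turn simulation loop; persona prompts are assumptions.
from langchain.chat_models import init_chat_model

PERSONAS = {
    "polite": "You are a courteous user seeking help with a research question.",
    "adversarial": "You try to derail the assistant with off-topic, manipulative requests.",
}

def simulate(agent, persona: str, turns: int = 5) -> list[dict]:
    """Drive a LangGraph agent with an LLM-simulated user; return the transcript."""
    user_sim = init_chat_model("openai:gpt-4o-mini", temperature=0.7)
    history: list[dict] = []
    user_msg = user_sim.invoke(PERSONAS[persona] + " Write your opening message.").content
    for _ in range(turns):
        history.append({"role": "user", "content": user_msg})
        result = agent.invoke({"messages": history})
        history.append({"role": "assistant", "content": result["messages"][-1].content})
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
        user_msg = user_sim.invoke(
            f"{PERSONAS[persona]}\nConversation so far:\n{transcript}\nWrite your next message."
        ).content
    return history
```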
🚀 SiliconFlow Integration
- Complete MaaS Platform Support: Native integration with SiliconFlow, a leading Chinese model-as-a-service platform for open-source models
- Multi-Model Benchmarking: Test across Qwen/Qwen3-8B, GLM-4-9B-0414, GLM-Z1-9B-0414 models
- Cost-Effective Evaluation: Sub-10B models deliver strong evaluation quality at minimal cost
- Regional API Support: Seamless cn/international endpoint switching
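Because SiliconFlow exposes an OpenAI-compatible API, region switching can reduce to picking a base URL. The sketch below is illustrative: the endpoint URLs and helper are assumptions, so verify them against SiliconFlow's documentation.

```python
# Region-aware SiliconFlow client sketch; endpoint URLs are assumptions.
import os
from langchain_openai import ChatOpenAI

BASE_URLS = {
    "cn": "https://api.siliconflow.cn/v1",              # China mainland
    "international": "https://api.siliconflow.com/v1",  # global (verify URL)
}

def siliconflow_model(name: str, region: str = "cn") -> ChatOpenAI:
    """Build a chat model against the chosen regional SiliconFlow endpoint."""
    return ChatOpenAI(
        model=name,  # e.g. "Qwen/Qwen3-8B"
        base_url=BASE_URLS[region],
        api_key=os.environ["SILICONFLOW_API_KEY"],
    )
```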
📊 Professional Evaluation Tools
- LangSmith Integration: Complete evaluation tracking with historical analysis
- Structured Reporting: Detailed score extraction and performance analytics
- Trajectory Normalization: Agent trajectories are flattened into JSON-serializable structures for evaluation processing (sketch below)
- Centralized Configuration: Unified evaluation settings via config.py
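For instance, normalizing a LangChain message list into JSON-safe dicts might look like the following illustrative helper (not the repo's actual implementation):

```python
# Illustrative trajectory normalization into JSON-serializable structures.
import json
from langchain_core.messages import BaseMessage

def normalize_trajectory(messages: list[BaseMessage]) -> list[dict]:
    """Flatten message objects so a trajectory can be serialized and logged."""
    normalized = []
    for msg in messages:
        entry: dict = {"type": msg.type, "content": msg.content}
        tool_calls = getattr(msg, "tool_calls", None)
        if tool_calls:  # AI messages may carry tool calls
            entry["tool_calls"] = [{"name": tc["name"], "args": tc["args"]} for tc in tool_calls]
        normalized.append(entry)
    json.dumps(normalized)  # fail fast if anything is still not serializable
    return normalized
```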
📚 Comprehensive Documentation
The evaluation system is fully documented in tests/evaluations/README.md with:
- Quick Start Guides: Get running with evaluations in minutes
- Methodology Explanations: Deep dive into LLM-as-Judge approaches
- Configuration References: Complete setup and customization options
- Results Analysis: How to interpret and act on evaluation results
🛠 Enhanced Development Experience
New Make Commands:

```bash
make evals            # Run complete evaluation suite
make eval_graph       # Graph trajectory evaluation
make eval_multiturn   # Multi-turn chat evaluation
make eval_graph_qwen  # Test specific SiliconFlow models
make eval_graph_glm   # GLM model evaluation
```
Environment Setup:
- New region aliases: cn (China mainland), international (global)
- Added SILICONFLOW_API_KEY for multi-model evaluation
- Enhanced model configuration with provider-specific optimizations
🎯 Production-Ready Features
- Automated CI/CD Integration: Evaluation workflows ready for production pipelines
- Multi-Provider Testing: Compare performance across OpenAI, Anthropic, Qwen, and SiliconFlow
- Security Testing: Adversarial user personas for robust agent validation
- Performance Benchmarking: Quantitative metrics for agent optimization
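As a toy example of the kind of quantitative comparison this enables, judge scores can be aggregated per model. The result fields and numbers below are assumptions for illustration only:

```python
# Illustrative aggregation of judge scores per model; field names assumed.
from statistics import mean

def summarize(results: list[dict]) -> dict[str, float]:
    """Average LLM-judge scores per model across evaluation runs."""
    by_model: dict[str, list[float]] = {}
    for r in results:
        by_model.setdefault(r["model"], []).append(r["score"])
    return {model: round(mean(scores), 2) for model, scores in by_model.items()}

print(summarize([
    {"model": "Qwen/Qwen3-8B", "score": 4.0},
    {"model": "GLM-4-9B-0414", "score": 3.5},
    {"model": "Qwen/Qwen3-8B", "score": 4.5},
]))  # {'Qwen/Qwen3-8B': 4.25, 'GLM-4-9B-0414': 3.5}
```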
📈 What's Next
With this evaluation foundation, teams can now:
- Objectively measure agent performance improvements
- Compare different model providers and configurations
- Identify and fix agent reasoning issues before production
- Build confidence in agent reliability through systematic testing
- Full Documentation: Updated README.md and README_CN.md with comprehensive v0.2.0 features
- Roadmap: v0.2.0 milestone marked complete in ROADMAP.md with detailed achievements
v0.1.2
🛠️ DevOps & Testing Enhancements - v0.1.2
This release focuses on improving development workflows, testing infrastructure, and MCP server capabilities.
✨ New Features
- MCP Server Filtering: Implemented an MCP server filtering system with accompanying test coverage (see the sketch after this list)
- Claude Code Integration: Added GitHub Workflow support for Claude Code development assistance
- Project Configuration: Enhanced project configuration and gitignore management
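A hypothetical sketch of server filtering, using langchain-mcp-adapters' MultiServerMCPClient; the SERVERS config, allow-list helper, and server entries are assumptions, not the repo's actual implementation:

```python
# Hypothetical MCP server filtering; config shape follows langchain-mcp-adapters.
from langchain_mcp_adapters.client import MultiServerMCPClient

SERVERS = {
    "deepwiki": {"url": "https://mcp.deepwiki.com/mcp", "transport": "streamable_http"},
    "local": {"command": "python", "args": ["server.py"], "transport": "stdio"},
}

def filtered_client(allow: set[str]) -> MultiServerMCPClient:
    """Connect only to MCP servers on the allow-list."""
    return MultiServerMCPClient({k: v for k, v in SERVERS.items() if k in allow})

async def load_tools():
    # Only DeepWiki tools are loaded; the "local" server is filtered out.
    return await filtered_client({"deepwiki"}).get_tools()
```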
🧪 Testing & Quality Assurance
- Comprehensive MCP Testing: Added extensive test coverage for MCP server filtering and tool integration
- Test Infrastructure: Improved testing framework with better coverage and reliability
- Asset Optimization: Continued image optimization for better performance
🚀 Development Experience
- Workflow Automation: Streamlined CI/CD processes with GitHub Actions integration
- Configuration Management: Better project structure and dependency management
- Developer Tools: Enhanced tooling for local development and debugging
🐛 Improvements
- Build System: Refined build processes and dependency resolution
- Documentation: Updated documentation to reflect new features and workflows
This release strengthens the development infrastructure and testing capabilities, making the project more maintainable and reliable for contributors.
v0.1.1
🔧 Refinements & Optimizations - v0.1.1
This release focuses on performance optimizations, documentation improvements, and developer experience enhancements.
🚀 Performance & Configuration
- Model Optimization: Changed default model from qwen-turbo to qwen-flash for better performance and cost efficiency
- Enhanced Context Metadata: Improved context system with LangGraph node bindings for better state tracking
- Asset Optimization: Optimized images and updated studio UI assets
📝 Documentation & Developer Experience
- LangGraph v0.6.6 Documentation: Updated comprehensive documentation for latest LangGraph integration
- Build System Improvements: Fixed Makefile to prefix Python commands with 'uv run' for consistent environment handling
- Code Quality: Removed trailing whitespace and improved code formatting consistency
🐛 Bug Fixes
- Environment Consistency: Fixed command execution in isolated environments
- Configuration Updates: Improved project configuration and dependency management
This release improves the overall developer experience while maintaining backward compatibility and improving performance.
v0.1.0
🚀 Initial Release - LangGraph ReAct Agent v0.1.0
This is the first stable release of the LangGraph ReAct (Reasoning and Action) Agent template, featuring a complete refactoring into a modular architecture.
✨ New Features
- Qwen Model Support: Native integration with Qwen models, including regional API support (PRC/International endpoints; see the sketch after this list)
- MCP Integration: Dynamic tool loading with Model Context Protocol (MCP) servers, including DeepWiki for GitHub repository documentation
- Common Module Architecture: Refactored codebase into reusable common module for better maintainability
- Comprehensive Test Suite: Added pytest-based testing framework with full coverage for LangGraph agent functionality
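For regional Qwen access, DashScope publishes OpenAI-compatible endpoints for both regions. The helper below is an illustrative sketch, not the template's actual model-loading API:

```python
# Illustrative Qwen setup via DashScope's OpenAI-compatible endpoints;
# the region-switch helper is an assumption.
import os
from langchain_openai import ChatOpenAI

QWEN_BASE_URLS = {
    "prc": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "international": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
}

def qwen_model(name: str = "qwen-turbo", region: str = "prc") -> ChatOpenAI:
    """Build a Qwen chat model against the regional DashScope endpoint."""
    return ChatOpenAI(
        model=name,
        base_url=QWEN_BASE_URLS[region],
        api_key=os.environ["DASHSCOPE_API_KEY"],
    )
```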
📝 Documentation
- Bilingual Documentation: Complete English and Chinese README with detailed setup instructions
- Project Roadmap: Clear development roadmap and contribution guidelines
- Configuration Guide: Comprehensive environment setup and model configuration documentation
🛠️ Technical Improvements
- Modular Architecture: Clean separation of concerns with common utilities, models, tools, and prompts
- Enhanced Context System: Improved context management with metadata and LangGraph node bindings
- Dynamic Tool Loading: Runtime tool registration with MCP server integration (see the sketch after this list)
- Better Error Handling: Robust error handling and debugging capabilities
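Dynamic tool loading can be pictured as below: discover MCP tools at runtime, then hand them to a prebuilt ReAct agent. The server URL and model choice are assumptions for illustration:

```python
# Illustrative runtime MCP tool loading into a ReAct agent; URL and model assumed.
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent

async def build_agent():
    client = MultiServerMCPClient(
        {"deepwiki": {"url": "https://mcp.deepwiki.com/mcp", "transport": "streamable_http"}}
    )
    tools = await client.get_tools()  # tools discovered at runtime, not hard-coded
    return create_react_agent("openai:gpt-4o-mini", tools)
```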
This release establishes a solid foundation for building sophisticated ReAct agents with LangGraph, supporting multiple model providers and dynamic tool ecosystems.