
Commit a439a93

📚 docs: comprehensive v0.2.0 documentation update
- Update README.md and README_CN.md with v0.2.0 evaluation system documentation
- Add SiliconFlow integration details and API key setup instructions
- Update ROADMAP.md to mark v0.2.0 milestone as completed
- Add .claude directory to .gitignore for Claude Code integration
1 parent a5c7e65 commit a439a93

File tree

4 files changed (+311 −19 lines)


.gitignore

Lines changed: 4 additions & 0 deletions
@@ -166,3 +166,7 @@ cython_debug/
 
 # macOS
 .DS_Store
+
+# Claude Code
+.claude
+.claude/backups/

README.md

Lines changed: 146 additions & 4 deletions
@@ -1,6 +1,6 @@
 # LangGraph ReAct Agent Template
 
-[![Version](https://img.shields.io/badge/version-v0.1.0-blue.svg)](https://github.com/webup/langgraph-up-react)
+[![Version](https://img.shields.io/badge/version-v0.2.0-blue.svg)](https://github.com/webup/langgraph-up-react)
 [![LangGraph](https://img.shields.io/badge/LangGraph-v0.6.6-blue.svg)](https://github.com/langchain-ai/langgraph)
 [![Build](https://github.com/webup/langgraph-up-react/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/webup/langgraph-up-react/actions/workflows/unit-tests.yml)
 [![License](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/licenses/MIT)
@@ -10,6 +10,8 @@
 
 This template showcases a [ReAct agent](https://arxiv.org/abs/2210.03629) implemented using [LangGraph](https://github.com/langchain-ai/langgraph), works seamlessly with [LangGraph Studio](https://docs.langchain.com/langgraph-platform/quick-start-studio#use-langgraph-studio). ReAct agents are uncomplicated, prototypical agents that can be flexibly extended to many tools.
 
+**🎉 Latest v0.2.0 Release**: Complete evaluation system and multi-model support! Check the [release notes](https://github.com/webup/langgraph-up-react/releases) for all new features.
+
 ![Graph view in LangGraph studio UI](./static/studio_ui.png)
 
 The core logic, defined in `src/react_agent/graph.py`, demonstrates a flexible ReAct agent that iteratively reasons about user queries and executes actions. The template features a modular architecture with shared components in `src/common/`, MCP integration for external documentation sources, and comprehensive testing suite.
@@ -19,11 +21,18 @@ The core logic, defined in `src/react_agent/graph.py`, demonstrates a flexible R
 ## Features
 
 ### Multi-Provider Model Support
+- **SiliconFlow Integration**: Complete support for Chinese MaaS platform with open-source models (Qwen, GLM, DeepSeek, etc.)
 - **Qwen Models**: Complete Qwen series support via `langchain-qwq` package, including Qwen-Plus, Qwen-Turbo, QwQ-32B, QvQ-72B
 - **OpenAI**: GPT-4o, GPT-4o-mini, etc.
 - **OpenAI-Compatible**: Any provider supporting OpenAI API format via custom API key and base URL
 - **Anthropic**: Claude 4 Sonnet, Claude 3.5 Haiku, etc.
 
+### Production-Grade Agent Evaluation System
+- **Dual Evaluation Framework**: Graph trajectory evaluation + Multi-turn chat simulation for comprehensive agent testing
+- **LLM-as-Judge Methodology**: Scenario-specific evaluation criteria with professional assessment systems
+- **Multi-Model Benchmarking**: Compare performance across different model providers and configurations
+- **LangSmith Integration**: Complete evaluation tracking with historical analysis and collaboration features
+
 ### Agent Tool Integration Ecosystem
 - **Model Context Protocol (MCP)**: Dynamic external tool loading at runtime
 - **DeepWiki MCP Server**: Optional MCP tools for GitHub repository documentation access and Q&A capabilities
@@ -100,6 +109,9 @@ TAVILY_API_KEY=your-tavily-api-key
 # Required: If using Qwen models (default)
 DASHSCOPE_API_KEY=your-dashscope-api-key
 
+# Recommended: SiliconFlow platform for multi-model support and evaluation
+SILICONFLOW_API_KEY=your-siliconflow-api-key
+
 # Optional: OpenAI model service platform keys
 OPENAI_API_KEY=your-openai-api-key
 # Optional: If using OpenAI-compatible service platforms
@@ -109,7 +121,7 @@ OPENAI_API_BASE=your-openai-base-url
 ANTHROPIC_API_KEY=your-anthropic-api-key
 
 # Optional: Regional API support for Qwen models
-REGION=international # or 'prc' for China mainland (default)
+REGION=international # or 'cn' for China mainland (default)
 
 # Optional: Always enable DeepWiki documentation tools
 ENABLE_DEEPWIKI=true
@@ -127,6 +139,12 @@ The template uses `qwen:qwen-flash` as the default model, defined in [`src/commo
 
 ### API Key Setup by Provider
 
+#### SiliconFlow (Recommended for Evaluation)
+```bash
+SILICONFLOW_API_KEY=your-siliconflow-api-key
+```
+Get your API key: [SiliconFlow Console](https://siliconflow.com) - Supports Qwen, GLM, DeepSeek and other open-source models
+
 #### OpenAI
 ```bash
 OPENAI_API_KEY=your-openai-api-key
@@ -142,7 +160,7 @@ Get your API key: [Anthropic Console](https://console.anthropic.com/)
 #### Qwen Models (Default)
 ```bash
 DASHSCOPE_API_KEY=your-dashscope-api-key
-REGION=international # or 'prc' for China mainland
+REGION=international # or 'cn' for China mainland
 ```
 Get your API key: [DashScope Console](https://dashscope.console.aliyun.com/)

@@ -239,6 +257,11 @@ In LangGraph Studio, configure models through [Assistant management](https://doc
 "openai:gpt-4o-mini"
 "openai:gpt-4o"
 
+# SiliconFlow models (Chinese MaaS platform)
+"siliconflow:Qwen/Qwen3-8B"        # Qwen series efficient model
+"siliconflow:THUDM/GLM-4-9B-0414"  # GLM series chat model
+"siliconflow:THUDM/GLM-Z1-9B-0414" # GLM reasoning-enhanced model
+
 # Qwen models (with regional support)
 "qwen:qwen-flash" # Default model
 "qwen:qwen-plus"  # Balanced performance
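The model identifiers in this hunk follow a `provider:model` convention. A small illustrative parser for such strings might look like the sketch below; the helper names are hypothetical, and the template's actual loader in `src/common/` may differ.

```python
from typing import NamedTuple

class ModelSpec(NamedTuple):
    provider: str
    model: str

# Providers named in the README diff above; purely illustrative.
KNOWN_PROVIDERS = {"openai", "anthropic", "qwen", "siliconflow"}

def parse_model_spec(spec: str) -> ModelSpec:
    """Split a string like 'siliconflow:Qwen/Qwen3-8B' into provider and model."""
    provider, sep, model = spec.partition(":")
    if not sep or not model:
        raise ValueError(f"Expected 'provider:model', got {spec!r}")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return ModelSpec(provider, model)
```

Note that only the first `:` separates provider from model, so model names containing `/` (as in the SiliconFlow entries) pass through unchanged.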
@@ -311,6 +334,102 @@ Key components:
 
 This structure supports multiple agents and easy component reuse across different implementations.
 
+## 🔬 Agent Evaluation System
+
+### Why Evaluation Matters
+
+Agent evaluation is crucial for production-grade AI applications because it:
+
+- **🎯 Validates Performance**: Ensures agents work correctly across different scenarios and use tools appropriately
+- **🛡️ Identifies Security Issues**: Discovers potential vulnerabilities through adversarial testing
+- **📊 Enables Benchmarking**: Provides objective metrics to compare different models and configurations
+- **🔄 Drives Improvement**: Offers concrete performance metrics to guide agent optimization
+
+### Dual Evaluation Framework
+
+The template provides a comprehensive evaluation system using a dual-methodology approach:
+
+#### 🎯 Graph Trajectory Evaluation
+Tests agent reasoning patterns and tool usage decisions:
+
+```bash
+# Run comprehensive graph trajectory evaluation
+make eval_graph
+
+# Test specific models
+make eval_graph_qwen # Qwen/Qwen3-8B model
+make eval_graph_glm  # GLM-4-9B-0414 model
+```
+
+**Evaluation Scenarios**:
+- **Simple Question**: "What is the capital of France?" - Tests efficiency for basic facts
+- **Search Required**: "What's the latest news about artificial intelligence?" - Tests tool usage and information synthesis
+- **Multi-step Reasoning**: "What are the pros and cons of renewable energy, and what are the latest developments?" - Tests complex analytical tasks
+
+#### 🔄 Multi-turn Chat Simulation
+Tests conversational capabilities through role-persona interactions:
+
+```bash
+# Start development server (required for multi-turn evaluation)
+make dev
+
+# Run multi-turn evaluation in another terminal
+make eval_multiturn
+
+# Test specific user personas
+make eval_multiturn_polite # Polite user persona
+make eval_multiturn_hacker # Adversarial user persona
+```
+
+**Role Scenarios**:
+- **Writing Assistant** × User Personas: Professional email collaboration
+- **Customer Service** × User Personas: Account troubleshooting support
+- **Interviewer** × User Personas: Technical interview management
+
+### Multi-Provider Model Testing
+
+The evaluation system supports testing across different model providers:
+
+- **🌍 International Models**: OpenAI GPT-4o, Anthropic Claude, etc.
+- **🇨🇳 Chinese Models**: SiliconFlow platform (Qwen, GLM, DeepSeek models)
+- **🔄 Comparative Analysis**: Side-by-side performance comparison across providers
+- **💡 Cost Optimization**: Identify the most cost-effective models for your use cases
+
+### Evaluation System Details
+
+The evaluation system provides a comprehensive agent performance analysis framework with detailed test scenarios, evaluation methodology, and results analysis.
+
+For specific evaluation results, test scenarios, and usage instructions, please refer to the detailed evaluation system documentation.
+
+### Quick Start with Evaluation
+
+```bash
+# Set up required environment variables
+export SILICONFLOW_API_KEY="your_siliconflow_api_key" # For model testing
+export TAVILY_API_KEY="your_tavily_api_key"           # For search functionality
+export LANGSMITH_API_KEY="your_langsmith_api_key"     # For evaluation tracking
+
+# Run comprehensive evaluation suite
+make evals
+
+# Or run evaluations separately
+make eval_graph     # Graph trajectory evaluation (runs independently)
+make eval_multiturn # Multi-turn chat evaluation (requires server)
+
+# View release notes and version information
+# Visit GitHub Releases page for all version release notes: https://github.com/webup/langgraph-up-react/releases
+```
+
+### Evaluation System Features
+
+- **🎯 LLM-as-Judge Methodology**: Scenario-specific custom evaluation criteria
+- **📊 Professional Reporting**: Detailed score extraction and ranking systems
+- **🔍 Trajectory Normalization**: JSON serialization-compatible trajectory processing
+- **📈 LangSmith Integration**: Complete tracking and historical analysis
+- **⚙️ Centralized Configuration**: Unified evaluation settings in `config.py`
+
+For detailed evaluation documentation, see: [`tests/evaluations/README.md`](./tests/evaluations/README.md)
+
 ## Development & Community
 
 ### Roadmap & Contributing
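The hunk above mentions "JSON serialization-compatible trajectory processing". As one hedged illustration of that idea (not the template's actual implementation, whose details live in `tests/evaluations/`), a trajectory of message dicts can be reduced to plain JSON before being handed to an LLM judge:

```python
import json

# Hypothetical sketch of trajectory normalization: keep only the
# JSON-serializable fields of each message so the full trajectory can be
# logged, compared, or passed to an LLM-as-judge evaluator.
def normalize_trajectory(messages: list[dict]) -> str:
    normalized = [
        {
            "role": m.get("role", "unknown"),
            "content": str(m.get("content", "")),
            # keep tool invocations by name only; raw arguments may not
            # be JSON-serializable
            "tool_calls": [t.get("name") for t in m.get("tool_calls", [])],
        }
        for m in messages
    ]
    return json.dumps(normalized, ensure_ascii=False)
```

Normalizing to a stable schema is what makes multi-model benchmark runs comparable: every provider's message objects collapse to the same JSON shape.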
@@ -332,4 +451,27 @@ Check out our roadmap to see what we're working on next and how you can contribu
 - [LangGraph Documentation](https://github.com/langchain-ai/langgraph) - Framework guides and examples
 - [LangSmith](https://smith.langchain.com/) - Tracing and collaboration platform
 - [ReAct Paper](https://arxiv.org/abs/2210.03629) - Original research on reasoning and acting
-- [Claude Code](https://claude.ai/code) - AI-powered development environment
+- [Claude Code](https://claude.ai/code) - AI-powered development environment
+
+## Acknowledgments
+
+This project is built on the shoulders of amazing open-source projects and service platforms:
+
+### LangChain Official Projects
+- **[LangGraph](https://github.com/langchain-ai/langgraph)** - Powerful agent graph construction framework
+- **[LangChain](https://github.com/langchain-ai/langchain)** - Core library for building LLM applications
+- **[AgentEvals](https://github.com/langchain-ai/agentevals)** - Agent evaluation framework providing LLM-as-Judge methodology
+- **[OpenEvals](https://github.com/langchain-ai/openevals)** - Open evaluation tools and methods
+- **[LangSmith](https://smith.langchain.com/)** - LLM application tracing and debugging platform
+
+### LangChain Community Integrations
+- **[langchain-siliconflow](https://pypi.org/project/langchain-siliconflow/)** - SiliconFlow model integration for open-source model support
+- **[langchain-qwq](https://pypi.org/project/langchain-qwq/)** - Alibaba Cloud Bailian platform model integration for Qwen series
+
+### MaaS Platform Services
+- **SiliconFlow** - Chinese MaaS platform providing open-source models
+- **Alibaba Cloud Bailian (DashScope)** - Qwen series model service platform
+
+View all version updates: [📋 GitHub Releases](https://github.com/webup/langgraph-up-react/releases)
+
+Thank you to all contributors and the open-source community! 🙏
