- Update README.md and README_CN.md with v0.2.0 evaluation system documentation
- Add SiliconFlow integration details and API key setup instructions
- Update ROADMAP.md to mark v0.2.0 milestone as completed
- Add .claude directory to .gitignore for Claude Code integration
This template showcases a [ReAct agent](https://arxiv.org/abs/2210.03629) implemented with [LangGraph](https://github.com/langchain-ai/langgraph) that works seamlessly with [LangGraph Studio](https://docs.langchain.com/langgraph-platform/quick-start-studio#use-langgraph-studio). ReAct agents are uncomplicated, prototypical agents that can be flexibly extended with many tools.
**🎉 Latest v0.2.0 Release**: Complete evaluation system and multi-model support! Check the [release notes](https://github.com/webup/langgraph-up-react/releases) for all new features.

The core logic, defined in `src/react_agent/graph.py`, demonstrates a flexible ReAct agent that iteratively reasons about user queries and executes actions. The template features a modular architecture with shared components in `src/common/`, MCP integration for external documentation sources, and a comprehensive test suite.
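The reason-act-observe cycle that the graph implements can be sketched in plain Python. This is a conceptual illustration only, not the template's actual `graph.py` (which uses LangGraph nodes and edges); the `search` tool, `react_loop`, and `toy_model` names are all hypothetical stand-ins.

```python
# Conceptual sketch of the ReAct loop. The real implementation lives in
# src/react_agent/graph.py and uses LangGraph; everything here is illustrative.

def search(query: str) -> str:
    """Stand-in for a real search tool (e.g. Tavily)."""
    return f"results for: {query}"

TOOLS = {"search": search}

def react_loop(model, question: str, max_steps: int = 5) -> str:
    """Alternate reasoning (model call) and acting (tool call) until done."""
    messages = [("user", question)]
    for _ in range(max_steps):
        action = model(messages)  # model decides: call a tool, or answer
        if action["type"] == "final":
            return action["content"]
        observation = TOOLS[action["tool"]](action["input"])
        messages.append(("tool", observation))  # feed the observation back
    return "step limit reached"

# Toy model: call the search tool once, then produce a final answer.
def toy_model(messages):
    if not any(role == "tool" for role, _ in messages):
        return {"type": "tool", "tool": "search", "input": "capital of France"}
    return {"type": "final", "content": "Paris"}

print(react_loop(toy_model, "What is the capital of France?"))  # -> Paris
```

The key design point mirrored here is that the model, not the framework, decides on each step whether to act or to answer, which is what makes the agent extensible to new tools.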
## Features
### Multi-Provider Model Support
- **SiliconFlow Integration**: Complete support for the Chinese MaaS platform with open-source models (Qwen, GLM, DeepSeek, etc.)
- **Qwen Models**: Complete Qwen series support via the `langchain-qwq` package, including Qwen-Plus, Qwen-Turbo, QwQ-32B, and QvQ-72B
- **OpenAI**: GPT-4o, GPT-4o-mini, etc.
- **OpenAI-Compatible**: Any provider supporting the OpenAI API format via a custom API key and base URL
- **Anthropic**: Claude 4 Sonnet, Claude 3.5 Haiku, etc.

The template provides a comprehensive evaluation system using a dual-methodology approach:

#### 🎯 Graph Trajectory Evaluation

Tests agent reasoning patterns and tool usage decisions:

```bash
# Run comprehensive graph trajectory evaluation
make eval_graph

# Test specific models
make eval_graph_qwen   # Qwen/Qwen3-8B model
make eval_graph_glm    # GLM-4-9B-0414 model
```

**Evaluation Scenarios**:

- **Simple Question**: "What is the capital of France?" - Tests efficiency for basic facts
- **Search Required**: "What's the latest news about artificial intelligence?" - Tests tool usage and information synthesis
- **Multi-step Reasoning**: "What are the pros and cons of renewable energy, and what are the latest developments?" - Tests complex analytical tasks

#### 🔄 Multi-turn Chat Simulation

Tests conversational capabilities through role-persona interactions:

```bash
# Start the development server (required for multi-turn evaluation)
make dev

# Run the multi-turn evaluation in another terminal
make eval_multiturn

# Test specific user personas
make eval_multiturn_polite   # Polite user persona
make eval_multiturn_hacker   # Adversarial user persona
```

**Role Scenarios**:

- **Writing Assistant** × User Personas: Professional email collaboration
- **Customer Service** × User Personas: Account troubleshooting support
- **Interviewer** × User Personas: Technical interview management

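The shape of such a simulation is a loop that alternates scripted persona turns with agent replies and records the transcript for later judging. The sketch below is purely illustrative: the real harness talks to the running dev server, while `simulate`, `echo_agent`, and the persona script here are hypothetical stand-ins.

```python
# Illustrative persona-driven multi-turn simulation. The real evaluation
# drives a live agent via the dev server; this stub only shows the loop shape.

def simulate(agent, persona_turns, max_turns=10):
    """Alternate scripted user turns with agent replies; return the transcript."""
    transcript = []
    for user_msg in persona_turns[:max_turns]:
        transcript.append(("user", user_msg))
        transcript.append(("agent", agent(user_msg, transcript)))
    return transcript

def echo_agent(user_msg, transcript):
    """Trivial agent stand-in; a real run would call the deployed graph."""
    return f"Acknowledged: {user_msg}"

polite = ["Hello, could you help me draft an email?", "Thank you!"]
transcript = simulate(echo_agent, polite)
print(len(transcript))  # -> 4 (two user turns, two agent replies)
```

Capturing the full transcript is what makes role × persona grids practical: the same judging step can be applied to every combination afterwards.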
### Multi-Provider Model Testing

The evaluation system supports testing across different model providers:

- **🌍 International Models**: OpenAI GPT-4o, Anthropic Claude, etc.
- **🇨🇳 Chinese Models**: SiliconFlow platform (Qwen, GLM, DeepSeek models)
- **🔄 Comparative Analysis**: Side-by-side performance comparison across providers
- **💡 Cost Optimization**: Identify the most cost-effective models for your use cases

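Multi-provider support usually comes down to mapping a `provider:model` spec to an endpoint and an API-key environment variable. The sketch below shows that pattern under stated assumptions: the registry, base URLs, and `resolve_model` helper are illustrative, not the template's actual configuration.

```python
# Hedged sketch of provider resolution from a "provider:model" spec.
# The registry entries below are examples, not the template's real config.
import os

PROVIDERS = {
    "siliconflow": {"base_url": "https://api.siliconflow.cn/v1",
                    "key_env": "SILICONFLOW_API_KEY"},
    "openai": {"base_url": "https://api.openai.com/v1",
               "key_env": "OPENAI_API_KEY"},
}

def resolve_model(spec: str) -> dict:
    """Split 'provider:model' and attach the provider's endpoint settings."""
    provider, model = spec.split(":", 1)
    cfg = PROVIDERS[provider]
    return {
        "model": model,
        "base_url": cfg["base_url"],
        "api_key": os.getenv(cfg["key_env"], ""),
    }

print(resolve_model("siliconflow:Qwen/Qwen3-8B")["model"])  # -> Qwen/Qwen3-8B
```

Keeping the registry in one place is what makes side-by-side comparison cheap: the same evaluation run can be pointed at a different provider by changing only the spec string.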
### Evaluation System Details

The evaluation system provides a comprehensive framework for analyzing agent performance, with detailed test scenarios, evaluation methodology, and results analysis.

For specific evaluation results, test scenarios, and usage instructions, refer to the detailed evaluation system documentation.

### Quick Start with Evaluation

```bash
# Set up the required environment variables
export SILICONFLOW_API_KEY="your_siliconflow_api_key"  # For model testing
export TAVILY_API_KEY="your_tavily_api_key"            # For search functionality
export LANGSMITH_API_KEY="your_langsmith_api_key"      # For evaluation tracking

# Run the comprehensive evaluation suite
make evals

# Or run the evaluations separately
make eval_graph      # Graph trajectory evaluation (runs independently)
make eval_multiturn  # Multi-turn chat evaluation (requires server)

# View release notes and version information on the GitHub Releases page:
# https://github.com/webup/langgraph-up-react/releases
```