- Add version information and overview of new evaluation system features
- Expand model provider support documentation (SiliconFlow integration)
- Add comprehensive evaluation commands section with graph trajectory and multi-turn evaluations
- Update environment setup with SILICONFLOW_API_KEY configuration
- Reorganize model integrations documentation under src/common/models/ structure
- Add evaluation system file structure documentation (tests/evaluations/)
- Enhance development guidelines with evaluation best practices and multi-model testing
- Update Python configuration with new evaluation dependencies (openevals, agentevals, langsmith)
CLAUDE.md
Lines changed: 45 additions & 9 deletions
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-This is a LangGraph ReAct (Reasoning and Action) Agent template that implements an iterative reasoning agent using LangGraph framework. The agent processes user queries through a cycle of reasoning, action execution, and observation.
+This is a LangGraph ReAct (Reasoning and Action) Agent template that implements an iterative reasoning agent using LangGraph framework. The agent processes user queries through a cycle of reasoning, action execution, and observation. **Version 0.2.0** introduces comprehensive evaluation systems and expanded model provider support.
 
 ## Architecture
 
@@ -13,11 +13,12 @@ The core architecture follows a modular stateful graph pattern:
 - **Common Module**: Shared components in `src/common/` provide reusable functionality across agents
 - **State Management**: Uses `State` and `InputState` dataclasses (defined in `src/react_agent/state.py`) to track conversation messages and execution state
 - **Graph Structure**: The main graph is defined in `src/react_agent/graph.py` with two primary nodes:
-  - `call_model`: Handles LLM reasoning and tool selection
+  - `call_model`: Handles LLM reasoning and tool selection
   - `tools`: Executes selected tools via ToolNode
 - **Execution Flow**: `call_model` → conditional routing → either `tools` (if tool calls needed) or `__end__` (if ready to respond) → back to `call_model` (creates the ReAct loop)
 - **Context System**: Runtime context defined in `src/common/context.py` provides model configuration, system prompts, and DeepWiki integration control
 - **Dynamic Tools**: Runtime tool loading with MCP integration for external documentation sources (DeepWiki MCP server)
+- **Model Providers**: Multi-provider support including Anthropic, OpenAI, Qwen, QwQ, QvQ, and SiliconFlow with regional endpoint configuration
 
 ## Development Commands
 
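The Execution Flow bullet in the hunk above describes a loop with one conditional edge. It can be sketched in plain Python without LangGraph. This is only an illustration under stated assumptions, not the template's code: `Message`, `route_model_output`, `run_loop`, and `stub_model` are hypothetical names, and the real graph in `src/react_agent/graph.py` may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    """Hypothetical stand-in for a chat message; not the template's State type."""
    content: str
    tool_calls: list[str] = field(default_factory=list)


def route_model_output(last: Message) -> str:
    """Conditional routing: go to `tools` if the model requested any, else end."""
    return "tools" if last.tool_calls else "__end__"


def run_loop(
    call_model: Callable[[list[Message]], Message],
    run_tool: Callable[[str], str],
    messages: list[Message],
    max_steps: int = 10,
) -> list[Message]:
    """Alternate call_model and tools until routing returns `__end__`."""
    for _ in range(max_steps):
        reply = call_model(messages)
        messages = messages + [reply]
        if route_model_output(reply) == "__end__":
            return messages
        # Run each requested tool and feed the observations back to the model.
        messages = messages + [Message(content=run_tool(name)) for name in reply.tool_calls]
    raise RuntimeError("ReAct loop did not terminate within max_steps")


def stub_model(messages: list[Message]) -> Message:
    """Toy model: asks for a search once, then answers using the observation."""
    if len(messages) == 1:
        return Message(content="", tool_calls=["search"])
    return Message(content=f"answer based on: {messages[-1].content}")


history = run_loop(stub_model, lambda name: f"{name} result", [Message("question")])
```

The `max_steps` guard is an addition for the sketch: a loop that routes back to the model on every tool call needs some bound so that a model which never stops requesting tools cannot run forever.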
@@ -40,6 +41,22 @@ make test_watch_e2e # Run e2e tests in watch mode
 make extended_tests # Run extended test suite
 ```
 
+### Evaluations
+```bash
+# Comprehensive evaluation suite (NEW in v0.2.0)
+make evals # Run all evaluations (graph + multiturn)
+make eval_graph # Run graph trajectory evaluations (LLM-as-judge)
+make eval_multiturn # Run multi-turn chat evaluations (role-persona simulations)
+
+# Model-specific evaluations
+make eval_graph_qwen # Test with Qwen/Qwen3-8B model
+make eval_graph_glm # Test with THUDM/GLM-4-9B model
+
+# Persona-specific evaluations
+make eval_multiturn_polite # Test with polite persona
+make eval_multiturn_hacker # Test with hacker persona
+```
+
 ### Code Quality
 ```bash
 make lint # Run linters (ruff + mypy)
@@ -57,8 +74,12 @@ make dev_ui # Start LangGraph development server with UI
 ### Environment Setup
 - Copy `.env.example` to `.env` and configure API keys
 - **Required**: `TAVILY_API_KEY` for web search functionality
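
As a quick check after copying `.env.example` to `.env`, a small script along these lines could confirm that the keys are visible to the process. It is a sketch under assumptions: `missing_keys` is a hypothetical helper, the script expects the variables to be exported into the environment already (it does not parse `.env` itself), and treating `SILICONFLOW_API_KEY` as optional is a guess based on the commit message rather than something this diff states.

```python
import os
from typing import Iterable, Mapping

# TAVILY_API_KEY is marked as required in the Environment Setup section.
REQUIRED_KEYS = ("TAVILY_API_KEY",)
# Assumption: only needed when a SiliconFlow model is selected.
OPTIONAL_KEYS = ("SILICONFLOW_API_KEY",)


def missing_keys(env: Mapping[str, str], keys: Iterable[str] = REQUIRED_KEYS) -> list[str]:
    """Return the names in `keys` that are unset or blank in `env`."""
    return [name for name in keys if not env.get(name, "").strip()]


# Report on the current process environment.
for name in missing_keys(os.environ):
    print(f"missing required key: {name}")
for name in missing_keys(os.environ, OPTIONAL_KEYS):
    print(f"optional key not set: {name}")
```

Passing the environment in as a mapping keeps the helper easy to test with a plain dictionary instead of the real `os.environ`.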