Commit a89b0ea
📝 docs: update CLAUDE.md for v0.2.0
- Add version information and an overview of the new evaluation system features
- Expand model provider support documentation (SiliconFlow integration)
- Add comprehensive evaluation commands section with graph trajectory and multi-turn evaluations
- Update environment setup with SILICONFLOW_API_KEY configuration
- Reorganize model integrations documentation under the src/common/models/ structure
- Add evaluation system file structure documentation (tests/evaluations/)
- Enhance development guidelines with evaluation best practices and multi-model testing
- Update Python configuration with new evaluation dependencies (openevals, agentevals, langsmith)
1 parent 0a4fd93 · commit a89b0ea

1 file changed: CLAUDE.md (+45, -9 lines)
````diff
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-This is a LangGraph ReAct (Reasoning and Action) Agent template that implements an iterative reasoning agent using LangGraph framework. The agent processes user queries through a cycle of reasoning, action execution, and observation.
+This is a LangGraph ReAct (Reasoning and Action) Agent template that implements an iterative reasoning agent using LangGraph framework. The agent processes user queries through a cycle of reasoning, action execution, and observation. **Version 0.2.0** introduces comprehensive evaluation systems and expanded model provider support.
 
 ## Architecture
 
````
````diff
@@ -13,11 +13,12 @@ The core architecture follows a modular stateful graph pattern:
 - **Common Module**: Shared components in `src/common/` provide reusable functionality across agents
 - **State Management**: Uses `State` and `InputState` dataclasses (defined in `src/react_agent/state.py`) to track conversation messages and execution state
 - **Graph Structure**: The main graph is defined in `src/react_agent/graph.py` with two primary nodes:
-  - `call_model`: Handles LLM reasoning and tool selection
+  - `call_model`: Handles LLM reasoning and tool selection
   - `tools`: Executes selected tools via ToolNode
 - **Execution Flow**: `call_model` → conditional routing → either `tools` (if tool calls needed) or `__end__` (if ready to respond) → back to `call_model` (creates the ReAct loop)
 - **Context System**: Runtime context defined in `src/common/context.py` provides model configuration, system prompts, and DeepWiki integration control
 - **Dynamic Tools**: Runtime tool loading with MCP integration for external documentation sources (DeepWiki MCP server)
+- **Model Providers**: Multi-provider support including Anthropic, OpenAI, Qwen, QwQ, QvQ, and SiliconFlow with regional endpoint configuration
 
 ## Development Commands
 
````
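The **Execution Flow** bullet above describes a standard LangGraph conditional-edge loop. A hedged sketch of that wiring follows; the `State` TypedDict, the stubbed `call_model`, and the `web_search` tool are illustrative stand-ins so the snippet runs without API keys, while the template's real definitions live in `src/react_agent/` and `src/common/`:

```python
"""Hedged sketch of the ReAct loop wiring described above.

`State`, `call_model`, and the stub tool are illustrative stand-ins; the
template's real definitions live in src/react_agent/ and src/common/.
"""
from typing import Annotated, TypedDict

from langchain_core.messages import AIMessage, AnyMessage
from langchain_core.tools import tool
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode


class State(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]


@tool
def web_search(query: str) -> str:
    """Stub standing in for the template's Tavily-backed search tool."""
    return f"(stub) results for {query!r}"


def call_model(state: State) -> dict:
    # The real node invokes the configured chat model, which may emit
    # tool calls; stubbed here so the sketch runs without API keys.
    return {"messages": [AIMessage(content="done")]}


def route(state: State) -> str:
    # Tool calls pending -> run the tools node; otherwise end the loop.
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else END


builder = StateGraph(State)
builder.add_node("call_model", call_model)
builder.add_node("tools", ToolNode([web_search]))
builder.add_edge(START, "call_model")
builder.add_conditional_edges("call_model", route)
builder.add_edge("tools", "call_model")  # observation feeds back into reasoning
graph = builder.compile()
```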
````diff
@@ -40,6 +41,22 @@ make test_watch_e2e # Run e2e tests in watch mode
 make extended_tests # Run extended test suite
 ```
 
+### Evaluations
+```bash
+# Comprehensive evaluation suite (NEW in v0.2.0)
+make evals            # Run all evaluations (graph + multiturn)
+make eval_graph       # Run graph trajectory evaluations (LLM-as-judge)
+make eval_multiturn   # Run multi-turn chat evaluations (role-persona simulations)
+
+# Model-specific evaluations
+make eval_graph_qwen  # Test with Qwen/Qwen3-8B model
+make eval_graph_glm   # Test with THUDM/GLM-4-9B model
+
+# Persona-specific evaluations
+make eval_multiturn_polite  # Test with polite persona
+make eval_multiturn_hacker  # Test with hacker persona
+```
+
 ### Code Quality
 ```bash
 make lint # Run linters (ruff + mypy)
````
````diff
@@ -57,8 +74,12 @@ make dev_ui # Start LangGraph development server with UI
 ### Environment Setup
 - Copy `.env.example` to `.env` and configure API keys
 - **Required**: `TAVILY_API_KEY` for web search functionality
-- **Model Providers**: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `DASHSCOPE_API_KEY` (for Qwen models)
-- **Optional**: `REGION` (set to `prc` or `international` for Qwen API endpoints)
+- **Model Providers**:
+  - `ANTHROPIC_API_KEY` for Anthropic models
+  - `OPENAI_API_KEY` for OpenAI models
+  - `DASHSCOPE_API_KEY` for Qwen/QwQ/QvQ models
+  - `SILICONFLOW_API_KEY` for SiliconFlow models (NEW in v0.2.0)
+- **Optional**: `REGION` (set to `prc`/`cn` or `international`/`en` for regional API endpoints)
 - **Optional**: `ENABLE_DEEPWIKI=true` to enable DeepWiki MCP documentation tools
 - **Default Model**: Uses `qwen:qwen-flash` as default model (configurable via `MODEL` environment variable)
 
````
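For illustration, here is a hedged sketch of how `REGION` and `MODEL` could drive endpoint and model selection. The helper names are hypothetical and the endpoint URLs are assumptions based on Alibaba Cloud's public DashScope documentation, not taken from the repo; the template's actual logic lives in `src/common/models/` and `src/common/utils.py`:

```python
# Hedged sketch only: helper names and endpoint URLs are assumptions,
# not verified against src/common/models/ or src/common/utils.py.
import os

# OpenAI-compatible DashScope endpoints per Alibaba Cloud's public docs.
DASHSCOPE_ENDPOINTS = {
    "prc": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "international": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
}


def resolve_region() -> str:
    """Normalize the REGION aliases (prc/cn, international/en) noted above."""
    region = os.getenv("REGION", "international").lower()
    return {"cn": "prc", "en": "international"}.get(region, region)


def resolve_model() -> tuple[str, str]:
    """Split a provider-prefixed MODEL string such as 'qwen:qwen-flash'."""
    provider, _, name = os.getenv("MODEL", "qwen:qwen-flash").partition(":")
    return provider, name
```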
````diff
@@ -72,14 +93,23 @@ make dev_ui # Start LangGraph development server with UI
 - `src/common/context.py`: Runtime context and configuration with environment variable support and DeepWiki integration
 - `src/common/tools.py`: Tool definitions including web search and dynamic MCP tool loading
 - `src/common/mcp.py`: MCP client management for external documentation sources (e.g. DeepWiki)
-- `src/common/models.py`: Custom model integrations (Qwen, QwQ, QvQ) with regional API support
+- `src/common/models/`: Model provider integrations with regional API support
+  - `qwen.py`: Qwen, QwQ, QvQ model integrations via DashScope
+  - `siliconflow.py`: SiliconFlow model integrations (NEW in v0.2.0)
 - `src/common/prompts.py`: System prompt templates
-- `src/common/utils.py`: Shared utility functions
+- `src/common/utils.py`: Shared utility functions including model loading
 
 ### Configuration
 - `langgraph.json`: LangGraph Studio configuration pointing to the main graph
 - `.env`: Environment variables for API keys and configuration
 
+### Evaluation System (NEW in v0.2.0)
+- `tests/evaluations/`: Comprehensive evaluation framework
+  - `graph.py`: Graph trajectory evaluation using AgentEvals with LLM-as-judge methodology
+  - `multiturn.py`: Multi-turn chat evaluation with role-persona simulations
+  - `config.py`: Evaluation configuration and test scenarios
+  - `utils.py`: Shared evaluation utilities and scoring functions
+
 ## LangGraph Studio Integration
 
 This project works seamlessly with LangGraph Studio. The `langgraph.json` config file defines:
````
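Given the AgentEvals dependency added below, `tests/evaluations/graph.py` plausibly builds on that library's LLM-as-judge trajectory evaluators. A hedged sketch of that style of check follows; the judge model and the sample trajectory are illustrative choices, not taken from the repo, and the real scenarios live in `config.py`:

```python
# Hedged sketch of an LLM-as-judge trajectory evaluation in the spirit of
# tests/evaluations/graph.py; judge model and trajectory are illustrative.
from agentevals.trajectory.llm import (
    TRAJECTORY_ACCURACY_PROMPT,
    create_trajectory_llm_as_judge,
)

judge = create_trajectory_llm_as_judge(
    prompt=TRAJECTORY_ACCURACY_PROMPT,
    model="openai:gpt-4o-mini",  # whichever judge model the harness configures
)

# An OpenAI-style message trajectory captured from one agent run.
trajectory = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

result = judge(outputs=trajectory)  # dict with "key", "score", "comment"
print(result)
```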
````diff
@@ -105,13 +135,19 @@ This project integrates with Model Context Protocol (MCP) servers for dynamic ex
 ## Python Configuration
 
 - Python requirement: `>=3.11,<4.0`
-- Main dependencies: LangGraph, LangChain, provider-specific packages, langchain-mcp-adapters
-- Development tools: mypy, ruff, pytest
+- Main dependencies: LangGraph, LangChain, provider-specific packages, langchain-mcp-adapters, langchain-siliconflow
+- Development tools: mypy, ruff, pytest, langgraph-cli, langgraph-sdk
+- Evaluation dependencies: openevals, agentevals, langsmith (NEW in v0.2.0)
 - Package structure supports both standalone and LangGraph template usage
 
 ## Development Guidelines
 
 - **Research Tools**: Use `context7` and/or `deepwiki` to study unfamiliar projects or frameworks
 - **Code Quality**: Always run `make lint` after completing tasks
 - **Testing**: Comprehensive test suite includes unit, integration, and e2e tests with DeepWiki MCP integration coverage
-- **MCP Integration**: DeepWiki tools are dynamically loaded when `enable_deepwiki=True` in context configuration
+- **Evaluation**: Use `make evals` to run comprehensive agent evaluations (NEW in v0.2.0)
+  - Graph trajectory evaluation with LLM-as-judge methodology
+  - Multi-turn conversation evaluation with role-persona simulations
+  - Model-specific testing across different providers (Qwen, GLM, etc.)
+- **MCP Integration**: DeepWiki tools are dynamically loaded when `enable_deepwiki=True` in context configuration
+- **Multi-Model Support**: Test across different providers using evaluation commands for comprehensive coverage
````
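The polite/hacker persona targets suggest simulated-user evaluation, and OpenEvals (also added as a dependency) ships a multi-turn simulator that fits this pattern. A hedged sketch follows; the persona wording and the `app` stub are assumptions, and the real harness in `tests/evaluations/multiturn.py` would invoke the compiled agent graph instead:

```python
# Hedged sketch of a persona-driven multi-turn simulation in the spirit of
# tests/evaluations/multiturn.py; persona text and the app stub are assumed.
from openevals.simulators import create_llm_simulated_user, run_multiturn_simulation


def app(inputs, *, thread_id, **kwargs):
    # Stand-in for calling graph.invoke(...) with a per-thread config.
    return {"role": "assistant", "content": "Happy to help with that."}


polite_user = create_llm_simulated_user(
    system="You are a consistently polite user asking for research help.",
    model="openai:gpt-4o-mini",
)

result = run_multiturn_simulation(app=app, user=polite_user, max_turns=3)
print(result["trajectory"])
```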
