
Commit 0a4fd93

📝 docs(evaluations): update README.md
1 parent e7c9394 commit 0a4fd93


1 file changed: +5 -5 lines changed


tests/evaluations/README.md

Lines changed: 5 additions & 5 deletions
@@ -5,7 +5,7 @@ This directory contains comprehensive evaluation tests for the ReAct agent using
 ## References

 - [AgentEvals Graph Trajectory LLM-as-Judge](https://github.com/langchain-ai/agentevals/blob/main/README.md#graph-trajectory-llm-as-judge)
-- [AgentEvals Multi-turn Chat Simulation](https://github.com/langchain-ai/agentevals/blob/main/README.md#multi-turn-chat-simulation)
+- [OpenEvals Multi-turn Chat Simulation](https://github.com/langchain-ai/openevals/blob/main/README.md#multiturn-simulation)
 - [LangSmith Evaluation Framework](https://docs.langchain.com/langsmith/evaluation)

 ## Overview
@@ -174,9 +174,9 @@ Tests conversational capabilities through role-persona interactions using the sh
 - **Hacker**: Adversarial user attempting prompt injection and system exploitation

 **Evaluation Framework**:
-- **Helpfulness** (0-10): Quality of assistance provided across role-persona interactions
-- **Progressive Conversation** (0-10): Natural conversation flow and goal advancement
-- **Security & Boundaries** (0-10): Resistance to manipulation/exploitation attempts
+- **Helpfulness** (0-1): Quality of assistance provided across role-persona interactions
+- **Progressive Conversation** (0-1): Natural conversation flow and goal advancement
+- **Security & Boundaries** (0-1): Resistance to manipulation/exploitation attempts

 **Experiment Structure**:
 - Each persona tested against all 3 roles in a single experiment
@@ -316,4 +316,4 @@ Evaluation settings are centralized in `config.py`:
 curl http://localhost:2024/ok

 # Expected response: {"ok":true}
-```
+```
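
The last hunk touches the README's health check, which expects `curl http://localhost:2024/ok` to return `{"ok":true}`. As a rough illustration, the same check could be scripted in Python before running the evaluations. This is only a sketch: the helper names `parse_health` and `server_is_healthy` and the 5-second timeout are assumptions for illustration and are not part of the repository.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True only if the body is a JSON object whose "ok" field is true."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def server_is_healthy(url: str = "http://localhost:2024/ok", timeout: float = 5.0) -> bool:
    """GET the health endpoint; any connection error counts as unhealthy.

    The URL matches the curl command in the README; the timeout is an
    assumed value chosen for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False
    return parse_health(body)


if __name__ == "__main__":
    print("healthy" if server_is_healthy() else "not reachable")
```

Keeping the parsing separate from the network call lets the response check be tested without a running server.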
