Commit ca39b1b

Fix experimental docs navigation, fix broken tutorials, improve examples for better user understanding (#2156)
This PR improves the experimental documentation with better organization, clearer navigation, and fixes to code examples throughout the tutorials.

### 🔄 Navigation & Structure Improvements

- **Renamed "Explanation" → "Core Concepts"** for clearer terminology
- **Reordered tutorials** to follow a logical learning progression:
  1. Prompt → RAG → Workflow → Agent (simple to complex)
- **Reordered core concepts** to match tutorial flow:
  1. Metrics → Datasets → Experimentation
- **Fixed mkdocs.yml path** from `src` to `ragas/src` for proper documentation generation

### 📝 Content & Code Fixes

- **Standardized API usage** across all examples (sketched below):
  - Changed `result` → `value` in `MetricResult` objects
  - Fixed `values` → `allowed_values` in metric definitions
  - Updated response handling in RAG examples
- **Enhanced tutorial clarity**:
  - Added setup instructions and prerequisites
  - Improved code explanations and comments
  - Added Quick Start sections for immediate testing
  - Fixed module import paths (`rag_evals` → `rag_eval`, etc.)
- **Improved error handling** in example code with proper API key validation
- Added experiment result printing for better debugging
- Improved error messages and user guidance

### 🎯 Impact

These changes make the experimental documentation more accessible to new users while providing a smoother learning experience that progresses from simple prompt evaluation to complex agent workflows. All examples have been tested and verified to work with the current API structure.
1 parent 922e4b7 commit ca39b1b
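For reference, the renamed metric API used throughout the updated examples looks roughly like this (a minimal sketch; the metric name and sample strings are illustrative, not taken from the diff):

```python
from ragas_experimental.metrics import discrete_metric
from ragas_experimental.metrics.result import MetricResult


# Previously: @discrete_metric(name=..., values=[...]) and MetricResult(result=...)
@discrete_metric(name="exact_match", allowed_values=["pass", "fail"])
def exact_match(prediction: str, actual: str):
    """Return 'pass' when the prediction matches the expected output exactly."""
    verdict = "pass" if prediction == actual else "fail"
    return MetricResult(value=verdict, reason=f"prediction={prediction!r}, actual={actual!r}")
```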

15 files changed (+108 −46 lines)
File renamed without changes.
File renamed without changes.

docs/experimental/explanation/index.md renamed to docs/experimental/core_concepts/index.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# 📚 Explanation
+# 📚 Core Concepts

 1. [Metrics](metrics.md)
 2. [Datasets and Experiment Results](datasets.md)
File renamed without changes.

docs/experimental/index.md

Lines changed: 2 additions & 2 deletions
@@ -15,11 +15,11 @@ The goal of Ragas Experimental is to evolve Ragas into a general-purpose evaluat

 [:octicons-arrow-right-24: Tutorials](tutorials/index.md)

-- 📚 **Explanation**
+- 📚 **Core Concepts**

 A deeper dive into the principles of evaluation and how Ragas Experimental supports evaluation-driven development for AI applications.

-[:octicons-arrow-right-24: Explanation](explanation/index.md)
+[:octicons-arrow-right-24: Core Concepts](core_concepts/index.md)

 </div>


docs/experimental/tutorials/agent.md

Lines changed: 7 additions & 8 deletions
@@ -23,7 +23,7 @@ We will start by testing our simple agent that can solve mathematical expression
 python -m ragas_examples.agent_evals.agent
 ```

-Next, we will write down a few sample expressions and expected outputs for our agent. Then convert them to a CSV file.
+Next, we will create a few sample expressions and expected outputs for our agent, then convert them to a CSV file.

 ```python
 import pandas as pd
@@ -38,7 +38,7 @@ df = pd.DataFrame(dataset)
 df.to_csv("datasets/test_dataset.csv", index=False)
 ```

-To evaluate the performance of our agent, we will define a non llm metric that compares if our agent's output is within a certain tolerance of the expected output and outputs 1/0 based on it.
+To evaluate the performance of our agent, we will define a non-LLM metric that compares if our agent's output is within a certain tolerance of the expected output and returns 1/0 based on the comparison.

 ```python
 from ragas_experimental.metrics import numeric_metric
@@ -50,7 +50,7 @@ def correctness_metric(prediction: float, actual: float):
     if isinstance(prediction, str) and "ERROR" in prediction:
         return 0.0
     result = 1.0 if abs(prediction - actual) < 1e-5 else 0.0
-    return MetricResult(result=result, reason=f"Prediction: {prediction}, Actual: {actual}")
+    return MetricResult(value=result, reason=f"Prediction: {prediction}, Actual: {actual}")
 ```

 Next, we will write the experiment loop that will run our agent on the test dataset and evaluate it using the metric, and store the results in a CSV file.
@@ -74,23 +74,22 @@ async def run_experiment(row):
         "expected_answer": expected_answer,
         "prediction": prediction.get("result"),
         "log_file": prediction.get("log_file"),
-        "correctness": correctness.result
+        "correctness": correctness.value
     }
 ```

 Now whenever you make a change to your agent, you can run the experiment and see how it affects the performance of your agent.

 ## Running the example end to end

-1. Setup your OpenAI API key
-
+1. Set up your OpenAI API key
 ```bash
 export OPENAI_API_KEY="your_api_key_here"
 ```
-2. Run the evaluation

+2. Run the evaluation
 ```bash
 python -m ragas_examples.agent_evals.evals
 ```

-Viola! You have successfully evaluated an AI agent using Ragas. You can now view the results by opening the `experiments/experiment_name.csv` file.
+Voilà! You have successfully evaluated an AI agent using Ragas. You can now view the results by opening the `experiments/experiment_name.csv` file.
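Assembled from the hunks above, the updated tolerance check for the agent tutorial reads roughly as follows; the surrounding `numeric_metric` decorator is omitted because its arguments are not shown in this diff:

```python
from ragas_experimental.metrics.result import MetricResult


# In the tutorial this function is wrapped with the numeric_metric decorator imported above.
def correctness_metric(prediction: float, actual: float):
    """Score 1.0 when the agent's answer is within 1e-5 of the expected answer, else 0.0."""
    if isinstance(prediction, str) and "ERROR" in prediction:
        return 0.0
    result = 1.0 if abs(prediction - actual) < 1e-5 else 0.0
    return MetricResult(value=result, reason=f"Prediction: {prediction}, Actual: {actual}")
```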

docs/experimental/tutorials/prompt.md

Lines changed: 23 additions & 7 deletions
@@ -11,11 +11,24 @@ flowchart LR

 We will start by testing a simple prompt that classifies movie reviews as positive or negative.

+First, make sure you have installed ragas examples and setup your OpenAI API key:
+
+```bash
+pip install ragas_experimental[examples]
+export OPENAI_API_KEY = "your_openai_api_key"
+```
+
+Now test the prompt:
+
 ```bash
 python -m ragas_examples.prompt_evals.prompt
 ```

-Next, we will write down few sample inputs and expected outputs for our prompt. Then convert them to a a csv file
+This will test the input `"The movie was fantastic and I loved every moment of it!"` and should output `"positive"`.
+
+> **💡 Quick Start**: If you want to see the complete evaluation in action, you can jump straight to the [end-to-end command](#running-the-example-end-to-end) that runs everything and generates the CSV results automatically.
+
+Next, we will write down few sample inputs and expected outputs for our prompt. Then convert them to a CSV file.

 ```python
 import pandas as pd
@@ -33,10 +46,10 @@ Now we need to have a way to measure the performance of our prompt in this task.
 from ragas_experimental.metrics import discrete_metric
 from ragas_experimental.metrics.result import MetricResult

-@discrete_metric(name="accuracy", values=["pass", "fail"])
+@discrete_metric(name="accuracy", allowed_values=["pass", "fail"])
 def my_metric(prediction: str, actual: str):
     """Calculate accuracy of the prediction."""
-    return MetricResult(result="pass", reason="") if prediction == actual else MetricResult(result="fail", reason="")
+    return MetricResult(value="pass", reason="") if prediction == actual else MetricResult(value="fail", reason="")
 ```

 Next, we will write the experiment loop that will run our prompt on the test dataset and evaluate it using the metric, and store the results in a csv file.
@@ -67,16 +80,19 @@ Now whenever you make a change to your prompt, you can run the experiment and se
 ## Running the example end to end

 1. Setup your OpenAI API key
-
 ```bash
 export OPENAI_API_KEY = "your_openai_api_key"
 ```
-
 2. Run the evaluation
-
 ```bash
 python -m ragas_examples.prompt_evals.evals
 ```

-Voila! You have successfully run your first evaluation using Ragas. You can now inspect the results by opening the `experiments/experiment_name.csv` file.
+This will:
+
+- Create the test dataset with sample movie reviews
+- Run the sentiment classification prompt on each sample
+- Evaluate the results using the accuracy metric
+- Export everything to a CSV file with the results

+Voila! You have successfully run your first evaluation using Ragas. You can now inspect the results by opening the `experiments/experiment_name.csv` file.
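The dataset-creation snippet that the first hunk leads into is outside this diff; a minimal sketch of what it might look like, assuming `text`/`label` columns and the `datasets/test_dataset.csv` path used in the agent tutorial:

```python
import pandas as pd

# Hypothetical samples; only the first review appears in this diff, the rest are illustrative.
dataset = [
    {"text": "The movie was fantastic and I loved every moment of it!", "label": "positive"},
    {"text": "A dull plot and wooden acting made this one hard to finish.", "label": "negative"},
]

df = pd.DataFrame(dataset)
df.to_csv("datasets/test_dataset.csv", index=False)  # path assumed, mirroring the agent tutorial
```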

docs/experimental/tutorials/rag.md

Lines changed: 6 additions & 9 deletions
@@ -41,7 +41,7 @@ from ragas_experimental.metrics import DiscreteMetric
 my_metric = DiscreteMetric(
     name="correctness",
     prompt = "Check if the response contains points mentioned from the grading notes and return 'pass' or 'fail'.\nResponse: {response} Grading Notes: {grading_notes}",
-    values=["pass", "fail"],
+    allowed_values=["pass", "fail"],
 )
 ```

@@ -60,8 +60,8 @@ async def run_experiment(row):

     experiment_view = {
         **row,
-        "response": response,
-        "score": score.result,
+        "response": response.get("answer", ""),
+        "score": score.value,
         "log_file": response.get("logs", " "),
     }
     return experiment_view
@@ -72,15 +72,12 @@ Now whenever you make a change to your RAG pipeline, you can run the experiment
 ## Running the example end to end

 1. Setup your OpenAI API key
-
 ```bash
-export OPENAI_API_KEY = "your_openai_api_key"
+export OPENAI_API_KEY="your_openai_api_key"
 ```
-
 2. Run the evaluation
-
 ```bash
-python -m ragas_examples.rag_evals.evals
+python -m ragas_examples.rag_eval.evals
 ```

-Voila! You have successfully run your first evaluation using Ragas. You can now inspect the results by opening the `experiments/experiment_name.csv` file
+Voila! You have successfully run your first evaluation using Ragas. You can now inspect the results by opening the `experiments/experiment_name.csv` file.
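The switch to `response.get("answer", "")` implies the RAG helper now returns a dict rather than a bare string; a minimal sketch of the shape the updated experiment view expects (the sample values are illustrative, and the helper itself is not part of this diff):

```python
# Shape implied by the updated experiment loop in rag.md.
response = {
    "answer": "Ragas Experimental is a toolkit for evaluating AI applications.",  # illustrative
    "logs": "logs/run_001.log",  # illustrative
}

experiment_view_fields = {
    "response": response.get("answer", ""),
    "log_file": response.get("logs", " "),
}
```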

docs/experimental/tutorials/workflow.md

Lines changed: 6 additions & 5 deletions
@@ -42,13 +42,14 @@ from ragas_experimental.metrics import DiscreteMetric
 my_metric = DiscreteMetric(
     name="response_quality",
     prompt="Evaluate the response based on the pass criteria: {pass_criteria}. Does the response meet the criteria? Return 'pass' or 'fail'.\nResponse: {response}",
-    values=["pass", "fail"],
+    allowed_values=["pass", "fail"],
 )
 ```

 Next, we will write the evaluation experiment loop that will run our workflow on the test dataset and evaluate it using the metric, and store the results in a CSV file.

 ```python
+from ragas_experimental import experiment

 @experiment()
 async def run_experiment(row):
@@ -65,7 +66,7 @@ async def run_experiment(row):
     experiment_view = {
         **row,
         "response": response.get("response_template", " "),
-        "score": score.result,
+        "score": score.value,
         "score_reason": score.reason,
     }
     return experiment_view
@@ -75,13 +76,13 @@ Now whenever you make a change to your workflow, you can run the experiment and

 ## Running the example end to end
 1. Setup your OpenAI API key
-
 ```bash
 export OPENAI_API_KEY="your_openai_api_key"
 ```

+2. Run the experiment
 ```bash
-python -m ragas_examples.workflow_evals.evals
+python -m ragas_examples.workflow_eval.evals
 ```

-Voila! You have successfully run your first evaluation using Ragas. You can now inspect the results by opening the `experiments/experiment_name.csv` file
+Voila! You have successfully run your first evaluation using Ragas. You can now inspect the results by opening the `experiments/experiment_name.csv` file.
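With the newly added `from ragas_experimental import experiment` import, the pieces of this tutorial's experiment loop fit together roughly as below; the workflow call and the metric score are stubbed out because they are defined outside this diff:

```python
from ragas_experimental import experiment
from ragas_experimental.metrics.result import MetricResult


async def run_workflow(row: dict) -> dict:
    # Stub standing in for the tutorial's real workflow call (not part of this diff).
    return {"response_template": f"Handled: {row.get('query', '')}"}


@experiment()
async def run_experiment(row):
    response = await run_workflow(row)
    # Stub score; in the tutorial this comes from evaluating my_metric on the response.
    score = MetricResult(value="pass", reason="stubbed for illustration")
    experiment_view = {
        **row,
        "response": response.get("response_template", " "),
        "score": score.value,
        "score_reason": score.reason,
    }
    return experiment_view
```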

experimental/ragas_examples/agent_evals/evals.py

Lines changed: 2 additions & 1 deletion
@@ -62,7 +62,8 @@ async def run_experiment(row):

 async def main():
     dataset = load_dataset()
-    _ = await run_experiment.arun(dataset)
+    experiment_result = await run_experiment.arun(dataset)
+    print("Experiment_result: ", experiment_result)


 if __name__ == "__main__":
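For context, `main()` in `evals.py` is an async function, so it is presumably driven by `asyncio.run` under the `__main__` guard shown here; the body of that guard is outside this diff, so the call below is an assumption:

```python
import asyncio

if __name__ == "__main__":
    asyncio.run(main())  # assumed; the guarded body is not shown in this hunk
```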
