
Commit 4899f58

feat: added save and load to RagasDataset (#1492)
- save and load
- Migration docs
1 parent 59d5688 commit 4899f58

File tree

13 files changed: +232 additions, −141 deletions

.gitignore

Lines changed: 1 addition & 2 deletions
@@ -168,5 +168,4 @@ cython_debug/
 experiments/
 **/fil-result/
 src/ragas/_version.py
-.vscode
-/docs/references/
+.vscode

.readthedocs.yml

Lines changed: 0 additions & 1 deletion
@@ -7,5 +7,4 @@ build:
   commands:
     - pip install -e .[docs]
     - if [ -n "$GH_TOKEN" ]; then pip install git+https://${GH_TOKEN}@github.com/squidfunk/mkdocs-material-insiders.git; fi
-    - python scripts/gen_ref_pages.py
     - mkdocs build --site-dir $READTHEDOCS_OUTPUT/html

Makefile

Lines changed: 0 additions & 2 deletions
@@ -34,8 +34,6 @@ run-ci: format lint type test ## Running all CI checks

 # Docs
 docsite: ## Build and serve documentation
-	@echo "Generating reference pages..."
-	@python scripts/gen_ref_pages.py
 	@mkdocs serve --dirty
 rewrite-docs: ## Use GPT4 to rewrite the documentation
 	@echo "Rewriting the documentation in directory $(DIR)..."

docs/getstarted/rag_evaluation.md

Lines changed: 3 additions & 12 deletions
@@ -14,18 +14,9 @@ dataset = load_dataset("explodinggradients/amnesty_qa","english_v3")
 Converting data to ragas [evaluation dataset](../concepts/components/eval_dataset.md)

 ```python
-from ragas import EvaluationDataset, SingleTurnSample
-
-samples = []
-for row in dataset['eval']:
-    sample = SingleTurnSample(
-        user_input=row['user_input'],
-        reference=row['reference'],
-        response=row['response'],
-        retrieved_contexts=row['retrieved_contexts']
-    )
-    samples.append(sample)
-eval_dataset = EvaluationDataset(samples=samples)
+from ragas import EvaluationDataset
+
+eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])
 ```


docs/howtos/migrations/migrate_from_v01_to_v02.md

Lines changed: 87 additions & 0 deletions (new file)
# Migration from v0.1 to v0.2

v0.2 is the start of the transition for Ragas from an evaluation library for RAG pipelines to a more general library that you can use to evaluate any LLM application you build. This meant we had to make some fundamental changes to the library that will break your workflow. Hopefully this guide will make that transition as easy as possible.

## Outline

1. Evaluation Dataset
2. Metrics
3. Testset Generation
4. Prompt Object

## Evaluation Dataset

We have moved from using HuggingFace [`Datasets`](https://huggingface.co/docs/datasets/v3.0.1/en/package_reference/main_classes#datasets.Dataset) to our own [`EvaluationDataset`][ragas.dataset_schema.EvaluationDataset]. You can read more about it in the core concepts section for [EvaluationDataset](../../concepts/components/evaluation-dataset.md) and [EvaluationSample](../../concepts/components/eval_sample.md).

You can easily translate your existing dataset:

```python
from ragas import EvaluationDataset, SingleTurnSample

hf_dataset = ...  # your HuggingFace evaluation dataset
eval_dataset = EvaluationDataset.from_hf_dataset(hf_dataset)

# save eval dataset
eval_dataset.to_csv("path/to/save/dataset.csv")

# load eval dataset
eval_dataset = EvaluationDataset.from_csv("path/to/save/dataset.csv")
```
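This commit also adds JSONL helpers (`to_jsonl` / `from_jsonl`, see the `dataset_schema.py` diff below); a minimal sketch, assuming the `eval_dataset` built above:

```python
# save eval dataset as JSON Lines (one sample per line)
eval_dataset.to_jsonl("path/to/save/dataset.jsonl")

# load it back
eval_dataset = EvaluationDataset.from_jsonl("path/to/save/dataset.jsonl")
```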
## Metrics

All the default metrics are still supported, and many new metrics have been added. Take a look at the [documentation page](../../concepts/metrics/available_metrics/index.md) for the entire list.

However, there are a couple of changes in how you use metrics.

First, it is now preferred to initialize metrics with the evaluator LLM of your choice, as opposed to passing pre-initialized metrics into [`evaluate()`][ragas.evaluation.evaluate]. This avoids a lot of confusion regarding which LLMs are used where.

```python
from ragas.metrics import faithfulness  # old way, not recommended but still supported till v0.3
from ragas.metrics import Faithfulness

# preferred way
faithfulness_metric = Faithfulness(llm=your_evaluator_llm)
```
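A minimal sketch of how the initialized metric is then passed to [`evaluate()`][ragas.evaluation.evaluate], assuming the `eval_dataset` and `your_evaluator_llm` from the sections above:

```python
from ragas import evaluate
from ragas.metrics import Faithfulness

faithfulness_metric = Faithfulness(llm=your_evaluator_llm)

# the metric carries its own LLM, so evaluate() no longer needs to guess which LLM to use
results = evaluate(dataset=eval_dataset, metrics=[faithfulness_metric])
print(results)
```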
Second, [`metrics.ascore`][ragas.metrics.base.Metric.ascore] is now deprecated in favor of [`metrics.single_turn_ascore`][ragas.metrics.base.SingleTurnMetric.single_turn_ascore]. You can make the transition like this:

```python
# create a SingleTurnSample
from ragas import SingleTurnSample

sample = SingleTurnSample(
    user_input="user query",
    response="response from your pipeline"
)

# init the metric
from ragas.metrics import Faithfulness

faithfulness_metric = Faithfulness(llm=your_evaluator_llm)
score = await faithfulness_metric.single_turn_ascore(sample=sample)
print(score)
# 0.9
```
## Testset Generation

[Testset Generation](../../concepts/test_data_generation/rag.md) has been redesigned to be much more cost efficient. If you were using the end-to-end workflow, check out the [getting started](../../getstarted/rag_testset_generation.md) guide.

**Notable Changes**

- Removed `Docstore` in favor of a new `Knowledge Graph`
- Added `Transforms`, which convert the documents you pass in into a rich knowledge graph
- More customizable with `Synthesizer` objects; refer to the documentation for details
- The new workflow is much cheaper, and intermediate states can be saved easily

This transition might be a bit rough, but if you do need help here, feel free to chat with us or mention it here and we would love to help you out 🙂
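For orientation, a rough sketch of what the redesigned flow looks like; none of this code is part of this commit, the document loader and LLM wrapper are placeholders, and constructor arguments may differ by version, so treat the [getting started](../../getstarted/rag_testset_generation.md) guide as authoritative:

```python
# Rough sketch only, not from this commit; the generator may also require
# an embedding model depending on the ragas version.
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI

from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

docs = DirectoryLoader("data/").load()  # placeholder corpus
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

generator = TestsetGenerator(llm=generator_llm)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
testset.to_pandas()
```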
## Prompt Object

All the prompts have been rewritten to use [`PydanticPrompts`][ragas.prompt.pydantic_prompt.PydanticPrompt], which are based on the [`BasePrompt`][ragas.prompt.base.BasePrompt] object. If you are using the old `Prompt` object, you will have to upgrade it to the new one; check the docs to learn more about how to do it:

- [How-to guide on creating new prompts](../../howtos/customizations/metrics/modifying-prompts-metrics.md)
- [Github PR for the changes](https://github.com/explodinggradients/ragas/pull/1462)
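As a quick illustration, a minimal sketch of what a new-style prompt looks like; the input/output models and instruction below are made up for this guide, assuming the `instruction` / `input_model` / `output_model` / `examples` attributes described in the how-to guide above:

```python
# Illustrative only; the models and prompt text here are hypothetical.
from pydantic import BaseModel

from ragas.prompt.pydantic_prompt import PydanticPrompt


class SummaryInput(BaseModel):
    text: str


class SummaryOutput(BaseModel):
    summary: str


class SummarizePrompt(PydanticPrompt[SummaryInput, SummaryOutput]):
    instruction = "Summarize the given text in one sentence."
    input_model = SummaryInput
    output_model = SummaryOutput
    examples = [
        (
            SummaryInput(text="Ragas v0.2 replaces the old Prompt object with PydanticPrompt."),
            SummaryOutput(summary="Prompts in v0.2 are defined as PydanticPrompt classes."),
        )
    ]
```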
!!! note "Need Further Assistance?"

    If you have any further questions, feel free to post them in this [github issue](https://github.com/explodinggradients/ragas/issues/1486) or reach out to us on [cal.com](https://cal.com/shahul-ragas/30min).

mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -87,6 +87,8 @@ nav:
       - howtos/applications/index.md
     - Integrations:
       - howtos/integrations/index.md
+    - Migrations:
+      - From v0.1 to v0.2: howtos/migrations/migrate_from_v01_to_v02.md
 - 📖 References:
   - Core:
     - Prompt: references/prompt.md

scripts/gen_ref_pages.py

Lines changed: 0 additions & 43 deletions
This file was deleted.

src/ragas/dataset_schema.py

Lines changed: 83 additions & 39 deletions
@@ -12,7 +12,7 @@
 from pandas import DataFrame as PandasDataframe


-class BaseEvalSample(BaseModel):
+class BaseSample(BaseModel):
     """
     Base class for evaluation samples.
     """
@@ -30,7 +30,7 @@ def get_features(self) -> t.List[str]:
         return list(self.to_dict().keys())


-class SingleTurnSample(BaseEvalSample):
+class SingleTurnSample(BaseSample):
     """
     Represents evaluation samples for single-turn interactions.

@@ -61,7 +61,7 @@ class SingleTurnSample(BaseEvalSample):
     rubric: t.Optional[t.Dict[str, str]] = None


-class MultiTurnSample(BaseEvalSample):
+class MultiTurnSample(BaseSample):
     """
     Represents evaluation samples for multi-turn interactions.

@@ -127,44 +127,14 @@ def pretty_repr(self):
         return "\n".join(lines)


-class EvaluationDataset(BaseModel):
-    """
-    Represents a dataset of evaluation samples.
+Sample = t.TypeVar("Sample", bound=BaseSample)

-    Parameters
-    ----------
-    samples : List[BaseEvalSample]
-        A list of evaluation samples.

-    Attributes
-    ----------
-    samples : List[BaseEvalSample]
-        A list of evaluation samples.
-
-    Methods
-    -------
-    validate_samples(samples)
-        Validates that all samples are of the same type.
-    get_sample_type()
-        Returns the type of the samples in the dataset.
-    to_hf_dataset()
-        Converts the dataset to a Hugging Face Dataset.
-    to_pandas()
-        Converts the dataset to a pandas DataFrame.
-    features()
-        Returns the features of the samples.
-    from_list(mapping)
-        Creates an EvaluationDataset from a list of dictionaries.
-    from_dict(mapping)
-        Creates an EvaluationDataset from a dictionary.
-    """
-
-    samples: t.List[BaseEvalSample]
+class RagasDataset(BaseModel, t.Generic[Sample]):
+    samples: t.List[Sample]

     @field_validator("samples")
-    def validate_samples(
-        cls, samples: t.List[BaseEvalSample]
-    ) -> t.List[BaseEvalSample]:
+    def validate_samples(cls, samples: t.List[BaseSample]) -> t.List[BaseSample]:
         """Validates that all samples are of the same type."""
         if len(samples) == 0:
             return samples
@@ -202,6 +172,11 @@ def to_hf_dataset(self) -> HFDataset:

         return HFDataset.from_list(self._to_list())

+    @classmethod
+    def from_hf_dataset(cls, dataset: HFDataset) -> "RagasDataset[Sample]":
+        """Creates an EvaluationDataset from a Hugging Face Dataset."""
+        return cls.from_list(dataset.to_list())
+
     def to_pandas(self) -> PandasDataframe:
         """Converts the dataset to a pandas DataFrame."""
         try:
@@ -244,11 +219,80 @@ def from_dict(cls, mapping: t.Dict):
         samples.extend(SingleTurnSample(**sample) for sample in mapping)
         return cls(samples=samples)

-    def __iter__(self) -> t.Iterator[BaseEvalSample]:  # type: ignore
+    @classmethod
+    def from_csv(cls, path: str):
+        """Creates an EvaluationDataset from a CSV file."""
+        import csv
+
+        with open(path, "r", newline="") as csvfile:
+            reader = csv.DictReader(csvfile)
+            data = [row for row in reader]
+        return cls.from_list(data)
+
+    def to_csv(self, path: str):
+        """Converts the dataset to a CSV file."""
+        import csv
+
+        data = self._to_list()
+        if not data:
+            return
+
+        fieldnames = self.features()
+
+        with open(path, "w", newline="") as csvfile:
+            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+            writer.writeheader()
+            for row in data:
+                writer.writerow(row)
+
+    def to_jsonl(self, path: str):
+        """Converts the dataset to a JSONL file."""
+        with open(path, "w") as jsonlfile:
+            for sample in self.samples:
+                jsonlfile.write(json.dumps(sample.to_dict()) + "\n")
+
+    @classmethod
+    def from_jsonl(cls, path: str):
+        """Creates an EvaluationDataset from a JSONL file."""
+        with open(path, "r") as jsonlfile:
+            data = [json.loads(line) for line in jsonlfile]
+        return cls.from_list(data)
+
+    def __iter__(self) -> t.Iterator[Sample]:  # type: ignore
         return iter(self.samples)

     def __len__(self) -> int:
         return len(self.samples)

-    def __getitem__(self, idx: int) -> BaseEvalSample:
+    def __getitem__(self, idx: int) -> Sample:
         return self.samples[idx]
+
+
+class EvaluationDataset(RagasDataset[BaseSample]):
+    """
+    Represents a dataset of evaluation samples.
+
+    Attributes
+    ----------
+    samples : List[BaseSample]
+        A list of evaluation samples.
+
+    Methods
+    -------
+    validate_samples(samples)
+        Validates that all samples are of the same type.
+    get_sample_type()
+        Returns the type of the samples in the dataset.
+    to_hf_dataset()
+        Converts the dataset to a Hugging Face Dataset.
+    to_pandas()
+        Converts the dataset to a pandas DataFrame.
+    features()
+        Returns the features of the samples.
+    from_list(mapping)
+        Creates an EvaluationDataset from a list of dictionaries.
+    from_dict(mapping)
+        Creates an EvaluationDataset from a dictionary.
+    """
+
+    pass
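To make the refactor concrete, here is a minimal sketch of using the new generic base; the `SingleTurnDataset` subclass and sample values are hypothetical and not part of this commit:

```python
# Hypothetical subclass for illustration only; EvaluationDataset itself is
# defined as RagasDataset[BaseSample] in the diff above.
from ragas.dataset_schema import RagasDataset, SingleTurnSample


class SingleTurnDataset(RagasDataset[SingleTurnSample]):
    """A dataset constrained to single-turn samples (illustrative)."""


dataset = SingleTurnDataset(
    samples=[SingleTurnSample(user_input="user query", response="model response")]
)

# round-trip through the save/load helpers added in this commit
dataset.to_csv("single_turn.csv")
reloaded = SingleTurnDataset.from_csv("single_turn.csv")
print(len(reloaded), reloaded[0].user_input)
```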

src/ragas/llms/prompt.py

Lines changed: 3 additions & 1 deletion
@@ -160,7 +160,9 @@ def format(self, **kwargs: t.Any) -> PromptValue:
         )
         for key, value in kwargs.items():
             if isinstance(value, str):
-                kwargs[key] = json.dumps(value, ensure_ascii=False).encode("utf8").decode()
+                kwargs[key] = (
+                    json.dumps(value, ensure_ascii=False).encode("utf8").decode()
+                )

         prompt = self.to_string()
         return PromptValue(prompt_str=prompt.format(**kwargs))
