
Commit 4899f58

feat: added save and load to RagasDataset (#1492)
- save and load
- Migration docs
1 parent 59d5688 commit 4899f58

File tree

13 files changed: +232 additions, −141 deletions

.gitignore

Lines changed: 1 addition & 2 deletions
@@ -168,5 +168,4 @@ cython_debug/
 experiments/
 **/fil-result/
 src/ragas/_version.py
-.vscode
-/docs/references/
+.vscode

.readthedocs.yml

Lines changed: 0 additions & 1 deletion
@@ -7,5 +7,4 @@ build:
   commands:
     - pip install -e .[docs]
     - if [ -n "$GH_TOKEN" ]; then pip install git+https://${GH_TOKEN}@github.com/squidfunk/mkdocs-material-insiders.git; fi
-    - python scripts/gen_ref_pages.py
     - mkdocs build --site-dir $READTHEDOCS_OUTPUT/html

Makefile

Lines changed: 0 additions & 2 deletions
@@ -34,8 +34,6 @@ run-ci: format lint type test ## Running all CI checks

 # Docs
 docsite: ## Build and serve documentation
-	@echo "Generating reference pages..."
-	@python scripts/gen_ref_pages.py
 	@mkdocs serve --dirty
 rewrite-docs: ## Use GPT4 to rewrite the documentation
 	@echo "Rewriting the documentation in directory $(DIR)..."

docs/getstarted/rag_evaluation.md

Lines changed: 3 additions & 12 deletions
@@ -14,18 +14,9 @@ dataset = load_dataset("explodinggradients/amnesty_qa","english_v3")
 Converting data to ragas [evaluation dataset](../concepts/components/eval_dataset.md)

 ```python
-from ragas import EvaluationDataset, SingleTurnSample
-
-samples = []
-for row in dataset['eval']:
-    sample = SingleTurnSample(
-        user_input=row['user_input'],
-        reference=row['reference'],
-        response=row['response'],
-        retrieved_contexts=row['retrieved_contexts']
-    )
-    samples.append(sample)
-eval_dataset = EvaluationDataset(samples=samples)
+from ragas import EvaluationDataset
+
+eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])
 ```


docs/howtos/migrations/migrate_from_v01_to_v02.md

Lines changed: 87 additions & 0 deletions (new file)
# Migration from v0.1 to v0.2

v0.2 is the start of the transition for Ragas from an evaluation library for RAG pipelines to a more general library that you can use to evaluate any LLM application you build. This meant we had to make some fundamental changes to the library that will break your workflow. Hopefully this guide will make that transition as easy as possible.

## Outline

1. Evaluation Dataset
2. Metrics
3. Testset Generation
4. Prompt Object

## Evaluation Dataset

We have moved from using HuggingFace [`Datasets`](https://huggingface.co/docs/datasets/v3.0.1/en/package_reference/main_classes#datasets.Dataset) to our own [`EvaluationDataset`][ragas.dataset_schema.EvaluationDataset]. You can read more about it in the core concepts section for [EvaluationDataset](../../concepts/components/evaluation-dataset.md) and [EvaluationSample](../../concepts/components/eval_sample.md).

You can easily translate your existing dataset:

```python
from ragas import EvaluationDataset, SingleTurnSample

hf_dataset = ...  # your HuggingFace evaluation dataset
eval_dataset = EvaluationDataset.from_hf_dataset(hf_dataset)

# save eval dataset
eval_dataset.to_csv("path/to/save/dataset.csv")

# load eval dataset
eval_dataset = EvaluationDataset.from_csv("path/to/save/dataset.csv")
```
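This commit also adds JSONL helpers (`to_jsonl` / `from_jsonl`, see the `dataset_schema.py` diff below); a minimal sketch, assuming the `eval_dataset` built above:

```python
# save eval dataset as JSON Lines (one sample per line)
eval_dataset.to_jsonl("path/to/save/dataset.jsonl")

# load it back
eval_dataset = EvaluationDataset.from_jsonl("path/to/save/dataset.jsonl")
```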
## Metrics

All the default metrics are still supported, and many new metrics have been added. Take a look at the [documentation page](../../concepts/metrics/available_metrics/index.md) for the entire list.

However, there are a couple of changes in how you use metrics.

First, it is now preferred to initialize metrics with the evaluator LLM of your choice, as opposed to passing pre-initialized metrics into [`evaluate()`][ragas.evaluation.evaluate]. This avoids a lot of confusion regarding which LLMs are used where.

```python
from ragas.metrics import faithfulness  # old way, not recommended but still supported till v0.3
from ragas.metrics import Faithfulness

# preferred way
faithfulness_metric = Faithfulness(llm=your_evaluator_llm)
```
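A minimal sketch of how the initialized metric is then passed to [`evaluate()`][ragas.evaluation.evaluate], assuming the `eval_dataset` and `your_evaluator_llm` from the sections above:

```python
from ragas import evaluate
from ragas.metrics import Faithfulness

faithfulness_metric = Faithfulness(llm=your_evaluator_llm)

# the metric carries its own LLM, so evaluate() no longer needs to guess which LLM to use
results = evaluate(dataset=eval_dataset, metrics=[faithfulness_metric])
print(results)
```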
Second, [`metrics.ascore`][ragas.metrics.base.Metric.ascore] is now deprecated in favor of [`metrics.single_turn_ascore`][ragas.metrics.base.SingleTurnMetric.single_turn_ascore]. You can make the transition like this:

```python
# create a SingleTurnSample
from ragas import SingleTurnSample

sample = SingleTurnSample(
    user_input="user query",
    response="response from your pipeline"
)

# init the metric
from ragas.metrics import Faithfulness

faithfulness_metric = Faithfulness(llm=your_evaluator_llm)
score = await faithfulness_metric.single_turn_ascore(sample=sample)
print(score)
# 0.9
```
## Testset Generation

[Testset Generation](../../concepts/test_data_generation/rag.md) has been redesigned to be much more cost efficient. If you were using the end-to-end workflow, check out the [getting started](../../getstarted/rag_testset_generation.md) guide.

**Notable Changes**

- Removed `Docstore` in favor of a new `Knowledge Graph`
- Added `Transforms`, which convert the documents you pass in into a rich knowledge graph
- More customizable with `Synthesizer` objects; refer to the documentation for details
- The new workflow is much cheaper, and intermediate states can be saved easily

This transition might be a bit rough, but if you do need help here, feel free to chat with us or mention it here and we would love to help you out 🙂
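For orientation, a rough sketch of what the redesigned flow looks like; none of this code is part of this commit, the document loader and LLM wrapper are placeholders, and constructor arguments may differ by version, so treat the [getting started](../../getstarted/rag_testset_generation.md) guide as authoritative:

```python
# Rough sketch only, not from this commit; the generator may also require
# an embedding model depending on the ragas version.
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI

from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

docs = DirectoryLoader("data/").load()  # placeholder corpus
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

generator = TestsetGenerator(llm=generator_llm)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
testset.to_pandas()
```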
## Prompt Object

All the prompts have been rewritten to use [`PydanticPrompts`][ragas.prompt.pydantic_prompt.PydanticPrompt], which are based on the [`BasePrompt`][ragas.prompt.base.BasePrompt] object. If you are using the old `Prompt` object, you will have to upgrade it to the new one; check the docs to learn more about how to do it:

- [How-to guide on creating new prompts](../../howtos/customizations/metrics/modifying-prompts-metrics.md)
- [Github PR for the changes](https://github.com/explodinggradients/ragas/pull/1462)
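As a quick illustration, a minimal sketch of what a new-style prompt looks like; the input/output models and instruction below are made up for this guide, assuming the `instruction` / `input_model` / `output_model` / `examples` attributes described in the how-to guide above:

```python
# Illustrative only; the models and prompt text here are hypothetical.
from pydantic import BaseModel

from ragas.prompt.pydantic_prompt import PydanticPrompt


class SummaryInput(BaseModel):
    text: str


class SummaryOutput(BaseModel):
    summary: str


class SummarizePrompt(PydanticPrompt[SummaryInput, SummaryOutput]):
    instruction = "Summarize the given text in one sentence."
    input_model = SummaryInput
    output_model = SummaryOutput
    examples = [
        (
            SummaryInput(text="Ragas v0.2 replaces the old Prompt object with PydanticPrompt."),
            SummaryOutput(summary="Prompts in v0.2 are defined as PydanticPrompt classes."),
        )
    ]
```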
!!! note "Need Further Assistance?"

    If you have any further questions, feel free to post them in this [github issue](https://github.com/explodinggradients/ragas/issues/1486) or reach out to us on [cal.com](https://cal.com/shahul-ragas/30min).

mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -87,6 +87,8 @@ nav:
       - howtos/applications/index.md
     - Integrations:
       - howtos/integrations/index.md
+    - Migrations:
+      - From v0.1 to v0.2: howtos/migrations/migrate_from_v01_to_v02.md
 - 📖 References:
   - Core:
     - Prompt: references/prompt.md

scripts/gen_ref_pages.py

Lines changed: 0 additions & 43 deletions
This file was deleted.

src/ragas/dataset_schema.py

Lines changed: 83 additions & 39 deletions
@@ -12,7 +12,7 @@
 from pandas import DataFrame as PandasDataframe


-class BaseEvalSample(BaseModel):
+class BaseSample(BaseModel):
     """
     Base class for evaluation samples.
     """
@@ -30,7 +30,7 @@ def get_features(self) -> t.List[str]:
         return list(self.to_dict().keys())


-class SingleTurnSample(BaseEvalSample):
+class SingleTurnSample(BaseSample):
     """
     Represents evaluation samples for single-turn interactions.

@@ -61,7 +61,7 @@ class SingleTurnSample(BaseEvalSample):
     rubric: t.Optional[t.Dict[str, str]] = None


-class MultiTurnSample(BaseEvalSample):
+class MultiTurnSample(BaseSample):
     """
     Represents evaluation samples for multi-turn interactions.

@@ -127,44 +127,14 @@ def pretty_repr(self):
         return "\n".join(lines)


-class EvaluationDataset(BaseModel):
-    """
-    Represents a dataset of evaluation samples.
+Sample = t.TypeVar("Sample", bound=BaseSample)

-    Parameters
-    ----------
-    samples : List[BaseEvalSample]
-        A list of evaluation samples.

-    Attributes
-    ----------
-    samples : List[BaseEvalSample]
-        A list of evaluation samples.
-
-    Methods
-    -------
-    validate_samples(samples)
-        Validates that all samples are of the same type.
-    get_sample_type()
-        Returns the type of the samples in the dataset.
-    to_hf_dataset()
-        Converts the dataset to a Hugging Face Dataset.
-    to_pandas()
-        Converts the dataset to a pandas DataFrame.
-    features()
-        Returns the features of the samples.
-    from_list(mapping)
-        Creates an EvaluationDataset from a list of dictionaries.
-    from_dict(mapping)
-        Creates an EvaluationDataset from a dictionary.
-    """
-
-    samples: t.List[BaseEvalSample]
+class RagasDataset(BaseModel, t.Generic[Sample]):
+    samples: t.List[Sample]

     @field_validator("samples")
-    def validate_samples(
-        cls, samples: t.List[BaseEvalSample]
-    ) -> t.List[BaseEvalSample]:
+    def validate_samples(cls, samples: t.List[BaseSample]) -> t.List[BaseSample]:
         """Validates that all samples are of the same type."""
         if len(samples) == 0:
             return samples
@@ -202,6 +172,11 @@ def to_hf_dataset(self) -> HFDataset:

         return HFDataset.from_list(self._to_list())

+    @classmethod
+    def from_hf_dataset(cls, dataset: HFDataset) -> "RagasDataset[Sample]":
+        """Creates an EvaluationDataset from a Hugging Face Dataset."""
+        return cls.from_list(dataset.to_list())
+
     def to_pandas(self) -> PandasDataframe:
         """Converts the dataset to a pandas DataFrame."""
         try:
@@ -244,11 +219,80 @@ def from_dict(cls, mapping: t.Dict):
         samples.extend(SingleTurnSample(**sample) for sample in mapping)
         return cls(samples=samples)

-    def __iter__(self) -> t.Iterator[BaseEvalSample]:  # type: ignore
+    @classmethod
+    def from_csv(cls, path: str):
+        """Creates an EvaluationDataset from a CSV file."""
+        import csv
+
+        with open(path, "r", newline="") as csvfile:
+            reader = csv.DictReader(csvfile)
+            data = [row for row in reader]
+        return cls.from_list(data)
+
+    def to_csv(self, path: str):
+        """Converts the dataset to a CSV file."""
+        import csv
+
+        data = self._to_list()
+        if not data:
+            return
+
+        fieldnames = self.features()
+
+        with open(path, "w", newline="") as csvfile:
+            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+            writer.writeheader()
+            for row in data:
+                writer.writerow(row)
+
+    def to_jsonl(self, path: str):
+        """Converts the dataset to a JSONL file."""
+        with open(path, "w") as jsonlfile:
+            for sample in self.samples:
+                jsonlfile.write(json.dumps(sample.to_dict()) + "\n")
+
+    @classmethod
+    def from_jsonl(cls, path: str):
+        """Creates an EvaluationDataset from a JSONL file."""
+        with open(path, "r") as jsonlfile:
+            data = [json.loads(line) for line in jsonlfile]
+        return cls.from_list(data)
+
+    def __iter__(self) -> t.Iterator[Sample]:  # type: ignore
         return iter(self.samples)

     def __len__(self) -> int:
         return len(self.samples)

-    def __getitem__(self, idx: int) -> BaseEvalSample:
+    def __getitem__(self, idx: int) -> Sample:
         return self.samples[idx]
+
+
+class EvaluationDataset(RagasDataset[BaseSample]):
+    """
+    Represents a dataset of evaluation samples.
+
+    Attributes
+    ----------
+    samples : List[BaseSample]
+        A list of evaluation samples.
+
+    Methods
+    -------
+    validate_samples(samples)
+        Validates that all samples are of the same type.
+    get_sample_type()
+        Returns the type of the samples in the dataset.
+    to_hf_dataset()
+        Converts the dataset to a Hugging Face Dataset.
+    to_pandas()
+        Converts the dataset to a pandas DataFrame.
+    features()
+        Returns the features of the samples.
+    from_list(mapping)
+        Creates an EvaluationDataset from a list of dictionaries.
+    from_dict(mapping)
+        Creates an EvaluationDataset from a dictionary.
+    """
+
+    pass
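To make the refactor concrete, here is a minimal sketch of using the new generic base; the `SingleTurnDataset` subclass and sample values are hypothetical and not part of this commit:

```python
# Hypothetical subclass for illustration only; EvaluationDataset itself is
# defined as RagasDataset[BaseSample] in the diff above.
from ragas.dataset_schema import RagasDataset, SingleTurnSample


class SingleTurnDataset(RagasDataset[SingleTurnSample]):
    """A dataset constrained to single-turn samples (illustrative)."""


dataset = SingleTurnDataset(
    samples=[SingleTurnSample(user_input="user query", response="model response")]
)

# round-trip through the save/load helpers added in this commit
dataset.to_csv("single_turn.csv")
reloaded = SingleTurnDataset.from_csv("single_turn.csv")
print(len(reloaded), reloaded[0].user_input)
```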

src/ragas/llms/prompt.py

Lines changed: 3 additions & 1 deletion
@@ -160,7 +160,9 @@ def format(self, **kwargs: t.Any) -> PromptValue:
         )
         for key, value in kwargs.items():
             if isinstance(value, str):
-                kwargs[key] = json.dumps(value, ensure_ascii=False).encode("utf8").decode()
+                kwargs[key] = (
+                    json.dumps(value, ensure_ascii=False).encode("utf8").decode()
+                )

         prompt = self.to_string()
         return PromptValue(prompt_str=prompt.format(**kwargs))
