| 1 | | -# Dataset preparation for Evaluating AI Systems |
| 1 | +# Datasets and Experiment Results |
| 2 | + |
| 3 | +When we evaluate AI systems, we typically work with two main types of data: |
| 4 | + |
| 5 | +1. **Evaluation Datasets**: These are stored under the `datasets` directory. |
| 6 | +2. **Evaluation Results**: These are stored under the `experiments` directory. |
| 7 | + |
| 8 | +## Evaluation Datasets |
| 9 | + |
| 10 | +A dataset for evaluations contains: |
| 11 | + |
| 12 | +1. Inputs: a set of inputs that the system will process. |
| 13 | +2. Expected outputs (Optional): the expected outputs or responses from the system for the given inputs. |
| 14 | +3. Metadata (Optional): additional information that can be stored alongside the dataset. |
| 15 | + |
| 16 | +For example, a dataset for a Retrieval-Augmented Generation (RAG) system might include a query (the input to the system), grading notes (used to grade the system's output), and metadata such as query complexity.
| 17 | + |
| 18 | +Metadata is particularly useful for slicing and dicing the dataset, allowing you to analyze results across different facets. For instance, you might want to see how your system performs on complex queries versus simple ones, or how it handles different languages. |
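
As a rough illustration of slicing by metadata, the sketch below groups a handful of result rows by one metadata field and averages a score per group. The rows, the `complexity` field, the `score` values, and the `mean_score_by` helper are all made up for this example; none of them come from Ragas.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation rows; field names and scores are illustrative only.
rows = [
    {"query": "What is RAG?", "complexity": "simple", "score": 1.0},
    {"query": "Compare BM25 and dense retrieval.", "complexity": "complex", "score": 0.5},
    {"query": "Define embedding.", "complexity": "simple", "score": 0.5},
    {"query": "Explain re-ranking trade-offs.", "complexity": "complex", "score": 0.0},
]


def mean_score_by(rows, key):
    """Group rows by a metadata field and return the mean score per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["score"])
    return {value: mean(scores) for value, scores in groups.items()}


by_complexity = mean_score_by(rows, "complexity")
print(by_complexity)
```

The same helper works for any other metadata column, such as a language tag, as long as every row carries that field.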
| 19 | + |
| 20 | +## Experiment Results |
| 21 | + |
| 22 | +Experiment results include: |
| 23 | + |
| 24 | +1. All attributes from the dataset. |
| 25 | +2. The response from the evaluated system. |
| 26 | +3. Results of metrics. |
| 27 | +4. Optional metadata, such as a URI pointing to the system trace for a given input. |
| 28 | + |
| 29 | +For example, in a RAG system, the results might include the query, grading notes, response, accuracy score (a metric), and a link to the system trace.
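
To make the shape of a result row concrete, here is a minimal sketch that combines the four parts listed above into one flat record. The field names (`grading_notes`, `accuracy`, `trace_uri`), the placeholder URI, and the `build_result_row` helper are assumptions chosen for illustration, not a Ragas schema.

```python
# One dataset row (hypothetical field names and values).
dataset_row = {
    "query": "What is RAG?",
    "grading_notes": "Must mention retrieval and generation.",
    "complexity": "simple",
}


def build_result_row(dataset_row, response, metrics, trace_uri=None):
    """Combine a dataset row with the response, metric scores, and optional metadata."""
    result = dict(dataset_row)       # 1. all attributes from the dataset (copied)
    result["response"] = response    # 2. the response from the evaluated system
    result.update(metrics)           # 3. results of metrics
    if trace_uri is not None:
        result["trace_uri"] = trace_uri  # 4. optional metadata
    return result


result_row = build_result_row(
    dataset_row,
    response="RAG retrieves documents and then generates an answer from them.",
    metrics={"accuracy": 1.0},
    trace_uri="https://example.com/traces/123",  # placeholder URI
)
print(result_row)
```

Copying the dataset row rather than mutating it keeps the dataset reusable across several experiments.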
| 30 | + |
| 31 | +## Data Storage in Ragas |
| 32 | + |
| 33 | +Different teams organize, update, and maintain their data in different ways. For example:
| 34 | + |
| 35 | +- A single developer might store datasets as CSV files in the local filesystem. |
| 36 | +- A small-to-medium team might use Google Sheets or Notion databases. |
| 37 | +- Enterprise teams might rely on Box or Microsoft OneDrive, depending on their data storage and sharing policies. |
| 38 | + |
| 39 | +Teams may also use various file formats, such as CSV, XLSX, or JSON. Among these, CSV and spreadsheet formats are often preferred for evaluation datasets because they are simple and because evaluation datasets are usually much smaller than training datasets.
| 40 | + |
| 41 | +Ragas, as an evaluation framework, supports these diverse preferences by enabling you to use your preferred file systems and formats for storing and reading datasets and experiment results. |
| 42 | + |
| 43 | +To achieve this, Ragas introduces the concept of **plug-and-play backends** for data storage: |
| 44 | + |
| 45 | +- Ragas provides default backends like `local/csv` and `google_drive/csv`. |
| 46 | +- These backends are extensible, allowing you to implement custom backends for any file system or format (e.g., `box/csv`). |
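
The plug-and-play idea can be sketched generically as a small interface plus a registry keyed by backend name. The code below is not the Ragas backend API: `StorageBackend`, `LocalCSVBackend`, `BACKENDS`, and `get_backend` are hypothetical names, and the real extension mechanism may look quite different.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Protocol

Row = Dict[str, str]


class StorageBackend(Protocol):
    """Hypothetical interface for this sketch; NOT the Ragas backend API."""

    def load(self, kind: str, name: str) -> List[Row]: ...
    def save(self, kind: str, name: str, rows: List[Row]) -> None: ...


class LocalCSVBackend:
    """Stores each dataset or experiment as <root_dir>/<kind>/<name>.csv."""

    def __init__(self, root_dir: str) -> None:
        self.root = Path(root_dir)

    def _path(self, kind: str, name: str) -> Path:
        return self.root / kind / f"{name}.csv"

    def load(self, kind: str, name: str) -> List[Row]:
        with self._path(kind, name).open(newline="") as f:
            return list(csv.DictReader(f))

    def save(self, kind: str, name: str, rows: List[Row]) -> None:
        # Assumes rows is non-empty and every row has the same keys.
        path = self._path(kind, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)


# Registry mapping a backend name to a factory; a custom backend adds an entry here.
BACKENDS: Dict[str, Callable[..., StorageBackend]] = {"local/csv": LocalCSVBackend}


def get_backend(name: str, **kwargs) -> StorageBackend:
    return BACKENDS[name](**kwargs)


with tempfile.TemporaryDirectory() as tmp:
    backend = get_backend("local/csv", root_dir=tmp)
    backend.save("datasets", "demo", [{"query": "What is RAG?", "complexity": "simple"}])
    loaded = backend.load("datasets", "demo")
print(loaded)
```

Under this design, supporting a different file system or format would mean writing another class with the same two methods and registering it under a new name.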
| 47 | + |
| 48 | + |
| 49 | +## Using Datasets and Results via API |
| 50 | + |
| 51 | +### Loading a Dataset |
| 52 | + |
| 53 | +```python |
| 54 | +from ragas_experimental import Dataset |
| 55 | + |
| 56 | +test_dataset = Dataset.load(name="test_dataset", backend="local/csv", root_dir=".") |
| 57 | +``` |
| 58 | + |
| 59 | +This call loads the dataset stored as `test_dataset.csv` in the `datasets` directory under `root_dir`. The `backend` argument can name any backend registered with Ragas.
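
Based on the layout described above, a CSV file for the `local/csv` backend would sit at `<root_dir>/datasets/<name>.csv`. The sketch below writes such a file with the standard library only; the `write_dataset_csv` helper and the column names are assumptions for illustration, and a temporary directory stands in for `root_dir`.

```python
import csv
import tempfile
from pathlib import Path


def write_dataset_csv(root_dir, name, rows):
    """Write rows to <root_dir>/datasets/<name>.csv and return the file path."""
    path = Path(root_dir) / "datasets" / f"{name}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        # Column names are hypothetical; use whatever your dataset needs.
        writer = csv.DictWriter(f, fieldnames=["query", "ground_truth"])
        writer.writeheader()
        writer.writerows(rows)
    return path


with tempfile.TemporaryDirectory() as root_dir:
    path = write_dataset_csv(
        root_dir,
        "test_dataset",
        [{"query": "What is RAG?", "ground_truth": "Retrieval-Augmented Generation"}],
    )
    relative = path.relative_to(root_dir).as_posix()
    header = path.read_text().splitlines()[0]

print(relative)
print(header)
```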
| 60 | + |
| 61 | +### Loading Experiment Results |
| 62 | + |
| 63 | +```python |
| 64 | +from ragas_experimental import Experiment |
| 65 | + |
| 66 | +experiment_results = Experiment.load(name="first_experiment", backend="local/csv", root_dir=".") |
| 67 | +``` |
| 68 | + |
| 69 | +This call loads the experiment results stored as `first_experiment.csv` in the `experiments` directory under `root_dir`. The `backend` argument can name any backend registered with Ragas.
| 70 | + |
| 71 | +## Data Validation Using Pydantic |
| 72 | + |
| 73 | +Ragas provides data type validation via Pydantic. You can configure a preferred `data_model` for a dataset or for experiment results so that data is validated when it is read from or written to storage.
| 74 | + |
| 75 | +**Example**: |
| 76 | + |
| 77 | +```python |
| 78 | +from ragas_experimental import Dataset |
| 79 | +from pydantic import BaseModel |
| 80 | + |
| 81 | +class MyDataset(BaseModel): |
| 82 | +    query: str
| 83 | +    ground_truth: str
| 84 | + |
| 85 | +test_dataset = Dataset.load(name="test_dataset", backend="local/csv", root_dir=".", data_model=MyDataset) |
| 86 | +``` |
| 87 | + |
| 88 | +This ensures that the data meets the specified type requirements, preventing invalid data from being read or written. |
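
To see what this kind of validation catches, the sketch below applies a Pydantic model directly to a few in-memory rows. It shows Pydantic's own behavior only; how Ragas applies `data_model` internally is not shown here, and the sample rows are invented.

```python
from pydantic import BaseModel, ValidationError


class MyDataset(BaseModel):
    query: str
    ground_truth: str


# Hypothetical rows, shaped like what a CSV reader would return.
rows = [
    {"query": "What is RAG?", "ground_truth": "Retrieval-Augmented Generation"},
    {"query": "Missing ground truth"},  # invalid: required field is absent
]

valid, invalid = [], []
for index, row in enumerate(rows):
    try:
        valid.append(MyDataset(**row))
    except ValidationError:
        # Record the index of the offending row instead of stopping at the first error.
        invalid.append(index)

print(len(valid), invalid)
```

Collecting the indices of invalid rows, rather than raising immediately, makes it easier to report every problem in a file at once.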