Commit bf70aa8

Add required data chapter in rag_evaluation.md, fix typo in task.md, add rag evaluation to nav
1 parent f17e036 commit bf70aa8

File tree

4 files changed: +23 -6 lines changed

4 files changed

+23
-6
lines changed

docs/imgs/rag_evaluation.png

1.46 MB

md-docs/user_guide/modules/rag_evaluation.md

Lines changed: 19 additions & 3 deletions
```diff
@@ -3,22 +3,28 @@
 ## What is RAG Evaluation?

 RAG (Retrieval-Augmented Generation) is a way of building AI models that enhances their ability to generate accurate and contextually relevant responses by combining two main steps: **retrieval** and **generation**.
+
 1. **Retrieval**: The model first searches through a large set of documents or pieces of information to "retrieve" the most relevant ones based on the user query.
 2. **Generation**: It then uses these retrieved documents as context to generate a response, which is typically more accurate and aligned with the question than if it had generated text from scratch without specific guidance.

 Evaluating RAG involves assessing how well the model does in both retrieval and generation.

 Our RAG evaluation module analyzes the three main components of a RAG framework:
+
 - **User Input**: The query or question posed by the user.
-- **Context**: The retrieved documents or information that the model uses to generate a response.
+- **Context**: The retrieved documents or information that the model uses to generate a response. A context can consist of one or more chunks of text.
 - **Response**: The generated answer or output provided by the model.

 In particular, the analysis is performed on the relationships between these components:
+
 - **User Input - Context**: Retrieval Evaluation
 - **Context - Response**: Context Factual Correctness
 - **User Input - Response**: Response Evaluation

-![ML cube Platform RAG Evaluation](../../imgs/rag_evaluation.png)
+<figure markdown>
+![ML cube Platform RAG Evaluation](../../imgs/rag_evaluation.png){ width="600"}
+<figcaption>ML cube Platform RAG Evaluation</figcaption>
+</figure>

 The evaluation is performed through an LLM-as-a-Judge approach, where a Language Model (LM) acts as a judge to evaluate the quality of a RAG model.

```
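
The commit does not show how the judge model is invoked; purely as a hedged illustration, a 1-to-5 judgment for a metric such as Faithfulness might look like the sketch below. The `complete` callable is a hypothetical stand-in for whatever LLM client is actually used and is not part of the ML cube Platform API.

```python
# Illustrative LLM-as-a-Judge sketch; `complete` is a hypothetical
# stand-in for a real LLM client, not the ML cube Platform API.
JUDGE_PROMPT = """You are an impartial judge. Rate how faithful the
response is to the retrieved context on a 1-5 scale, where 5 means the
response never contradicts the context.

Context:
{context}

Response:
{response}

Answer with a single integer from 1 to 5."""


def judge_faithfulness(context: str, response: str, complete) -> int:
    """Ask the judge LLM for a 1-5 faithfulness score and validate it."""
    raw = complete(JUDGE_PROMPT.format(context=context, response=response))
    score = int(raw.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"Judge returned an out-of-range score: {score}")
    return score
```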

```diff
@@ -37,4 +43,14 @@ Below are the metrics computed by the RAG evaluation module, divided into the th
 - **Faithfulness**: Measures how much the response contradicts the retrieved context. A higher faithfulness score indicates that the response is more aligned with the context. The score ranges from 1 to 5, with 5 being the highest faithfulness.

 ### User Input - Response
-- **Satisfaction**: Evaluates how satisfied the user would be with the generated response. The score ranges from 1 to 5, with 1 a response that does not address the user query and 5 a response that fully addresses and answers the user query.
+- **Satisfaction**: Evaluates how satisfied the user would be with the generated response. The score ranges from 1 to 5, with 1 a response that does not address the user query and 5 a response that fully addresses and answers the user query.
+
+## What is the required data?
+
+The RAG evaluation module computes the metrics based on the data availability for each sample.
+If a sample lacks one of the three components (User Input, Context or Response), only the applicable metrics are computed.
+For instance, if a sample does not have a response, only the **User Input - Context** metrics are computed.
+
+If data added to a [Task] contains contexts with multiple chunks of text, a [context separator](../task.md#retrieval-augmented-generation) must be provided.
+
+[Task]: ../task.md
```
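
The per-sample logic described in the new chapter can be pictured with a short sketch, under the assumption that metrics are grouped by the pair of components they require and a sample only receives the groups whose components are present. This is an assumed structure, not the module's actual code, and the Retrieval Evaluation metric name below is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Metric groups keyed by the pair of components they require. The
# Retrieval Evaluation metric name is a placeholder; Faithfulness and
# Satisfaction are the names used in the documentation.
METRIC_GROUPS = {
    ("user_input", "context"): ["retrieval_relevance"],  # Retrieval Evaluation
    ("context", "response"): ["faithfulness"],           # Context Factual Correctness
    ("user_input", "response"): ["satisfaction"],        # Response Evaluation
}


@dataclass
class RagSample:
    user_input: Optional[str] = None
    context: Optional[str] = None
    response: Optional[str] = None


def applicable_metrics(sample: RagSample) -> list[str]:
    """Return only the metrics whose required components are present."""
    present = {
        name
        for name in ("user_input", "context", "response")
        if getattr(sample, name) is not None
    }
    return [
        metric
        for pair, metrics in METRIC_GROUPS.items()
        if set(pair) <= present
        for metric in metrics
    ]


# A sample without a response only gets the User Input - Context metrics.
print(applicable_metrics(RagSample(
    user_input="What is the capital of Italy?",
    context="Rome is the capital of Italy.",
)))
# -> ['retrieval_relevance']
```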

md-docs/user_guide/task.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -50,7 +50,7 @@ Indeed, each Task Type has a set of ML cube Platform modules:
 | LLM Security | :material-close: | :material-close: | :material-check: | :material-close: |

 !!! Tip
-    On the left side of the web app page the Task menù is present, with links to the above mentioned modules and Task settings.
+    On the left side of the web app page the Task menu is present, with links to the above mentioned modules and Task settings.

 ## Task Type

```
```diff
@@ -140,9 +140,9 @@ Moreover, in this Task, the Prediction is a text as well and the input is compos
 RAG tasks have the additional attribute *context separator*, which is a string used to separate different retrieved contexts into chunks. Context data is sent as a single string; however, in RAG settings multiple documents can be retrieved. In this case, the context separator is used to distinguish them. It is optional, since a single context can be provided.

 !!! example
-    Context separator: <<sep>>
+    Context separator: <<sep\>\>

-    Context data: The capital of Italy is Rome.<<sep>>Rome is the capital of Italy.<<sep>>Rome was the capital of Roman Empire.
+    Context data: The capital of Italy is Rome.<<sep\>\>Rome is the capital of Italy.<<sep\>\>Rome was the capital of Roman Empire.

 Contexts:

```
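
To make the separator mechanics concrete, here is a minimal sketch, not platform code, showing how the example's context data splits into chunks. It assumes the actual separator string is `<<sep>>` and that the backslashes added in this commit only escape it for MkDocs rendering.

```python
# The docs escape the separator as <<sep\>\>, presumably only so it
# renders in MkDocs; the actual separator string is assumed to be <<sep>>.
CONTEXT_SEPARATOR = "<<sep>>"

context_data = (
    "The capital of Italy is Rome."
    "<<sep>>Rome is the capital of Italy."
    "<<sep>>Rome was the capital of Roman Empire."
)

# Splitting on the separator recovers the individual retrieved chunks.
for chunk in context_data.split(CONTEXT_SEPARATOR):
    print(chunk)
# The capital of Italy is Rome.
# Rome is the capital of Italy.
# Rome was the capital of Roman Empire.
```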

mkdocs.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -115,6 +115,7 @@ nav:
 - user_guide/modules/index.md
 - user_guide/modules/monitoring.md
 - user_guide/modules/retraining.md
+- user_guide/modules/rag_evaluation.md
 - user_guide/modules/business.md
 - user_guide/modules/labeling.md
 - Integrations:
```
