Commit bf70aa8

Add required data chapter in rag_evaluation.md, fix typo in task.md, add rag evaluation to nav
1 parent f17e036 commit bf70aa8

File tree

4 files changed: +23 -6 lines changed

4 files changed

+23
-6
lines changed

docs/imgs/rag_evaluation.png

1.46 MB

md-docs/user_guide/modules/rag_evaluation.md

Lines changed: 19 additions & 3 deletions
```diff
@@ -3,22 +3,28 @@
 ## What is RAG Evaluation?

 RAG (Retrieval-Augmented Generation) is a way of building AI models that enhances their ability to generate accurate and contextually relevant responses by combining two main steps: **retrieval** and **generation**.
+
 1. **Retrieval**: The model first searches through a large set of documents or pieces of information to "retrieve" the most relevant ones based on the user query.
 2. **Generation**: It then uses these retrieved documents as context to generate a response, which is typically more accurate and aligned with the question than if it had generated text from scratch without specific guidance.

 Evaluating RAG involves assessing how well the model does in both retrieval and generation.

 Our RAG evaluation module analyzes the three main components of a RAG framework:
+
 - **User Input**: The query or question posed by the user.
-- **Context**: The retrieved documents or information that the model uses to generate a response.
+- **Context**: The retrieved documents or information that the model uses to generate a response. A context can consist of one or more chunks of text.
 - **Response**: The generated answer or output provided by the model.

 In particular, the analysis is performed on the relationships between these components:
+
 - **User Input - Context**: Retrieval Evaluation
 - **Context - Response**: Context Factual Correctness
 - **User Input - Response**: Response Evaluation

-![ML cube Platform RAG Evaluation](../../imgs/rag_evaluation.png)
+<figure markdown>
+![ML cube Platform RAG Evaluation](../../imgs/rag_evaluation.png){ width="600"}
+<figcaption>ML cube Platform RAG Evaluation</figcaption>
+</figure>

 The evaluation is performed through an LLM-as-a-Judge approach, where a Language Model (LM) acts as a judge to evaluate the quality of a RAG model.

```
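
The commit does not show how the judge model is invoked; purely as a hedged illustration, a 1-to-5 judgment for a metric such as Faithfulness might look like the sketch below. The `complete` callable is a hypothetical stand-in for whatever LLM client is actually used and is not part of the ML cube Platform API.

```python
# Illustrative LLM-as-a-Judge sketch; `complete` is a hypothetical
# stand-in for a real LLM client, not the ML cube Platform API.
JUDGE_PROMPT = """You are an impartial judge. Rate how faithful the
response is to the retrieved context on a 1-5 scale, where 5 means the
response never contradicts the context.

Context:
{context}

Response:
{response}

Answer with a single integer from 1 to 5."""


def judge_faithfulness(context: str, response: str, complete) -> int:
    """Ask the judge LLM for a 1-5 faithfulness score and validate it."""
    raw = complete(JUDGE_PROMPT.format(context=context, response=response))
    score = int(raw.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"Judge returned an out-of-range score: {score}")
    return score
```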

```diff
@@ -37,4 +43,14 @@ Below are the metrics computed by the RAG evaluation module, divided into the th
 - **Faithfulness**: Measures how much the response contradicts the retrieved context. A higher faithfulness score indicates that the response is more aligned with the context. The score ranges from 1 to 5, with 5 being the highest faithfulness.

 ### User Input - Response
-- **Satisfaction**: Evaluates how satisfied the user would be with the generated response. The score ranges from 1 to 5, with 1 a response that does not address the user query and 5 a response that fully addresses and answers the user query.
+- **Satisfaction**: Evaluates how satisfied the user would be with the generated response. The score ranges from 1 to 5, with 1 a response that does not address the user query and 5 a response that fully addresses and answers the user query.
+
+## What is the required data?
+
+The RAG evaluation module computes the metrics based on the data availability for each sample.
+If a sample lacks one of the three components (User Input, Context or Response), only the applicable metrics are computed.
+For instance, if a sample does not have a response, only the **User Input - Context** metrics are computed.
+
+If data added to a [Task] contains contexts with multiple chunks of text, a [context separator](../task.md#retrieval-augmented-generation) must be provided.
+
+[Task]: ../task.md
```
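
The per-sample logic described in the new chapter can be pictured with a short sketch, under the assumption that metrics are grouped by the pair of components they require and a sample only receives the groups whose components are present. This is an assumed structure, not the module's actual code, and the Retrieval Evaluation metric name below is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Metric groups keyed by the pair of components they require. The
# Retrieval Evaluation metric name is a placeholder; Faithfulness and
# Satisfaction are the names used in the documentation.
METRIC_GROUPS = {
    ("user_input", "context"): ["retrieval_relevance"],  # Retrieval Evaluation
    ("context", "response"): ["faithfulness"],           # Context Factual Correctness
    ("user_input", "response"): ["satisfaction"],        # Response Evaluation
}


@dataclass
class RagSample:
    user_input: Optional[str] = None
    context: Optional[str] = None
    response: Optional[str] = None


def applicable_metrics(sample: RagSample) -> list[str]:
    """Return only the metrics whose required components are present."""
    present = {
        name
        for name in ("user_input", "context", "response")
        if getattr(sample, name) is not None
    }
    return [
        metric
        for pair, metrics in METRIC_GROUPS.items()
        if set(pair) <= present
        for metric in metrics
    ]


# A sample without a response only gets the User Input - Context metrics.
print(applicable_metrics(RagSample(
    user_input="What is the capital of Italy?",
    context="Rome is the capital of Italy.",
)))
# -> ['retrieval_relevance']
```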

md-docs/user_guide/task.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -50,7 +50,7 @@ Indeed, each Task Type has a set of ML cube Platform modules:
 | LLM Security | :material-close: | :material-close: | :material-check: | :material-close: |

 !!! Tip
-    On the left side of the web app page the Task menù is present, with links to the above mentioned modules and Task settings.
+    On the left side of the web app page the Task menu is present, with links to the above mentioned modules and Task settings.

 ## Task Type

```
```diff
@@ -140,9 +140,9 @@ Moreover, in this Task, the Prediction is a text as well and the input is compos
 RAG tasks have the additional attribute *context separator*, which is a string used to separate different retrieved contexts into chunks. Context data is sent as a single string; however, in RAG settings multiple documents can be retrieved. In this case, the context separator is used to distinguish them. It is optional, since a single context can be provided.

 !!! example
-    Context separator: <<sep>>
+    Context separator: <<sep\>\>

-    Context data: The capital of Italy is Rome.<<sep>>Rome is the capital of Italy.<<sep>>Rome was the capital of Roman Empire.
+    Context data: The capital of Italy is Rome.<<sep\>\>Rome is the capital of Italy.<<sep\>\>Rome was the capital of Roman Empire.

 Contexts:

```
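
To make the separator mechanics concrete, here is a minimal sketch, not platform code, showing how the example's context data splits into chunks. It assumes the actual separator string is `<<sep>>` and that the backslashes added in this commit only escape it for MkDocs rendering.

```python
# The docs escape the separator as <<sep\>\>, presumably only so it
# renders in MkDocs; the actual separator string is assumed to be <<sep>>.
CONTEXT_SEPARATOR = "<<sep>>"

context_data = (
    "The capital of Italy is Rome."
    "<<sep>>Rome is the capital of Italy."
    "<<sep>>Rome was the capital of Roman Empire."
)

# Splitting on the separator recovers the individual retrieved chunks.
for chunk in context_data.split(CONTEXT_SEPARATOR):
    print(chunk)
# The capital of Italy is Rome.
# Rome is the capital of Italy.
# Rome was the capital of Roman Empire.
```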

mkdocs.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -115,6 +115,7 @@ nav:
 - user_guide/modules/index.md
 - user_guide/modules/monitoring.md
 - user_guide/modules/retraining.md
+- user_guide/modules/rag_evaluation.md
 - user_guide/modules/business.md
 - user_guide/modules/labeling.md
 - Integrations:
```
