notebooks/evaluation/README.md
Lines changed: 18 additions & 22 deletions
@@ -10,36 +10,32 @@ The primary goal is to offer a reproducible methodology for comparing RAG system

 ## Table of Contents

-- [Project Structure](#project-structure)
-- [Getting Started](#getting-started)
-- [Summary of Findings](#summary-of-findings)
-- [Detailed Results](#detailed-results)
-- [Key Limitations of this Study](#key-limitations-of-this-study)
-- [Further Observations](#further-observations)
+- [Project Structure](#project-structure)
+- [Getting Started](#getting-started)
+- [Summary of Findings](#summary-of-findings)
+- [Detailed Results](#detailed-results)
+- [Key Limitations of this Study](#key-limitations-of-this-study)
+- [Further Observations](#further-observations)

 ## Project Structure

 This directory includes the following components:

-* **Jupyter Notebooks**:
+* **Jupyter Notebooks**:
+  * [`make-sample-questions.ipynb`](./make-sample-questions.ipynb): Generates a dataset of sample questions and reference answers from a source document.
+  * [`evaluate-using-sample-questions-lls-vs-li.ipynb`](./evaluate-using-sample-questions-lls-vs-li.ipynb): Runs Llama Stack and LlamaIndex RAG pipelines on the generated questions, evaluates their responses using the Ragas framework, and performs statistical significance testing with SciPy.

-  * [`make-sample-questions.ipynb`](./make-sample-questions.ipynb): Generates a dataset of sample questions and reference answers from a source document.
-  * [`evaluate-using-sample-questions-lls-vs-li.ipynb`](/evaluate-using-sample-questions-lls-vs-li.ipynb): Runs Llama Stack and LlamaIndex RAG pipelines on the generated questions, evaluates their responses using the Ragas framework, and performs statistical significance testing with SciPy.
+* **Supporting Code**:
+  * [`evaluation_utilities.py`](./evaluation_utilities.py): Utility functions and helper code for the evaluation notebooks.

-* **Supporting Code**:
+* **Sample Data**:
+  * [`qna-ibm-2024-2250-2239.json`](./qna-ibm-2024-2250-2239.json): A Q\&A dataset generated from the IBM 2024 annual report without special instructions.
+  * [`qna-ibm-2024b-2220-2196.json`](./qna-ibm-2024b-2220-2196.json): A Q\&A dataset generated from the same report, but using the default special instructions in the notebook to produce more diverse questions.
+  * **Note on filenames**: The numbers in the JSON filenames (`{configured_questions}-{final_question_count}`) may not perfectly match the final counts in the file due to de-duplication steps.

-  * [`evaluation_utilities.py`](./evaluation_utilities.py): Utility functions and helper code for the evaluation notebooks.
-
-* **Sample Data**:
-
-  * [`qna-ibm-2024-2250-2239.json`](./qna-ibm-2024-2250-2239.json): A Q\&A dataset generated from the IBM 2024 annual report without special instructions.
-  * [`qna-ibm-2024b-2220-2196.json`](./qna-ibm-2024b-2220-2196.json): A Q\&A dataset generated from the same report, but using the default special instructions in the notebook to produce more diverse questions.
-  * **Note on filenames**: The numbers in the JSON filenames (`{configured_questions}-{final_question_count}`) may not perfectly match the final counts in the file due to de-duplication steps.
-
-* **Configuration**:
-
-  * [`requirements.txt`](./requirements.txt): A list of Python libraries required to run the notebooks.
-  * [`run.yaml`](./run.yaml): A configuration file for the Llama Stack server.
+* **Configuration**:
+  * [`requirements.txt`](./requirements.txt): A list of Python libraries required to run the notebooks.
+  * [`run.yaml`](./run.yaml): A configuration file for the Llama Stack server.
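
The filename note in the README says the trailing numbers follow the pattern `{configured_questions}-{final_question_count}` and may not match the real number of entries in the file. One way to check a given dataset is sketched below. The helper names `counts_from_filename` and `count_mismatch` are hypothetical and are not part of `evaluation_utilities.py`. The internal layout of the JSON files is not shown in this diff, so the sketch takes the actual entry count as an argument rather than guessing how to read it.

```python
import re
from pathlib import Path

# Assumption: the last two hyphen-separated numbers before ".json" are
# {configured_questions}-{final_question_count}, per the README note.
FILENAME_PATTERN = re.compile(r"-(\d+)-(\d+)\.json$")


def counts_from_filename(path):
    """Return (configured_questions, final_question_count) parsed from a filename."""
    match = FILENAME_PATTERN.search(Path(path).name)
    if match is None:
        raise ValueError(f"no trailing count pair in {path!r}")
    return int(match.group(1)), int(match.group(2))


def count_mismatch(path, actual_count):
    """Return actual_count minus the final count encoded in the filename."""
    _, final_count = counts_from_filename(path)
    return actual_count - final_count


if __name__ == "__main__":
    configured, final = counts_from_filename("qna-ibm-2024-2250-2239.json")
    print(f"configured={configured}, final={final}")
```

A non-zero result from `count_mismatch` would mean the filename and the file contents disagree, which the README note says can happen after de-duplication.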