ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM's context. There are existing tools and frameworks that help you build these pipelines, but evaluating them and quantifying your pipeline's performance can be hard. This is where ragas (RAG Assessment) comes in.
ragas provides you with tools based on the latest research for evaluating LLM-generated text, giving you insight into your RAG pipeline. ragas can be integrated with your CI/CD pipeline to provide continuous checks and ensure performance.

If you want a more in-depth explanation of the core components, check out our quick-start notebook.
## 🧰 Metrics
### ✏️ Character based
- **Levenshtein distance**: the number of single-character edits (insertions, deletions, substitutions) required to change the generated text into the ground-truth text.
- **Levenshtein ratio**: obtained by dividing the Levenshtein distance by the sum of the number of characters in the generated text and the ground truth. This type of metric is suitable when working with short, precise texts (see the sketch below).
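As a rough illustration (a minimal pure-Python sketch, not ragas's implementation), both metrics can be computed as follows; the ratio's normalization follows the description above:

```python
def levenshtein_distance(generated: str, truth: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `generated` into `truth`."""
    # Classic dynamic-programming formulation, one row at a time.
    prev = list(range(len(truth) + 1))
    for i, g in enumerate(generated, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            cost = 0 if g == t else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_ratio(generated: str, truth: str) -> float:
    """Distance normalized by the combined length of both texts,
    as described above; 0.0 means identical strings."""
    total = len(generated) + len(truth)
    if total == 0:
        return 0.0
    return levenshtein_distance(generated, truth) / total


print(levenshtein_distance("kitten", "sitting"))          # 3
print(round(levenshtein_ratio("kitten", "sitting"), 3))   # 0.231
```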
### 🖊 N-Gram based
N-gram based metrics, as the name indicates, use n-grams to compare the generated answer with the ground truth. They are suited to extractive and abstractive tasks, but their word-based comparison limits them on long, free-form answers.
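For concreteness, here is a minimal sketch of the general technique (an illustration, not ragas code) showing how texts are reduced to n-grams before comparison:

```python
from collections import Counter

def ngrams(text: str, n: int) -> Counter:
    """Count the n-grams (as word tuples) occurring in `text`."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

generated = ngrams("the cat sat on the mat", n=2)
truth = ngrams("the cat lay on the mat", n=2)
# The overlap between the two n-gram multisets drives the metric score.
print(sum((generated & truth).values()))  # 3 shared bigrams
```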
It measures precision by comparing clipped n-grams in the generated text against the ground-truth text. These matches do not take word order into account.
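A minimal sketch of clipped n-gram precision (as popularized by BLEU; an illustrative assumption, not the exact implementation used here), reusing the `ngrams` helper above:

```python
def clipped_precision(generated_text: str, truth_text: str, n: int = 2) -> float:
    """Fraction of generated n-grams that appear in the ground truth,
    with each n-gram's count clipped to its count in the ground truth."""
    gen = ngrams(generated_text, n)
    ref = ngrams(truth_text, n)
    if not gen:
        return 0.0
    # `gen & ref` keeps the minimum count per n-gram, i.e. the clipping.
    clipped = sum((gen & ref).values())
    return clipped / sum(gen.values())

print(clipped_precision("the cat sat on the mat",
                        "the cat lay on the mat"))  # 0.6
```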
### 🪄 Model Based
Model-based methods use language models combined with NLP techniques to compare the generated text with the ground truth. They are well suited to free-form answers, long or short.
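One common model-based approach, shown purely as an illustration (the model name and scoring here are assumptions, not necessarily what ragas uses), is to embed both texts and compare them with cosine similarity:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Any sentence-embedding model works; this small one is just an example.
model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(generated: str, truth: str) -> float:
    """Cosine similarity between the embeddings of the two texts."""
    gen_vec, truth_vec = model.encode([generated, truth])
    return float(np.dot(gen_vec, truth_vec) /
                 (np.linalg.norm(gen_vec) * np.linalg.norm(truth_vec)))

print(semantic_similarity("Paris is the capital of France.",
                          "France's capital city is Paris."))
```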
Best used to measure factual consistency between the ground truth and the generated text. Scores range from 0 to 1, with a higher score indicating better factual consistency between the ground truth and the generated answer. It employs a QA-QG (question answering, question generation) paradigm followed by natural language inference (NLI) to compare the ground truth and the generated answer. The Q² score is highly correlated with human judgement.
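At a high level, the paradigm looks like the sketch below; every helper in it (`generate_questions`, `answer`, `nli_entails`) is hypothetical and stands in for a model call, since the exact models are not specified in this section:

```python
def q_squared(generated: str, truth: str) -> float:
    """Hypothetical sketch of the QA-QG + NLI paradigm described above."""
    # 1. Question generation: derive questions from the generated answer.
    questions = generate_questions(generated)  # hypothetical QG model call
    if not questions:
        return 0.0
    consistent = 0
    for question in questions:
        # 2. Question answering: answer each question twice, once against
        #    the generated text and once against the ground truth.
        ans_gen = answer(question, context=generated)  # hypothetical QA call
        ans_truth = answer(question, context=truth)    # hypothetical QA call
        # 3. NLI: count the pair as consistent if the answers agree.
        if nli_entails(ans_gen, ans_truth):            # hypothetical NLI call
            consistent += 1
    # Fraction of questions with factually consistent answers, in [0, 1].
    return consistent / len(questions)
```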
📜 Check out [citations](./citations.md) for related publications.