ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.
ragas provides you with tools based on the latest research for evaluating LLM-generated text, giving you insights about your RAG pipeline. ragas can also be integrated with your CI/CD to provide continuous checks and ensure performance.

## :beers: Installation

```bash
pip install ragas
```

Alternatively, to install from source:

```bash
git clone https://github.com/explodinggradients/ragas && cd ragas
pip install -e .
```

## :beers: Quickstart

For a small example program you can run to see ragas in action, and a more in-depth explanation of the core components, check out our [quick-start notebook](./examples/quickstart.ipynb).

## :beers: Metrics
### ✏️ Character based

Character based metrics focus on analyzing text at the character level.

**Levenshtein distance** is the number of single-character edits (insertions, deletions, and substitutions) required to change your generated text into the ground truth text.

**Levenshtein ratio** is obtained by dividing the Levenshtein distance by the combined number of characters in the generated text and the ground truth. These metrics are best suited to short, precise texts.
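
To make the two definitions concrete, here is a minimal, self-contained Python sketch (a plain dynamic-programming implementation for illustration, not the code ragas uses). Note that some libraries define the Levenshtein ratio as a similarity rather than the normalised distance described above.

```python
def levenshtein_distance(generated: str, ground_truth: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `generated` into `ground_truth`."""
    m, n = len(generated), len(ground_truth)
    # prev[j] holds the edit distance between generated[:i-1] and ground_truth[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if generated[i - 1] == ground_truth[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]


def levenshtein_ratio(generated: str, ground_truth: str) -> float:
    """Distance normalised by the combined length of both texts, as defined
    above: 0.0 means identical strings, larger values mean more edits."""
    total = len(generated) + len(ground_truth)
    if total == 0:
        return 0.0
    return levenshtein_distance(generated, ground_truth) / total


print(levenshtein_distance("kitten", "sitting"))           # 3
print(round(levenshtein_ratio("kitten", "sitting"), 3))    # 3 / 13 ≈ 0.231
```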
Model based methods use language models combined with NLP techniques to compare the generated text with the ground truth.

**$Q^2$**

Best used to measure factual consistency between the ground truth and the generated text. Scores range from 0 to 1, and a higher score indicates better factual consistency between the ground truth and the generated answer. $Q^2$ employs a QA-QG (question answering and question generation) paradigm followed by NLI (natural language inference) to compare the ground truth and the generated answer. The $Q^2$ score is highly correlated with human judgement.
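
To illustrate the QA-QG-plus-NLI paradigm, the sketch below shows the control flow only. It is not the actual $Q^2$ or ragas implementation, and `generate_questions`, `answer_question`, and `entailment_probability` are hypothetical placeholders you would back with real question-generation, question-answering, and NLI models.

```python
from statistics import mean


def generate_questions(text: str) -> list[str]:
    """Hypothetical placeholder for a question-generation (QG) model that
    asks questions about the informative spans of `text`."""
    raise NotImplementedError


def answer_question(question: str, context: str) -> str:
    """Hypothetical placeholder for a question-answering (QA) model that
    answers `question` using only `context`."""
    raise NotImplementedError


def entailment_probability(premise: str, hypothesis: str) -> float:
    """Hypothetical placeholder for an NLI model returning the probability
    that `premise` entails `hypothesis`."""
    raise NotImplementedError


def factual_consistency_score(generated_answer: str, ground_truth: str) -> float:
    """Sketch of the QA-QG + NLI flow: ask questions about the generated
    answer, answer them from both texts, and use NLI to check agreement.
    The average lands in [0, 1]; higher means better factual consistency."""
    agreements = []
    for question in generate_questions(generated_answer):
        from_generated = answer_question(question, generated_answer)
        from_ground_truth = answer_question(question, ground_truth)
        agreements.append(entailment_probability(from_ground_truth, from_generated))
    return mean(agreements) if agreements else 0.0
```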
📜 Check out [citations](./citations.md) for related publications.