
Commit 4a2804e

New Answer relevancy metrics (#77)
## What
Added a new answer relevancy method that can use any LLM.

## Why
The earlier method used the t5 model, which gave poor results (e.g. with multilingual inputs), and scoring was slow because the model ran locally.

## How
Introduced a new paradigm that uses a combination of question generation and self-consistency.

## Results
<img width="904" alt="Screenshot 2023-07-28 at 8 01 31 PM" src="https://github.com/explodinggradients/ragas/assets/25312635/2cc105ee-751e-4716-b2aa-08bd491ef60b">
1 parent bbe53b9 commit 4a2804e
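
The "How" above pairs question generation with self-consistency: an LLM is asked to produce questions that the generated answer would address, and the score reflects how close those questions are to the original one. A minimal sketch of that idea in Python, assuming hypothetical `generate_question` and `embed` callables and cosine similarity as the closeness measure (the names, defaults, and similarity choice are illustrative, not the exact ragas implementation):

```python
from typing import Callable, List
import numpy as np

def answer_relevancy_sketch(
    question: str,
    answer: str,
    generate_question: Callable[[str], str],  # hypothetical: LLM call that writes a question the answer would address
    embed: Callable[[str], np.ndarray],       # hypothetical: text-embedding function
    n: int = 3,                               # self-consistency: how many questions to generate and average over
) -> float:
    """Mean cosine similarity between the original question and n LLM-generated questions."""
    q_vec = embed(question)
    sims: List[float] = []
    for _ in range(n):
        gen_vec = embed(generate_question(answer))
        cos = float(np.dot(q_vec, gen_vec) / (np.linalg.norm(q_vec) * np.linalg.norm(gen_vec)))
        sims.append(cos)
    return float(np.mean(sims))
```

Averaging over several generated questions is the self-consistency step; a single generated question is noisier than the mean of n.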

4 files changed (+465 / -207 lines)


README.md

Lines changed: 4 additions & 1 deletion
@@ -82,8 +82,11 @@ If you want a more in-depth explanation of core components, check out our [quick
 
 Ragas measures your pipeline's performance against different dimensions
 
 1. **Faithfulness**: measures the information consistency of the generated answer against the given context. Any claims made in the answer that cannot be deduced from the context are penalized.
+
 2. **Context Relevancy**: measures how relevant the retrieved contexts are to the question. Ideally the context should only contain information necessary to answer the question. The presence of redundant information in the context is penalized.
-3. **Answer Relevancy**: measures how relevant the generated answer is to the question. This does not ensure factuality of the generated answer, but rather penalizes the presence of redundant information in the generated answer.
+
+3. **Answer Relevancy**: refers to the degree to which a response directly addresses and is appropriate for a given question or context. This does not take the factuality of the answer into consideration, but rather penalizes the presence of redundant information or incomplete answers given a question.
+
 4. **Aspect Critiques**: designed to judge the submission against defined aspects like harmlessness, correctness, etc. You can also define your own aspect and validate the submission against your desired aspect. The output of aspect critiques is always binary.
 
 The final `ragas_score` is the harmonic mean of individual metric scores.
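
For reference, with individual metric scores s_1, ..., s_n, the harmonic mean in the last line above reads as follows (which metrics enter the mean is not specified in this hunk):

```latex
\mathrm{ragas\_score} = \frac{n}{\sum_{i=1}^{n} \frac{1}{s_i}}
```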

docs/metrics.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ dataset: Dataset
 results = context_rel.score(dataset)
 ```
 
-3. `answer_relevancy`: measures how relevant the generated answer is to the prompt. This is quantified using the conditional likelihood of an LLM generating the question given the answer. This is implemented using a custom model. Values range (0,1); higher is better.
+3. `answer_relevancy`: measures how relevant the generated answer is to the prompt. If the generated answer is incomplete or contains redundant information, the score will be low. This is quantified by estimating the chance of an LLM generating the given question from the generated answer. Values range (0,1); higher is better.
 ```python
 from ragas.metrics.answer_relevancy import AnswerRelevancy
 answer_relevancy = AnswerRelevancy(model_name="t5-small")
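
Since this commit swaps the local t5 model for an LLM-backed method, a hedged usage sketch follows; the default constructor arguments after this change are not shown in this hunk, and `.score(dataset)` simply mirrors the `context_rel` pattern earlier in the same document:

```python
from ragas.metrics.answer_relevancy import AnswerRelevancy

# Assumption: with the new LLM-based method the metric no longer needs a local
# t5 checkpoint; constructing with defaults and calling .score(dataset) follows
# the context_relevancy usage shown above.
answer_relevancy = AnswerRelevancy()
results = answer_relevancy.score(dataset)
```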
