
Commit 6168755

docs: update ContextRecall metric documentation to collections API
- Add primary example using ContextRecall from ragas.metrics.collections
- Include synchronous usage note with .score() method
- Move LLMContextRecall to legacy section with deprecation warning
- Keep NonLLMContextRecall and IDBasedContextRecall as valid alternatives (no collections API equivalents)
- Tested example and verified output
1 parent 78bfab1 commit 6168755

1 file changed (+48, -11 lines)

docs/concepts/metrics/available_metrics/context_recall.md

# Context Recall

Context Recall measures how many of the relevant documents (or pieces of information) were successfully retrieved. It focuses on not missing important results; higher recall means fewer relevant documents were left out. In short, recall is about not missing anything important.

Since it is about not missing anything, calculating context recall always requires a reference to compare against. The LLM-based Context Recall metric uses `reference` as a proxy for `reference_contexts`, which makes the metric easier to use, since annotating reference contexts can be very time-consuming. To estimate context recall from the `reference`, the reference is broken down into claims, and each claim is analyzed to determine whether it can be attributed to the retrieved context. In an ideal scenario, all claims in the reference answer should be attributable to the retrieved context. The resulting score ranges between 0 and 1, with higher values indicating better performance.

The formula for calculating context recall is as follows:

$$
\text{Context Recall} = \frac{\text{Number of claims in the reference supported by the retrieved context}}{\text{Total number of claims in the reference}}
$$

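To make the formula concrete, here is a minimal sketch of the final arithmetic, assuming the LLM has already decomposed the reference into claims and judged each one (the verdict values below are illustrative, not part of the metric's API):

```python
# Hypothetical attribution verdicts for a reference broken into four claims;
# True means the claim is supported by the retrieved context.
claim_supported = [True, True, False, True]

# Context Recall = supported claims / total claims
context_recall = sum(claim_supported) / len(claim_supported)
print(context_recall)  # 0.75
```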
## Example
```python
from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import ContextRecall

# Setup LLM
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

# Create metric
scorer = ContextRecall(llm=llm)

# Evaluate
result = await scorer.ascore(
    user_input="Where is the Eiffel Tower located?",
    retrieved_contexts=["Paris is the capital of France."],
    reference="The Eiffel Tower is located in Paris."
)

print(f"Context Recall Score: {result.value}")
```
Output:

```
Context Recall Score: 1.0
```

!!! note "Synchronous Usage"

    If you prefer synchronous code, you can use the `.score()` method instead of `.ascore()`:

    ```python
    result = scorer.score(
        user_input="Where is the Eiffel Tower located?",
        retrieved_contexts=["Paris is the capital of France."],
        reference="The Eiffel Tower is located in Paris."
    )
    ```
## LLM Based Context Recall (Legacy API)

!!! warning "Legacy API"

    The following example uses the legacy metrics API pattern. For new projects, we recommend using the collections-based API shown above. This API will be deprecated in version 0.4 and removed in version 1.0.

```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import LLMContextRecall

# The sample construction is elided in this diff; the fields below mirror
# the collections example above so that the snippet stays self-contained.
sample = SingleTurnSample(
    user_input="Where is the Eiffel Tower located?",
    retrieved_contexts=["Paris is the capital of France."],
    reference="The Eiffel Tower is located in Paris."
)

# evaluator_llm is assumed to be an already-configured ragas LLM wrapper.
context_recall = LLMContextRecall(llm=evaluator_llm)
await context_recall.single_turn_ascore(sample)
```
Output:

```
1.0
```
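Per the commit message, `NonLLMContextRecall` and `IDBasedContextRecall` remain valid alternatives with no collections API equivalents. As a minimal sketch of the non-LLM variant, assuming the legacy single-turn API and annotated `reference_contexts` (the sample values are illustrative):

```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import NonLLMContextRecall

sample = SingleTurnSample(
    retrieved_contexts=["Paris is the capital of France."],
    reference_contexts=[
        "Paris is the capital of France.",
        "The Eiffel Tower is one of the most famous landmarks in Paris."
    ]
)

# NonLLMContextRecall compares retrieved contexts against annotated
# reference contexts using string similarity, so no LLM is required.
context_recall = NonLLMContextRecall()
await context_recall.single_turn_ascore(sample)
```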
