Commit 78bfab1

docs: update FactualCorrectness to collections-based API
1 parent b9d1f47 commit 78bfab1

File tree

1 file changed: +125 −30 lines

docs/concepts/metrics/available_metrics/factual_correctness.md

Lines changed: 125 additions & 30 deletions
@@ -2,6 +2,76 @@

`FactualCorrectness` is a metric that compares and evaluates the factual accuracy of the generated `response` with the `reference`. This metric is used to determine the extent to which the generated response aligns with the reference. The factual correctness score ranges from 0 to 1, with higher values indicating better performance. To measure the alignment between the response and the reference, the metric uses the LLM to first break down the response and reference into claims and then uses natural language inference to determine the factual overlap between the response and the reference. Factual overlap is quantified using precision, recall, and F1 score, which can be controlled using the `mode` parameter.

### Example

```python
from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import FactualCorrectness

# Setup LLM
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

# Create metric
scorer = FactualCorrectness(llm=llm)

# Evaluate
result = await scorer.ascore(
    response="The Eiffel Tower is located in Paris.",
    reference="The Eiffel Tower is located in Paris. It has a height of 1000ft."
)
print(f"Factual Correctness Score: {result.value}")
```

Output:

```
Factual Correctness Score: 0.67
```

By default, the mode is set to `f1`. You can change the mode to `precision` or `recall` by setting the `mode` parameter:

```python
# Precision mode - measures what fraction of response claims are supported by the reference
scorer = FactualCorrectness(llm=llm, mode="precision")
result = await scorer.ascore(
    response="The Eiffel Tower is located in Paris.",
    reference="The Eiffel Tower is located in Paris. It has a height of 1000ft."
)
print(f"Precision Score: {result.value}")
```

Output:

```
Precision Score: 1.0
```

You can also configure the claim decomposition granularity using `atomicity` and `coverage` parameters:

```python
# High granularity - more detailed claim decomposition
scorer = FactualCorrectness(
    llm=llm,
    mode="f1",
    atomicity="high",  # More atomic claims
    coverage="high"    # Comprehensive coverage
)
```

!!! note "Synchronous Usage"
    If you prefer synchronous code, you can use the `.score()` method instead of `.ascore()`:

    ```python
    result = scorer.score(
        response="The Eiffel Tower is located in Paris.",
        reference="The Eiffel Tower is located in Paris. It has a height of 1000ft."
    )
    ```

### How It's Calculated

The formula for calculating True Positive (TP), False Positive (FP), and False Negative (FN) is as follows:

$$
@@ -30,36 +100,6 @@ $$
\text{F1 Score} = {2 \times \text{Precision} \times \text{Recall} \over (\text{Precision} + \text{Recall})}
$$

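The scores in the examples above can be reproduced by hand from these formulas. As an illustrative sketch (the claim counts below are assumptions for this example, not output from ragas): the response contributes one claim that the reference supports, and the reference contains one additional claim about the tower's height that the response omits.

```python
# Hand-computed precision/recall/F1 for the running Eiffel Tower example.
# The claim counts are illustrative assumptions, not output from ragas.
tp = 1  # response claim supported by the reference ("located in Paris")
fp = 0  # response claims the reference does not support
fn = 1  # reference claim missing from the response ("height of 1000ft")

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = (2 * precision * recall) / (precision + recall)

print(f"Precision: {precision:.2f}")  # Precision: 1.00
print(f"Recall: {recall:.2f}")        # Recall: 0.50
print(f"F1: {f1:.2f}")                # F1: 0.67
```

This matches the `0.67` F1 score and `1.0` precision score shown in the examples above.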
### Controlling the Number of Claims

Each sentence in the response and reference can be broken down into one or more claims. The number of claims that are generated from a single sentence is determined by the level of `atomicity` and `coverage` required for your application.
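For intuition, here is a toy decomposition of a single sentence at two atomicity levels. The sentence and both claim lists are hand-written illustrations, not output from ragas:

```python
# Toy illustration of claim decomposition granularity.
# These claim lists are hand-written assumptions, not ragas output.
sentence = "Albert Einstein was a German theoretical physicist."

# Low atomicity: the sentence is kept as a single compound claim.
low_atomicity_claims = [
    "Albert Einstein was a German theoretical physicist.",
]

# High atomicity: the sentence is split into its atomic facts.
high_atomicity_claims = [
    "Albert Einstein was German.",
    "Albert Einstein was a theoretical physicist.",
]

print(len(low_atomicity_claims), len(high_atomicity_claims))  # 1 2
```

More atomic claims give the natural language inference step finer-grained units to verify, which changes how partial overlaps are counted.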
@@ -161,3 +201,58 @@ By adjusting both atomicity and coverage, you can customize the level of detail
- Use **Low Atomicity and Low Coverage** when only the key information is necessary, such as for summarization.

This flexibility in controlling the number of claims helps ensure that the information is presented at the right level of granularity for your application's requirements.

## Legacy Metrics API

The following examples use the legacy metrics API pattern. For new projects, we recommend using the collections-based API shown above.

!!! warning "Deprecation Timeline"
    This API will be deprecated in version 0.4 and removed in version 1.0. Please migrate to the collections-based API shown above.

### Example with SingleTurnSample

```python
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics._factual_correctness import FactualCorrectness

sample = SingleTurnSample(
    response="The Eiffel Tower is located in Paris.",
    reference="The Eiffel Tower is located in Paris. It has a height of 1000ft."
)

scorer = FactualCorrectness(llm=evaluator_llm)
await scorer.single_turn_ascore(sample)
```

Output:

```
0.67
```

### Changing the Mode

By default, the mode is set to `f1`. You can change the mode to `precision` or `recall` by setting the `mode` parameter:

```python
scorer = FactualCorrectness(llm=evaluator_llm, mode="precision")
await scorer.single_turn_ascore(sample)
```

Output:

```
1.0
```

### Controlling Atomicity

```python
scorer = FactualCorrectness(llm=evaluator_llm, mode="precision", atomicity="low")
await scorer.single_turn_ascore(sample)
```

Output:

```
1.0
```
