Commit 5046604

docs: update AnswerCorrectness to collections-based API
- Add collections-based API example with LLM and embeddings setup
- Move legacy evaluate() example to Legacy Metrics API section
- Add deprecation warning for legacy API
- Include synchronous usage note
- Tested example code and verified it works correctly

Note: Embeddings are required because default weights [0.75, 0.25] include semantic similarity. See issue #2408 for making embeddings optional when weights[1] == 0.
1 parent 5179ed0 commit 5046604
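Editor's note on the weights mentioned in the commit message: the metric combines a factual-correctness component (an F1 over classified statements) and a semantic-similarity component (which is why embeddings are needed) as a weighted average, with default weights `[0.75, 0.25]`. The sketch below is illustrative only; the commented-out constructor call is an assumption that the collections-based `AnswerCorrectness` exposes the same `weights` parameter as the legacy metric, and per the commit note, embeddings are still required even when `weights[1] == 0` (see issue #2408).

```python
# Illustrative only -- hypothetical component scores, not output of the metric.
factual_correctness = 0.9   # F1 over TP/FP/FN statement classification
semantic_similarity = 0.7   # embedding-based similarity of response vs. reference

weights = [0.75, 0.25]      # default weighting: [factuality, semantic similarity]
score = weights[0] * factual_correctness + weights[1] * semantic_similarity
print(round(score, 2))      # 0.85

# Assumed (not verified here): passing custom weights to the collections API,
# mirroring the legacy metric's `weights` parameter. Embeddings must still be
# supplied even if weights[1] == 0 -- see issue #2408.
# scorer = AnswerCorrectness(llm=llm, embeddings=embeddings, weights=[1.0, 0.0])
```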

1 file changed: +58 / -11 lines changed


docs/concepts/metrics/available_metrics/answer_correctness.md

Lines changed: 58 additions & 11 deletions
````diff
@@ -16,20 +16,44 @@ Answer correctness encompasses two critical aspects: semantic similarity between
 ### Example
 
 ```python
-from datasets import Dataset
-from ragas.metrics import answer_correctness
-from ragas import evaluate
+from openai import AsyncOpenAI
+from ragas.llms import llm_factory
+from ragas.embeddings.base import embedding_factory
+from ragas.metrics.collections import AnswerCorrectness
+
+# Setup LLM and embeddings
+client = AsyncOpenAI()
+llm = llm_factory("gpt-4o-mini", client=client)
+embeddings = embedding_factory("openai", model="text-embedding-3-small", client=client)
+
+# Create metric
+scorer = AnswerCorrectness(llm=llm, embeddings=embeddings)
+
+# Evaluate
+result = await scorer.ascore(
+    user_input="When was the first super bowl?",
+    response="The first superbowl was held on Jan 15, 1967",
+    reference="The first superbowl was held on January 15, 1967"
+)
+print(f"Answer Correctness Score: {result.value}")
+```
 
-data_samples = {
-    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
-    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
-    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
-}
-dataset = Dataset.from_dict(data_samples)
-score = evaluate(dataset,metrics=[answer_correctness])
-score.to_pandas()
+Output:
 
 ```
+Answer Correctness Score: 0.95
+```
+
+!!! note "Synchronous Usage"
+    If you prefer synchronous code, you can use the `.score()` method instead of `.ascore()`:
+
+    ```python
+    result = scorer.score(
+        user_input="When was the first super bowl?",
+        response="The first superbowl was held on Jan 15, 1967",
+        reference="The first superbowl was held on January 15, 1967"
+    )
+    ```
 
 ### Calculation
 
@@ -57,3 +81,26 @@ Next, we calculate the semantic similarity between the generated answer and the
 
 Once we have the semantic similarity, we take a weighted average of the semantic similarity and the factual similarity calculated above to arrive at the final score. You can adjust this weightage by modifying the `weights` parameter.
 
+## Legacy Metrics API
+
+The following examples use the legacy metrics API pattern. For new projects, we recommend using the collections-based API shown above.
+
+!!! warning "Deprecation Timeline"
+    This API will be deprecated in version 0.4 and removed in version 1.0. Please migrate to the collections-based API shown above.
+
+### Example with Dataset
+
+```python
+from datasets import Dataset
+from ragas.metrics import answer_correctness
+from ragas import evaluate
+
+data_samples = {
+    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
+    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
+    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
+}
+dataset = Dataset.from_dict(data_samples)
+score = evaluate(dataset,metrics=[answer_correctness])
+score.to_pandas()
+```
````
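One usage note on the new example: the top-level `await scorer.ascore(...)` only runs in an environment that already provides an event loop, such as a Jupyter notebook. In a plain script you can either use the `.score()` method from the synchronous-usage note, or wrap the documented calls in `asyncio.run`. A minimal sketch built from the committed example, assuming an `OPENAI_API_KEY` is configured:

```python
import asyncio

from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.embeddings.base import embedding_factory
from ragas.metrics.collections import AnswerCorrectness


async def main() -> None:
    # Same setup as the documented example, wrapped in a coroutine so the
    # awaited call works outside a notebook.
    client = AsyncOpenAI()
    llm = llm_factory("gpt-4o-mini", client=client)
    embeddings = embedding_factory("openai", model="text-embedding-3-small", client=client)
    scorer = AnswerCorrectness(llm=llm, embeddings=embeddings)

    result = await scorer.ascore(
        user_input="When was the first super bowl?",
        response="The first superbowl was held on Jan 15, 1967",
        reference="The first superbowl was held on January 15, 1967",
    )
    print(f"Answer Correctness Score: {result.value}")


if __name__ == "__main__":
    asyncio.run(main())
```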

0 commit comments
