Commit 5046604

docs: update AnswerCorrectness to collections-based API
- Add collections-based API example with LLM and embeddings setup
- Move legacy evaluate() example to Legacy Metrics API section
- Add deprecation warning for legacy API
- Include synchronous usage note
- Tested example code and verified it works correctly

Note: Embeddings are required because default weights [0.75, 0.25] include semantic similarity. See issue #2408 for making embeddings optional when weights[1] == 0.
1 parent 5179ed0 commit 5046604
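Editor's note on the weights mentioned in the commit message: the metric combines a factual-correctness component (an F1 over classified statements) and a semantic-similarity component (which is why embeddings are needed) as a weighted average, with default weights `[0.75, 0.25]`. The sketch below is illustrative only; the commented-out constructor call is an assumption that the collections-based `AnswerCorrectness` exposes the same `weights` parameter as the legacy metric, and per the commit note, embeddings are still required even when `weights[1] == 0` (see issue #2408).

```python
# Illustrative only -- hypothetical component scores, not output of the metric.
factual_correctness = 0.9   # F1 over TP/FP/FN statement classification
semantic_similarity = 0.7   # embedding-based similarity of response vs. reference

weights = [0.75, 0.25]      # default weighting: [factuality, semantic similarity]
score = weights[0] * factual_correctness + weights[1] * semantic_similarity
print(round(score, 2))      # 0.85

# Assumed (not verified here): passing custom weights to the collections API,
# mirroring the legacy metric's `weights` parameter. Embeddings must still be
# supplied even if weights[1] == 0 -- see issue #2408.
# scorer = AnswerCorrectness(llm=llm, embeddings=embeddings, weights=[1.0, 0.0])
```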

1 file changed: +58 / -11 lines changed


docs/concepts/metrics/available_metrics/answer_correctness.md

Lines changed: 58 additions & 11 deletions
````diff
@@ -16,20 +16,44 @@ Answer correctness encompasses two critical aspects: semantic similarity between
 ### Example
 
 ```python
-from datasets import Dataset
-from ragas.metrics import answer_correctness
-from ragas import evaluate
+from openai import AsyncOpenAI
+from ragas.llms import llm_factory
+from ragas.embeddings.base import embedding_factory
+from ragas.metrics.collections import AnswerCorrectness
+
+# Setup LLM and embeddings
+client = AsyncOpenAI()
+llm = llm_factory("gpt-4o-mini", client=client)
+embeddings = embedding_factory("openai", model="text-embedding-3-small", client=client)
+
+# Create metric
+scorer = AnswerCorrectness(llm=llm, embeddings=embeddings)
+
+# Evaluate
+result = await scorer.ascore(
+    user_input="When was the first super bowl?",
+    response="The first superbowl was held on Jan 15, 1967",
+    reference="The first superbowl was held on January 15, 1967"
+)
+print(f"Answer Correctness Score: {result.value}")
+```
 
-data_samples = {
-    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
-    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
-    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
-}
-dataset = Dataset.from_dict(data_samples)
-score = evaluate(dataset,metrics=[answer_correctness])
-score.to_pandas()
+Output:
 
 ```
+Answer Correctness Score: 0.95
+```
+
+!!! note "Synchronous Usage"
+    If you prefer synchronous code, you can use the `.score()` method instead of `.ascore()`:
+
+    ```python
+    result = scorer.score(
+        user_input="When was the first super bowl?",
+        response="The first superbowl was held on Jan 15, 1967",
+        reference="The first superbowl was held on January 15, 1967"
+    )
+    ```
 
 ### Calculation
 
@@ -57,3 +81,26 @@ Next, we calculate the semantic similarity between the generated answer and the
 
 Once we have the semantic similarity, we take a weighted average of the semantic similarity and the factual similarity calculated above to arrive at the final score. You can adjust this weightage by modifying the `weights` parameter.
 
+## Legacy Metrics API
+
+The following examples use the legacy metrics API pattern. For new projects, we recommend using the collections-based API shown above.
+
+!!! warning "Deprecation Timeline"
+    This API will be deprecated in version 0.4 and removed in version 1.0. Please migrate to the collections-based API shown above.
+
+### Example with Dataset
+
+```python
+from datasets import Dataset
+from ragas.metrics import answer_correctness
+from ragas import evaluate
+
+data_samples = {
+    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
+    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
+    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
+}
+dataset = Dataset.from_dict(data_samples)
+score = evaluate(dataset,metrics=[answer_correctness])
+score.to_pandas()
+```
````
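One usage note on the new example: the top-level `await scorer.ascore(...)` only runs in an environment that already provides an event loop, such as a Jupyter notebook. In a plain script you can either use the `.score()` method from the synchronous-usage note, or wrap the documented calls in `asyncio.run`. A minimal sketch built from the committed example, assuming an `OPENAI_API_KEY` is configured:

```python
import asyncio

from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.embeddings.base import embedding_factory
from ragas.metrics.collections import AnswerCorrectness


async def main() -> None:
    # Same setup as the documented example, wrapped in a coroutine so the
    # awaited call works outside a notebook.
    client = AsyncOpenAI()
    llm = llm_factory("gpt-4o-mini", client=client)
    embeddings = embedding_factory("openai", model="text-embedding-3-small", client=client)
    scorer = AnswerCorrectness(llm=llm, embeddings=embeddings)

    result = await scorer.ascore(
        user_input="When was the first super bowl?",
        response="The first superbowl was held on Jan 15, 1967",
        reference="The first superbowl was held on January 15, 1967",
    )
    print(f"Answer Correctness Score: {result.value}")


if __name__ == "__main__":
    asyncio.run(main())
```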

0 commit comments
