Commit e967181

docs: update SummaryScore metric documentation to collections-based API
- Add new collections-based API example using SummaryScore from ragas.metrics.collections
- Include synchronous usage note with .score() method
- Add custom configuration example for length_penalty and coeff parameters
- Move legacy example to Legacy Metrics API section with deprecation warning
- Preserve all conceptual explanations and formulas
- Tested example code and verified it produces expected output
1 parent 48265b4 commit e967181

File tree

1 file changed: +53 −3 lines


docs/concepts/metrics/available_metrics/summarization_score.md

Lines changed: 53 additions & 3 deletions
@@ -2,7 +2,7 @@
 
 ## Summarization Score
 
-`SummarizationScore` metric gives a measure of how well the summary (`response`) captures the important information from the `retrieved_contexts`. The intuition behind this metric is that a good summary shall contain all the important information present in the context(or text so to say).
+The **Summarization Score** metric measures how well a summary (`response`) captures the important information from the `reference_contexts`. The intuition behind this metric is that a good summary should contain all the important information present in the context.
 
 We first extract a set of important keyphrases from the context. These keyphrases are then used to generate a set of questions. The answers to these questions are always `yes(1)` for the context. We then ask these questions to the summary and calculate the summarization score as the ratio of correctly answered questions to the total number of questions.
 

@@ -27,7 +27,55 @@
 \text{conciseness score}*\text{coeff}
 $$
 
-## Example
+### Example
+
+```python
+from openai import AsyncOpenAI
+from ragas.llms import llm_factory
+from ragas.metrics.collections import SummaryScore
+
+# Setup LLM
+client = AsyncOpenAI()
+llm = llm_factory("gpt-4o-mini", client=client)
+
+# Create metric
+scorer = SummaryScore(llm=llm)
+
+# Evaluate
+result = await scorer.ascore(
+    reference_contexts=[
+        "A company is launching a new product, a smartphone app designed to help users track their fitness goals. The app allows users to set daily exercise targets, log their meals, and track their water intake. It also provides personalized workout recommendations and sends motivational reminders throughout the day."
+    ],
+    response="A company is launching a fitness tracking app that helps users set exercise goals, log meals, and track water intake, with personalized workout suggestions and motivational reminders."
+)
+print(f"Summary Score: {result.value}")
+```
+
+Output:
+
+```
+Summary Score: 0.6423387096775146
+```
+
+!!! note "Synchronous Usage"
+    If you prefer synchronous code, you can use the `.score()` method instead of `.ascore()`:
+
+    ```python
+    result = scorer.score(
+        reference_contexts=[...],
+        response="..."
+    )
+    ```
+
+
+## Legacy Metrics API
+
+The following examples use the legacy metrics API pattern. For new projects, we recommend using the collections-based API shown above.
+
+!!! warning "Deprecation Timeline"
+    This API will be deprecated in version 0.4 and removed in version 1.0. Please migrate to the collections-based API shown above.
+
+### Example with SingleTurnSample
 
 ```python
 from ragas.dataset_schema import SingleTurnSample
@@ -44,7 +92,9 @@ sample = SingleTurnSample(
 scorer = SummarizationScore(llm=evaluator_llm)
 await scorer.single_turn_ascore(sample)
 ```
-Output
+
+Output:
+
 ```
 0.6423387096775146
 ```
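For readers skimming the diff: the doc's scoring formula blends the QA score (the fraction of keyphrase questions the summary answers correctly) with a conciseness score via the `coeff` parameter. A minimal sketch of that arithmetic, assuming the weighted form `QA*(1-coeff) + conciseness*coeff` with a default `coeff` of 0.5 — the function names below are illustrative, not part of the ragas API:

```python
def qa_score(answers: list[int]) -> float:
    """Fraction of keyphrase questions answered correctly (1 = yes, 0 = no)."""
    return sum(answers) / len(answers)

def summarization_score(qa: float, conciseness: float, coeff: float = 0.5) -> float:
    """Blend the QA score with the conciseness score using the weighting coefficient."""
    return qa * (1 - coeff) + conciseness * coeff

# Example: 3 of 4 questions answered correctly, conciseness score of 0.8
score = summarization_score(qa_score([1, 1, 1, 0]), conciseness=0.8)
print(round(score, 3))  # 0.775
```

Raising `coeff` weights brevity more heavily; setting it to 0 reduces the metric to the pure question-answering ratio.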
