Commit 9e6e826

Merge pull request #4276 from voutilad/evaluations
Convert U+2022 bullets to markdown format.
2 parents: 4c3181c + 97f99a6

1 file changed: articles/ai-services/openai/how-to/evaluations.md (+8 −6 lines)
```diff
@@ -305,12 +305,14 @@ BLEU (BiLingual Evaluation Understudy) score is commonly used in natural languag
 
 ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used to evaluate automatic summarization and machine translation. It measures the overlap between generated text and reference summaries. ROUGE focuses on recall-oriented measures to assess how well the generated text covers the reference text.
 The ROUGE score provides various metrics, including:
-• ROUGE-1: Overlap of unigrams (single words) between generated and reference text.
-• ROUGE-2: Overlap of bigrams (two consecutive words) between generated and reference text.
-• ROUGE-3: Overlap of trigrams (three consecutive words) between generated and reference text.
-• ROUGE-4: Overlap of four-grams (four consecutive words) between generated and reference text.
-• ROUGE-5: Overlap of five-grams (five consecutive words) between generated and reference text.
-• ROUGE-L: Longest common subsequence (LCS) of words between generated and reference text.
+
+- ROUGE-1: Overlap of unigrams (single words) between generated and reference text.
+- ROUGE-2: Overlap of bigrams (two consecutive words) between generated and reference text.
+- ROUGE-3: Overlap of trigrams (three consecutive words) between generated and reference text.
+- ROUGE-4: Overlap of four-grams (four consecutive words) between generated and reference text.
+- ROUGE-5: Overlap of five-grams (five consecutive words) between generated and reference text.
+- ROUGE-L: Longest common subsequence (LCS) of words between generated and reference text.
+
 Text summarization and document comparison are among the optimal use cases for ROUGE, particularly in scenarios where text coherence and relevance are critical.
 
 Cosine similarity measures how closely two text embeddings (such as model outputs and reference texts) align in meaning, helping assess the semantic similarity between them. As with other model-based evaluators, you need to provide a model deployment to use for evaluation.
```

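Cosine similarity itself is just the dot product of the two embedding vectors divided by the product of their norms. A minimal sketch follows, assuming toy vectors in place of real embeddings; in the actual evaluator, the embeddings come from the model deployment you provide.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (||a|| * ||b||), ranging from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for illustration only; real embedding
# vectors are produced by the configured model deployment.
output_vec = [0.1, 0.3, 0.7]
reference_vec = [0.2, 0.25, 0.6]
print(cosine_similarity(output_vec, reference_vec))  # ~0.99: strongly aligned
```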