Description
Hi!
I have a question about how question sets are generated for the Summarization Score metric. I am working on producing high-quality summaries and need reliable evaluation metrics to assess them, and the Summarization Score metric has been very useful for that.
However, I am running into issues with the volatility of the scores: even though the input is always the same, the generated questions differ on every run. Is there a way to keep the question set consistent when the input is the same?
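One direction I have been considering is pinning the generator LLM to deterministic decoding (temperature 0). This is only a rough sketch of the idea; the metric class name and the `llm=` constructor argument are assumptions on my side, and even at temperature 0 most hosted LLM APIs are not strictly deterministic, so this may only reduce (not eliminate) the drift:

```python
# Sketch: pass a temperature-0 LLM to the metric so question generation
# is as deterministic as the backend allows. Class/argument names assumed.
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import SummarizationScore  # assumed export name

deterministic_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o", temperature=0))
metric = SummarizationScore(llm=deterministic_llm)  # then use `metric` with evaluate() as usual
```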
I also have an idea related to this: if a consistent question set cannot be generated with an LLM, what do you think about scoring the summary multiple times (n times) and averaging the results to get a more stable score? I have included a rough sketch of what I mean below. If you have any other good suggestions, I would greatly appreciate them.
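To make the averaging idea concrete, here is a minimal sketch. `score_once` and `run_metric_on_my_sample` are placeholders for whatever call actually produces a single Summarization Score for my fixed input (they are not ragas functions); reporting the standard deviation alongside the mean would also show how volatile the metric still is:

```python
import statistics
from typing import Callable

def averaged_score(score_once: Callable[[], float], n: int = 5) -> tuple[float, float]:
    """Score the same fixed input n times and return (mean, stdev).

    `score_once` is a placeholder for whatever call produces one
    Summarization Score for my sample — it is not a real ragas function.
    """
    scores = [score_once() for _ in range(n)]
    spread = statistics.stdev(scores) if n > 1 else 0.0
    return statistics.mean(scores), spread

# e.g. mean_score, spread = averaged_score(lambda: run_metric_on_my_sample(), n=5)
```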