
Ensuring Consistent Question Sets for Summarization Score Evaluation #1119

Description

@noweymik

Hi!

I have a question about how the question sets for the Summarization Score metric are generated. I am working on producing high-quality summaries and need reliable evaluation metrics to assess them, and the Summarization Score metric has been very useful for checking summary quality.

However, I am running into issues with the volatility of the scores: even though the input is always the same, the generated questions differ from run to run. Is there a way to ensure that the question set stays consistent for identical input?
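
To make that question concrete, the kind of determinism I was hoping for looks roughly like the sketch below. It calls an OpenAI-style chat client directly just as an illustration; the model name, the prompt, and whether the metric's internal question generation can actually be pinned this way are assumptions on my part.

```python
from openai import OpenAI

client = OpenAI()

def generate_questions_deterministically(text: str) -> str:
    """Illustration of 'pinned' question generation: zero temperature and a
    fixed seed so the same input should produce the same question set.
    (Seed support is best-effort on the API side, and I am not sure how to
    plumb these settings into the Summarization Score metric.)"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model, not necessarily what ragas uses
        temperature=0,        # remove sampling randomness
        seed=42,              # request reproducible sampling (best-effort)
        messages=[
            {
                "role": "user",
                "content": (
                    "Generate a list of questions that the following text "
                    f"can answer:\n\n{text}"
                ),
            }
        ],
    )
    return response.choices[0].message.content
```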

I also have an idea related to this. If it is not possible to generate a consistent question set with an LLM, what do you think about scoring the summary multiple times (n runs) and averaging the results to get a more stable score, something like the sketch below? If you have any other good suggestions, I would greatly appreciate them.
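
To illustrate that idea, here is a minimal sketch of the n-run averaging I have in mind; `score_summary` is a placeholder for a single Summarization Score evaluation, since I have not confirmed the exact ragas API call for it.

```python
import statistics

def score_summary(document: str, summary: str) -> float:
    """Placeholder for one Summarization Score evaluation of `summary`
    against `document` (the real call depends on the ragas version)."""
    raise NotImplementedError

def averaged_summarization_score(document: str, summary: str, n: int = 5) -> dict:
    """Score the same (document, summary) pair n times and aggregate, so the
    run-to-run variance from differing question sets is smoothed out."""
    scores = [score_summary(document, summary) for _ in range(n)]
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if n > 1 else 0.0,
        "runs": scores,
    }
```

Reporting the standard deviation alongside the mean would also make it easier to see how volatile the metric actually is for a given summary.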

Metadata

Labels

question (Further information is requested)
