I noticed that the `DatasetEvaluation` class includes a field `evaluated_samples`:
```python
class DatasetEvaluation(BaseModel):
    evaluated_samples: int = -1
    rejected_samples: Dict[EvaluationRejectionType, int] = {}
```
However, it seems the current evaluator classes only ever set this field while processing the entire dataset (test split) of a benchmark. I'm wondering if we could allow an arbitrary sample count to be passed during the evaluation-dataset construction phase. This would help speed up evaluation for benchmarks like OmniDocBench, which currently takes about an hour to complete on my machine. A rough sketch of what I have in mind is below.
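For illustration only, here is a minimal sketch of the idea: a hypothetical `max_samples` argument on the dataset-construction side that truncates the test split before evaluation. The `build_evaluation_dataset` helper and the `EvaluationRejectionType` members below are made up for this example and do not reflect the repo's actual API.

```python
# Sketch of a hypothetical `max_samples` option for dataset construction.
# Names of the helper and enum members are illustrative, not the real API.
from enum import Enum
from typing import Dict, List, Optional

from pydantic import BaseModel


class EvaluationRejectionType(str, Enum):
    # Placeholder members for the example
    INVALID_CONVERSION = "invalid_conversion"
    MISSING_PREDICTION = "missing_prediction"


class DatasetEvaluation(BaseModel):
    evaluated_samples: int = -1
    rejected_samples: Dict[EvaluationRejectionType, int] = {}


def build_evaluation_dataset(
    samples: List[dict],
    max_samples: Optional[int] = None,  # hypothetical new parameter
) -> List[dict]:
    """Return the samples to evaluate, optionally capped at max_samples."""
    if max_samples is not None and max_samples > 0:
        return samples[:max_samples]
    return samples


# Example: evaluate only the first 50 samples instead of the full test split
# to get a quick signal without waiting for the full run.
subset = build_evaluation_dataset(
    samples=[{"id": i} for i in range(500)],
    max_samples=50,
)
evaluation = DatasetEvaluation(evaluated_samples=len(subset))
print(evaluation.evaluated_samples)  # 50
```

Even a simple truncation like this would make quick iteration on OmniDocBench much more practical, while the default (no cap) would keep the current full-split behavior unchanged.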