Question about `num_images_per_prompt` Setting in OmniContext Benchmark

Hi, thank you for your excellent work!

I have a question about the `num_images_per_prompt` setting used during sampling when evaluating on the OmniContext benchmark. It appears that only **one** image is sampled per prompt. Could this introduce significant run-to-run variance in the results, for example depending on the random seed?
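To make the concern concrete, here is a minimal sketch of one way to measure the spread: rerun the evaluation under several seeds and report the mean and standard deviation of the benchmark-level score. `score_fn` and `toy_score_fn` are hypothetical stand-ins for the actual OmniContext scoring step, not part of the benchmark's code. The toy version just returns Gaussian noise around 5.0.

```python
import random
import statistics
from typing import Callable, Sequence


def benchmark_score_spread(
    prompts: Sequence[str],
    score_fn: Callable[[str, int, int], float],
    seeds: Sequence[int],
    images_per_prompt: int = 1,
) -> tuple[float, float]:
    """Return (mean, stdev) of the benchmark-level score across seeds.

    For each seed, `images_per_prompt` images are scored per prompt and
    averaged; the per-prompt averages are then averaged into one
    benchmark score for that seed. images_per_prompt=1 mirrors sampling
    a single image per prompt.
    """
    run_scores = []
    for seed in seeds:
        per_prompt = [
            statistics.fmean(score_fn(p, seed, i) for i in range(images_per_prompt))
            for p in prompts
        ]
        run_scores.append(statistics.fmean(per_prompt))
    # stdev needs at least two seeds
    return statistics.fmean(run_scores), statistics.stdev(run_scores)


def toy_score_fn(prompt: str, seed: int, index: int) -> float:
    """Hypothetical scorer: deterministic noise per (prompt, seed, image index)."""
    rng = random.Random(f"{prompt}|{seed}|{index}")
    return 5.0 + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(50)]
    for k in (1, 4):
        mean, spread = benchmark_score_spread(prompts, toy_score_fn, range(5), k)
        print(f"images_per_prompt={k}: {mean:.3f} +/- {spread:.3f}")
```

If the per-image score noise is independent, averaging over `k` images per prompt shrinks that noise component of the spread by roughly a factor of `1/sqrt(k)`. Comparing the reported number against the seed-to-seed spread would show whether a single sample per prompt is enough.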