Benchmarking system extension (evaluation measures): evaluation of non-comparable, "original" CQs

Assessment of which prompting technique and which model setup (MoE or a single LLM) is best suited to operationalize this evaluation.
Consideration of inter-annotator agreement.
Agreement is put forward via a prompt that follows the advocacy-inquiry model.
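
As an illustration of how agreement between two annotators (human or LLM) could be quantified, here is a minimal sketch using Cohen's kappa. The choice of Cohen's kappa, the function name, and the labels "useful" / "not_useful" are assumptions made for this example only; the notes above do not fix an agreement measure or a label set.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; not part of any existing benchmark code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")

    n = len(labels_a)

    # Observed agreement: share of items that received the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label at random,
    # given each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so perfect agreement is reported by convention in this sketch.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Assumed example judgements on six CQs (invented labels, not real data).
annotator_1 = ["useful", "useful", "not_useful", "useful", "not_useful", "not_useful"]
annotator_2 = ["useful", "not_useful", "not_useful", "useful", "not_useful", "useful"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

With more than two annotators, or with missing judgements, a measure such as Fleiss' kappa or Krippendorff's alpha would be the more usual choice.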