Thank you very much for this wonderful research. I have some questions about the results on the Open LLM benchmarks (Table 18 in the paper).
Did you evaluate on these benchmarks in a zero-shot setting, or did you fine-tune on the training set before evaluating on the test set? Also, could you share the prompts you used for evaluation?
Thank you very much.