Hello! I noticed that your evaluation on short-prompt datasets is conducted in a simulated manner. As I understand it, this means that on datasets like MMLU you perform autoregressive generation during the prefill phase, since the decoding phase there has only one step.
Could you show a detailed code example of this from the repo? Thanks!