Thanks for releasing this wonderful work. In `evaluate_from_local.py`, the `extract_xx` functions appear to have typos (see L101, L111, and L119). Since the MMLU-Pro dataset has questions with answer options A-P, the pattern should be something like `A-P` instead of `A-J`.
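
To illustrate the concern, here is a minimal sketch of how a narrower character class would silently drop valid answers. The regex and the helper names (`build_pattern`, `extract_answer`) are my own assumptions for illustration, not code copied from `evaluate_from_local.py`:

```python
import re


def build_pattern(last_letter: str) -> re.Pattern:
    # Hypothetical pattern: matches "answer is (X)" or "answer is X",
    # where X is a letter between A and `last_letter`.
    return re.compile(rf"answer is \(?([A-{last_letter}])\)?")


def extract_answer(text: str, last_letter: str = "P"):
    # Return the captured option letter, or None if nothing matches.
    match = build_pattern(last_letter).search(text)
    return match.group(1) if match else None


response = "Let's think step by step... The answer is (M)."

# With the narrower range, an answer such as "M" is not extracted at all.
print(extract_answer(response, last_letter="J"))
# With the wider range, the same response yields the option letter.
print(extract_answer(response, last_letter="P"))
```

If the actual patterns in the script look similar, widening the character class at each of the three locations should be enough.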