I can't find the core logic that strips non-Python content from the model's generated solution anywhere in the code under the evaluation directory.
So far, I can only see how the dataset ground truth is parsed, how the input prompt is separated from the solution, and how the cleaned prediction (the final answer) is compared with the parsed ground truth.
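To make clear what I'm looking for, here is roughly the kind of post-processing step I'd expect. This is only my own minimal sketch, not code from the repo. It assumes the model wraps its code in markdown-style ``` fences. The function name `extract_python_code` is hypothetical.

```python
import re

# Matches a fenced block tagged ```python / ```py, or an untagged ``` block.
# Assumption: the model emits markdown-style fences (not confirmed for this repo).
_FENCE_RE = re.compile(r"```(?:python|py)?[ \t]*\n(.*?)```", re.DOTALL)


def extract_python_code(generation: str) -> str:
    """Return the code inside the first fenced block, or the stripped text if none."""
    match = _FENCE_RE.search(generation)
    if match is None:
        # No fence found: fall back to the whole generation, trimmed.
        return generation.strip()
    return match.group(1).strip()


if __name__ == "__main__":
    sample = "Here is my solution:\n```python\ndef add(a, b):\n    return a + b\n```\nHope it helps!"
    print(extract_python_code(sample))
```

If the actual evaluation code does something different, such as dropping explanation lines or keeping only the last code block, a pointer to where that happens would be very helpful.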