Thanks for sharing this great work! I followed the step-by-step instructions and used the Rynn Bench dataset with a few simple code changes. However, the benchmark numbers in Table 3 of the TR paper do not seem to be reproducible for open-source models such as Qwen3-VL-8B-Instruct and Cosmos-Reason2, especially on RynnBrain-Grounding, Area, and Affordance. Are any important details about hyper-parameters and/or datasets missing from the instructions?