The inference time is about as long as that of a 7B model.

I used the exact same code to evaluate UI-Tars and InfiGUI 3B, and found that the inference time of InfiGUI is significantly longer than that of UI-Tars 2B and roughly the same as that of UI-Tars 7B. Since both are based on the Qwen model architecture, what might explain this unexpected discrepancy?
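
To help narrow this down, here is a minimal sketch of how the comparison could be broken into latency per sample and time per generated token. If one model simply produces longer outputs, the per-token time should look similar even though the total time differs. The names `benchmark` and `generate_fn` are hypothetical and not part of either repository; `generate_fn` is assumed to wrap the model call and return the number of newly generated tokens.

```python
import time
from statistics import mean


def benchmark(generate_fn, samples, warmup=1):
    """Time a generation callable over a list of samples.

    generate_fn is a hypothetical wrapper around the model call; it is
    assumed to return the number of newly generated tokens for one sample.
    """
    # Warm-up runs are executed but not timed.
    for sample in samples[:warmup]:
        generate_fn(sample)

    latencies, token_counts = [], []
    for sample in samples:
        start = time.perf_counter()
        n_tokens = generate_fn(sample)
        latencies.append(time.perf_counter() - start)
        token_counts.append(n_tokens)

    total_time = sum(latencies)
    total_tokens = sum(token_counts)
    return {
        "mean_latency_s": mean(latencies),
        "mean_new_tokens": mean(token_counts),
        # Time per generated token separates "slower model" from "longer output".
        "s_per_token": total_time / total_tokens if total_tokens else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in for a real model: pretends each character is one generated token.
    stats = benchmark(lambda s: len(s), ["ab", "abcd"])
    print(stats)
```

When timing on a GPU, the device should be synchronized inside `generate_fn` before it returns (for example with `torch.cuda.synchronize()`), otherwise the measured time may not include all of the work.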