Hi Authors,
Thanks for your great work! I really enjoyed reading your paper, and it's quite inspiring. I have now run some experiments on the reasoning tasks mentioned in your paper, and the results are quite good!
I am wondering whether it is possible to extend LatentMAS to visual reasoning tasks as well. For example, could you directly apply a vision-language model, e.g., Qwen3-VL-8B, as the backbone model for image perception and understanding tasks like VLM2-Bench (https://vlm2-bench.github.io/) … which also require model reasoning and collaboration?