**Additional context** Our customer would like to apply RL methods (GRPO, GSPO, and SPO) to VLMs with MoE architectures, such as Qwen3-VL. Would it be possible to extend the current VLM support to cover Qwen3-VL? (cc @terrykong, @snowmanwwg)