workaround is to set it to false for Gemma models. Filed an issue upstream: https://github.com/vllm-project/vllm/issues/31123. Once that is fixed, we can remove the skip init force in RL.