Description
I am encountering an issue during RL training.
I have successfully run SFT training using the following command, which works as expected:
bash tools/dist.sh train projects/vrt_sa2va/configs_sa2va/vrt_sa2va_4b_qwen3_sft.py 8
However, when switching to RL training with:
bash tools/dist.sh train projects/vrt_sa2va/configs_sa2va/vrt_sa2va_4b_qwen3_rl.py 8
the training process appears to be missing the following file:
work_dirs/group_inference_sa2va_4b_0801_grpo_ver_1k_0801_80k_output/top_2k_high_variance_samples.json
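For reference, this is the kind of pre-flight check I can run before launching RL training to confirm the file is really absent or empty. It is only a minimal sketch: the helper name `check_keys_json` is hypothetical and not part of the repository, and I am assuming the file is a JSON list or object, since I do not know its actual schema.

```python
import json
from pathlib import Path


def check_keys_json(path):
    """Return the number of top-level entries in the keys file.

    Assumption: the file holds a JSON list or object. The real schema of
    top_2k_high_variance_samples.json is not known to me.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"keys_json_file not found: {p}")
    with p.open() as f:
        data = json.load(f)
    if not isinstance(data, (list, dict)) or len(data) == 0:
        raise ValueError(f"keys_json_file is empty or has an unexpected type: {p}")
    return len(data)
```

Calling `check_keys_json(...)` with the path above raises `FileNotFoundError` on my machine, which matches what the RL run reports.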
I would like to ask:
Where is this top_2k_high_variance_samples.json file supposed to be generated?
Is it produced by a specific preprocessing or inference step prior to RL training?
If I remove keys_json_file from the config (or leave it unset), RL training still starts, but both the reward and the loss stay at zero throughout training.
Is this behavior related to the absence of keys_json_file?
Does RL training rely on this file to compute rewards or select valid samples?
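To show why I suspect the zero loss follows from the zero reward, here is a minimal sketch of a group-relative advantage computation. This is an assumption on my side: I am guessing the RL config uses a GRPO-style normalization only because "grpo" appears in the path above, and `group_relative_advantages` is a hypothetical name, not the repository's code.

```python
def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled responses.

    Assumption: GRPO-style advantage (reward minus group mean, divided by
    group standard deviation). This is not taken from the repository.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Under this assumption, a group where every reward is zero gets all-zero advantages, so the policy loss would also be zero. That would make the zero loss a symptom of the zero reward rather than a separate problem.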
Thank you!