This example shows how to train with GRPO on the GSM8K dataset, using a task pipeline that prioritizes the samples in the raw dataset before training starts.
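GRPO (Group Relative Policy Optimization) samples several responses for each prompt and scores each response against the rest of its group, so it does not need a learned value function. The following is a minimal sketch of that group-normalized advantage. It is not taken from this example's code: the function name is hypothetical, the binary "final answer matches the reference" reward is an assumed GSM8K-style reward, and using the population standard deviation plus a small epsilon is an illustrative choice (implementations differ on these details).

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Hypothetical helper: normalize the rewards of one group of responses
    (all sampled for the same prompt) to zero mean and roughly unit scale."""
    mean = statistics.fmean(rewards)
    # Population std (an assumption); eps avoids division by zero when
    # every response in the group receives the same reward.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Assumed rewards for 4 sampled answers to one GSM8K question:
# 1.0 if the final numeric answer matches the reference, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Correct answers get a positive advantage and incorrect ones a negative advantage. A group in which all answers are equally right or wrong contributes no learning signal.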
For more detailed information, please refer to the documentation.
The config files for this example are gsm8k.yaml and train_gsm8k.yaml.