GRPO on GSM8K dataset with Task Pipeline

This example shows the usage of GRPO on the GSM8K dataset, with a task pipeline to prioritize the raw dataset before training.

For more detailed information, please refer to the documentation.