Hello~ I noticed that you constructed 90K CoT-style data samples for both SFT and GRPO training. Could you clarify:
1. How many samples were used in each phase?
2. Is the CoT annotation unnecessary during the GRPO stage?