CoT-style data for SFT and GRPO

Hello~ I noticed you constructed 90K CoT-style data samples for both SFT and GRPO training. Could you clarify:

1. How many samples were used in each phase?

2. Is the CoT annotation unnecessary during the GRPO stage?