Add logging of example rewards with snipped output, controlled by the debug=True flag in GRPOConfig
#1298
+65
−222
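
A minimal sketch of what debug-gated reward logging might look like. This is not the PR's actual diff: `SketchConfig` is a hypothetical stand-in for GRPOConfig that models only the `debug` flag, and `snip`, `log_example_rewards`, and the `max_chars` parameter are names assumed for illustration.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("grpo_debug")


@dataclass
class SketchConfig:
    # Hypothetical stand-in for GRPOConfig; only the debug flag is modelled.
    debug: bool = False


def snip(text: str, max_chars: int = 80) -> str:
    """Keep the head and tail of a long completion and elide the middle."""
    if len(text) <= max_chars:
        return text
    half = (max_chars - 5) // 2  # 5 characters are reserved for " ... "
    return text[:half] + " ... " + text[-half:]


def log_example_rewards(config, completions, rewards, max_chars=80):
    """Log one line per example when config.debug is set.

    Returns the logged lines (an empty list when debug is off) so the
    behaviour can be checked without capturing log output.
    """
    if not config.debug:
        return []
    lines = []
    for i, (completion, reward) in enumerate(zip(completions, rewards)):
        line = f"example {i}: reward={reward:.3f} output={snip(completion, max_chars)!r}"
        logger.info(line)
        lines.append(line)
    return lines
```

With `debug=False` the function returns immediately, so the extra string formatting is skipped during normal training runs. Snipping from the middle keeps both the start and the end of each completion visible, which is one possible choice; truncating only the tail would work equally well under this sketch.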