★---> train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024}
★---> train stats after 160800 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4887', 'grad_norm': 'nan', 'counters/examples': 160800, 'counters/updates': 5025}