Nice work! Other studies have shown that the entropy trajectories of Qwen-8B, 14B, and 32B exhibit a decreasing-then-increasing trend [1]. Furthermore, they point out that the increase in entropy is concentrated primarily on tokens that are already high-entropy in the base model. In other words, tokens that start out with high entropy tend to be generated with even higher entropy as model scale increases. I would appreciate the opportunity to discuss this further to better understand the underlying dynamics.
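For concreteness, the quantity under discussion is the per-token predictive entropy H_t = -sum_v p_t(v) log p_t(v) over the vocabulary. A minimal NumPy sketch (a toy illustration, not the code from [1] or from this paper):

```python
import numpy as np

def token_entropy(logits):
    """Per-token entropy from a (seq_len, vocab_size) array of logits.

    Computes H_t = -sum_v p_t(v) * log p_t(v), where p_t is the
    softmax over the vocabulary at position t.
    """
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Small epsilon guards against log(0) for near-zero probabilities.
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

# Uniform logits give the maximum entropy log(vocab_size);
# sharply peaked logits give entropy near zero.
uniform = token_entropy(np.zeros((4, 100)))   # ~log(100) per position
peaked = token_entropy(np.eye(4, 100) * 50.0) # near 0 per position
```

Tracking this quantity per token (rather than averaged over the sequence) is what lets one separate the high-entropy minority tokens from the rest.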
[1] Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning