Nice work! Other studies have shown that the entropy trajectories of Qwen-8B, 14B, and 32B exhibit a decreasing-then-increasing trend [1]. Furthermore, they point out that the increase in entropy is concentrated primarily on tokens that are already high-entropy in the base model. In other words, tokens that start out with high entropy tend to be generated with even higher entropy as model scale increases. I would appreciate the opportunity to discuss this further to better understand the underlying dynamics.
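For concreteness, the quantity under discussion is the per-token predictive entropy H_t = -sum_v p_t(v) log p_t(v) over the vocabulary. A minimal NumPy sketch (a toy illustration, not the code from [1] or from this paper):

```python
import numpy as np

def token_entropy(logits):
    """Per-token entropy from a (seq_len, vocab_size) array of logits.

    Computes H_t = -sum_v p_t(v) * log p_t(v), where p_t is the
    softmax over the vocabulary at position t.
    """
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Small epsilon guards against log(0) for near-zero probabilities.
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

# Uniform logits give the maximum entropy log(vocab_size);
# sharply peaked logits give entropy near zero.
uniform = token_entropy(np.zeros((4, 100)))   # ~log(100) per position
peaked = token_entropy(np.eye(4, 100) * 50.0) # near 0 per position
```

Tracking this quantity per token (rather than averaged over the sequence) is what lets one separate the high-entropy minority tokens from the rest.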
[1] Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning