jlamypoirier commented on Jan 16, 2026

✨ Description

Continuation of #425.

  • Merge the cross-entropy, KL and reverse KL losses into a single "entropy loss" interface supporting a consistent set of features (e.g. logits_scale_factor and target_format support, a proper implementation for KL and reverse KL, and explicit errors for unsupported cases). Add a (reference) torch implementation, rework the fused implementations, and merge the associated tests. A hedged sketch of such an interface follows this list.
  • Fix the auxiliary Z loss and add support for loss masking in the Z loss (see the second sketch below).
  • Simplify the LM loss config interface. Integrate get_targets directly into get_loss.
  • Combine the distillation loss configs and use an entropy_loss_type parameter instead. Add the same parameter to the standard LM loss.
  • Fix and improve support for cross-entropy splits.
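
The sketch below is illustrative only, not the PR's actual code: it shows one way a unified "entropy loss" entry point could expose cross-entropy, forward KL and reverse KL behind a single interface with logits_scale_factor, target_format and loss masking. The names EntropyLossType, TargetFormat and entropy_loss are assumptions, and the fused implementations are omitted.

```python
import enum

import torch


class EntropyLossType(str, enum.Enum):
    cross_entropy = "cross_entropy"
    forward_kl = "forward_kl"
    reverse_kl = "reverse_kl"


class TargetFormat(str, enum.Enum):
    labels = "labels"  # integer class indices
    logits = "logits"  # soft (distillation-style) targets


def entropy_loss(
    logits: torch.Tensor,  # (tokens, vocab)
    target: torch.Tensor,  # (tokens,) labels or (tokens, vocab) logits
    loss_type: EntropyLossType,
    target_format: TargetFormat,
    loss_mask: torch.Tensor | None = None,  # (tokens,), 1 = keep, 0 = ignore
    logits_scale_factor: float = 1.0,
) -> torch.Tensor:
    # Reference (torch) path: compute in float32 and apply the logits scale
    # before the softmax, since the scale changes the distribution itself.
    log_p = torch.log_softmax(logits.float() * logits_scale_factor, dim=-1)

    if target_format == TargetFormat.labels:
        if loss_type != EntropyLossType.cross_entropy:
            raise ValueError(f"{loss_type.value} requires logits targets")
        per_token = -log_p.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    else:
        log_q = torch.log_softmax(target.float() * logits_scale_factor, dim=-1)
        if loss_type == EntropyLossType.cross_entropy:
            # Soft-target cross-entropy: -sum_i q_i * log p_i
            per_token = -(log_q.exp() * log_p).sum(-1)
        elif loss_type == EntropyLossType.forward_kl:
            # Forward KL: KL(q || p), the usual distillation loss.
            per_token = (log_q.exp() * (log_q - log_p)).sum(-1)
        else:
            # Reverse KL: KL(p || q).
            per_token = (log_p.exp() * (log_p - log_q)).sum(-1)

    if loss_mask is not None:
        per_token = per_token * loss_mask
        return per_token.sum() / loss_mask.sum().clamp(min=1)
    return per_token.mean()
```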
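
Similarly, a minimal sketch of an auxiliary Z loss with loss masking (names and default coefficient are assumptions, not the PR's code): it penalizes the squared log-partition z = logsumexp(logits) and averages only over unmasked tokens.

```python
import torch


def z_loss(
    logits: torch.Tensor,  # (tokens, vocab)
    loss_mask: torch.Tensor | None = None,  # (tokens,), 1 = keep, 0 = ignore
    coefficient: float = 1e-4,
) -> torch.Tensor:
    # z = log sum_i exp(logit_i); penalizing z**2 keeps logit magnitudes bounded.
    squared_z = torch.logsumexp(logits.float(), dim=-1).square()
    if loss_mask is None:
        return coefficient * squared_z.mean()
    # Exclude masked tokens from both the sum and the normalization.
    return coefficient * (squared_z * loss_mask).sum() / loss_mask.sum().clamp(min=1)
```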

@jlamypoirier jlamypoirier changed the base branch from main to jlp_cpu January 16, 2026 19:07