2 files changed: +9 -1 lines changed
 ## What's New

+## Feb 23, 2026
+* Add token distillation training support to distillation task wrappers
+* Remove some torch.jit usage in prep for official deprecation
+* Caution added to AdamP optimizer
+* Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights
+* Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
+* Release 1.0.25
+
 ## Jan 21, 2026
 * **Compat Break**: Fix oversight w/ QKV vs MLP bias in `ParallelScalingBlock` (& `DiffParallelScalingBlock`)
   * Does not impact any trained `timm` models but could impact downstream use.

-__version__ = '1.0.25.dev0'
+__version__ = '1.0.25'
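
The version change drops the `.dev0` suffix, so the development pre-release becomes the final 1.0.25 release. Under PEP 440, a `.devN` release sorts before the final release with the same number. Below is a minimal sketch of that ordering, assuming version strings of the form `X.Y.Z` with an optional `.devN` suffix only. `version_key` is a hypothetical helper written for illustration and is not part of `timm`.

```python
import re

# Hypothetical helper for illustration; handles only "X.Y.Z" and "X.Y.Z.devN".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.dev(\d+))?$")


def version_key(version: str) -> tuple:
    """Return a sort key in which X.Y.Z.devN orders before X.Y.Z."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, dev = match.groups()
    # A dev release gets (0, N); a final release gets (1, 0), so it sorts later.
    dev_part = (0, int(dev)) if dev is not None else (1, 0)
    return (int(major), int(minor), int(patch), dev_part)


old, new = "1.0.25.dev0", "1.0.25"
assert version_key(old) < version_key(new)        # dev build precedes the release
assert version_key("1.0.24") < version_key(old)   # but follows the previous release
```

For anything beyond this narrow format, the `packaging` library's `Version` class implements the full PEP 440 ordering and is likely the better choice.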