Releases: linkedin/Liger-Kernel
v0.6.4 release
Highlights
- New model architectures: Qwen3-VL, hunyuanv1, Olmo3
- New algorithm: DAPO loss
- Optimizations: LayerNorm backward, Tiled MLP
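The new architectures are reachable through the library's standard entry points once support is patched in. A minimal sketch using the documented `AutoLigerKernelForCausalLM` auto-patching class; the checkpoint ID below is an illustrative placeholder, not a verified model name:

```python
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# AutoLigerKernelForCausalLM patches the model with the matching Liger
# kernels at load time for supported architectures.
# "org/olmo3-model" is a placeholder checkpoint ID.
model = AutoLigerKernelForCausalLM.from_pretrained("org/olmo3-model")
```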
What's Changed
- Option to return hard and soft loss when using distillation by @h-aurelien-lac in #895
- Fix CE patch and add layernorm support for InternVL by @MilkClouds in #921
- fix(ci): modify Glm4vMoe config for convergence test by @Tcc0403 in #918
- Support for Qwen3-VL models by @mayankagarwals in #911
- style: fix main branch format by @Tcc0403 in #929
- fix: initialize grad_weight and grad_bias on flce no_grad path by @keatonelvins in #931
- Fix qwen3 related tests by @vaibhavjindal in #933
- [Cross-entropy-loss] return mean token accuracy metric with CE loss by @kashif in #910
- Handle aux_loss for different transformer versions by @vaibhavjindal in #934
- Add TiledMLP Implementation by @upskyy in #935
- [Qwen3]: If qwen3 is used along with peft config, peft adds opcl obj no… by @yeshsurya in #926
- Increase time limit for modal tests by @vaibhavjindal in #947
- add hunyuanv1 dense and moe model by @Kingsleyandher in #940
- Olmo3 model support by @tyler-romero in #946
- [GRPO] add support for dapo loss by @kashif in #939
- [Perf] Optimize LayerNorm Backward: Replace Atomics with Persistent Reduction by @niyunsheng in #945
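#945 speeds up the backward pass of the existing LayerNorm kernel rather than changing its interface, so current call sites benefit as-is. A minimal sketch, assuming `LigerLayerNorm` keeps its `nn.LayerNorm`-style constructor taking the normalized size:

```python
import torch
from liger_kernel.transformers import LigerLayerNorm

norm = LigerLayerNorm(4096).cuda()  # drop-in for torch.nn.LayerNorm(4096)
x = torch.randn(8, 2048, 4096, device="cuda", requires_grad=True)

# The backward pass now uses a persistent reduction instead of atomics (#945).
norm(x).sum().backward()
```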
New Contributors
- @h-aurelien-lac made their first contribution in #895
- @mayankagarwals made their first contribution in #911
- @keatonelvins made their first contribution in #931
- @upskyy made their first contribution in #935
- @yeshsurya made their first contribution in #926
- @Kingsleyandher made their first contribution in #940
- @niyunsheng made their first contribution in #945
Full Changelog: v0.6.3...v0.6.4
v0.6.3 release
Highlights in this release:
- New model architecture support: SmolVLM2, GLM4.5V, InternVL3, Falcon-H1, Qwen3-Next
- New algorithm: GSPO
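For context on the new algorithm: GSPO (Group Sequence Policy Optimization, added in #845 below) replaces GRPO's per-token importance ratios with a single length-normalized ratio per sequence. The objective as defined in the GSPO paper is shown below; how #845 maps it onto Liger's chunked loss kernels is described in the PR itself.

```math
s_i(\theta) = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\text{old}}}(y_i \mid x)} \right)^{1/|y_i|},
\qquad
\mathcal{J}_{\text{GSPO}}(\theta) = \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G} \min\!\left( s_i(\theta)\, \hat{A}_i,\; \mathrm{clip}\!\left(s_i(\theta),\, 1-\varepsilon,\, 1+\varepsilon\right) \hat{A}_i \right) \right]
```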
What's Changed
- [cross-entropy-loss] Added support for DFT flag by @kashif in #860
- fix(test): update assertions in GLM4 instance patching tests by @vvvdwbvvv in #859
- Fix nan loss error for LigerFusedLinearJSDLoss by @ParagEkbote in #862
- [Cross-entropy] get valid predicted probabilities by @kashif in #864
- Enhance Docs by @ParagEkbote in #867
- Add Classifiers for Liger-Kernel by @ParagEkbote in #869
- docs(mta): suppress invalid sequence syntax warning by @Tcc0403 in #870
- Add GSPO by @BjarniHaukur in #845
- Add GLM4.5V support by @vvvdwbvvv in #863
- A Fix for Issue #872 by @yshenaw in #879
- Add pytest coverage for liger-kernel by @ParagEkbote in #876
- Replace all torch_dtype with dtype by @Tcc0403 in #881
- Update Dev Dependencies by @ParagEkbote in #886
- Fixed AMD CI issue #793 by @DevManpreet5 in #887
- fix(layernorm): remove `n_cols` upcasting for torch.compile by @Tcc0403 in #884
- Fix tests and CI by @vaibhavjindal in #882
- Remove daily test cron job by @vaibhavjindal in #890
- [UT] [XPU] Modify the test cases of XPU for triton3.5 by @YangKai0616 in #889
- Add InternVL3 support by @MilkClouds in #878
- fix(flce): add `shift_labels` as eval mode loss condition by @Tcc0403 in #888
- Add support of Falcon-H1 models for liger kernels by @puneeshkhanna in #900
- Don't deploy mkdocs to fix benchmarking by @vaibhavjindal in #904
- Disable mllama multimodal test in transformers<4.51.0 by @Tcc0403 in #899
- Add flce forward for FalconH1ForCausalLM and missing tests by @Tcc0403 in #903
- feat(ce,flce): decouple gradients computation for no_grad mode by @Tcc0403 in #894
- fix(llama4): Get correct swiglu patch target for llama4 moe layer by @alenawang in #907
- Add PolyNorm operator by @0xtoward in #901
- Copy and paste benchmarks before and after gh-pages deployment by @vaibhavjindal in #909
- Filter out redundant ops/allocations in no_grad mode by @Tcc0403 in #906
- Add support for Qwen3Next model with Liger kernels by @vvvdwbvvv in #912
- refactor(convergence-test): remove unnecessary print by @Tcc0403 in #913
- Enabled the tests glm4v/glm4v_moe for XPU and Fixed the monkey patch error by @YangKai0616 in #914
- [Test][XPU] Added gpu cache cleaning for XPU devices by @Egor-Krivov in #917
- Add SmolVLM2 support by @MilkClouds in #919
New Contributors
- @BjarniHaukur made their first contribution in #845
- @yshenaw made their first contribution in #879
- @DevManpreet5 made their first contribution in #887
- @MilkClouds made their first contribution in #878
- @puneeshkhanna made their first contribution in #900
- @alenawang made their first contribution in #907
- @0xtoward made their first contribution in #901
Full Changelog: v0.6.2...v0.6.3
v0.6.2
What's Changed
- Automate Benchmarking - fixing issue. by @Manan17 in #836
- Make path variable global by @Manan17 in #840
- Adding support for apo losses, sppo_hard and nca_pair by @Manan17 in #841
- Add `accum_dtype` option for `FusedLinearCrossEntropy` by @Tcc0403 in #830
- CI tests fix by @Manan17 in #847
- docs(README): fix intel ci link by @Tcc0403 in #842
- Llama4 rope implementation by @Manan17 in #843
- fix(phi3): update monkey patch for `Phi3ForCausalLM` by @Tcc0403 in #837
- feat(FLCE): expose `accum_dtype` for hf model monkey patch by @Tcc0403 in #851
- Fix ci by @Manan17 in #853
- Fix missing low-level api imports by @Kirill-Kravtsov in #856
- Add glm4.1v model support by @vvvdwbvvv in #858
- Update pyproject.toml version to 0.6.2 by @vaibhavjindal in #861
New Contributors
- @Kirill-Kravtsov made their first contribution in #856
Full Changelog: v0.6.1...v0.6.2
v0.6.1
What's Changed
- Fix gemma3 forward with skip_logits by @BitPhinix in #795
- Update README.md by @PKUWZP in #808
- Fix minor typo by @hugoabonizio in #809
- Update README.md by @PKUWZP in #811
- Fix embedding benchmarks for backward pass by @Manan17 in #799
- Giving an option to update benchmark results for previous commits. by @Manan17 in #791
- [Model] Liger support for SmolLM3 by @edbeeching in #798
- FusedAddRMSNorm: Fused residual addition and RMS Norm by @vaibhavjindal in #812 (see the sketch after this list)
- Skip smollm3 tests in tests-bwd by @vaibhavjindal in #821
- Layernorm enhancement by @Manan17 in #815
- Update README.md by @PKUWZP in #823
- Update index.md by @PKUWZP in #824
- Remove smollm3 import at top of file by @vaibhavjindal in #825
- Fix illegal memory access in Triton RMSNorm kernel by casting program_id to int64 by @vvvdwbvvv in #804
- fix(benchmark): move chunked loss module init out of measurements by @Tcc0403 in #643
- [XPU] Fixed the issue with multiple num_warps parameters being passed in by @YangKai0616 in #831
- Automate benchmarking - for every release by @Manan17 in #828
- Revert "Bug Fix: name patching for modules" by @vaibhavjindal in #833
- Bug fixes in patching module by @vaibhavjindal in #834
- docs(README): fix gpumode discord badge by @Tcc0403 in #835
- Update pyproject.toml version to 0.6.1 by @shimizust in #838
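For #812 above: the fused kernel folds the residual addition into the RMSNorm pass, saving one full read/write of the hidden states. A pure-PyTorch sketch of the intended semantics (not the module's actual API):

```python
import torch

def fused_add_rms_norm_reference(x, residual, weight, eps=1e-6):
    # Semantics the fused kernel implements in one pass:
    # (1) add the residual stream, (2) RMS-normalize the sum.
    hidden = x + residual
    rms = torch.rsqrt(hidden.pow(2).mean(dim=-1, keepdim=True) + eps)
    return weight * (hidden * rms), hidden  # (normalized output, new residual)
```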
New Contributors
- @BitPhinix made their first contribution in #795
- @PKUWZP made their first contribution in #808
- @hugoabonizio made their first contribution in #809
- @edbeeching made their first contribution in #798
Full Changelog: v0.6.0...v0.6.1
v0.6.0: New Attention Operators, Cosine Similarity Loss, Llama 4, and VLM Patching Updates
Highlights
This release introduces significant improvements to Liger-Kernel, including new operators, support for Llama 4 models, more robust benchmarking automation, and key fixes for patching of vision-language models (VLMs) due to recent transformers refactoring.
Key Changes
New Features & Improvements
- Multi-Token Attention by @AndreSlavescu (#689)
- Fused Neighborhood Attention by @AndreSlavescu (#732)
- Cosine Similarity Loss for Distillation by @Dexterai (#780)
- Support for Llama 4 by @Manan17 (#740)
- Option to choose fused LCE/CE loss by @connermanuel (#704; see the sketch after this list)
- Add block_rms_norm for QK norm by @mdy666 (#731)
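#704's fused LCE/CE choice surfaces through the per-kernel flags the `apply_liger_kernel_to_*` functions already expose. A sketch with the documented Llama flags (the same pattern should apply to other patched models):

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Pick the plain Liger cross entropy instead of the fused
# linear + cross-entropy path; flip the two flags to choose the other.
apply_liger_kernel_to_llama(
    cross_entropy=True,
    fused_linear_cross_entropy=False,
)
```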
Bug Fixes
- Vision-language model patching in recent transformers versions (>=4.52.0):
- RMS Norm patching by @vaibhavjindal, @BenasdTW (#741, #765)
- Hugging Face forward kwargs fix by @llllvvuu (#708)
- Fix import tanh by @jue-jue-zi (#762)
- Apply monkey patch to instances by @YangKai0616 (#772)
Documentation & CI Fixes
- Deploy MkDocs to GitHub Pages by @ParagEkbote (#724)
- Robust doc updates by @ParagEkbote (#726, #727)
- .idea ignored by @Tcc0403 (#784)
- ReadMe, MTA + softmax docs by @AndreSlavescu (#730)
- Relax DyT tol, XPU skip MTA by @Tcc0403 (#778)
- Paligemma test fixes by @vvvdwbvvv (#785)
- Style & test fixes by @Tcc0403, @vaibhavjindal (#736, #794)
- Add torchvision for multimodal test by @Tcc0403 (#755)
Benchmarking & Automation
- Automated benchmarking and visualization UI in GitHub pages by @Manan17 (#744, #747, #749, #752, #753, #756, #759, #760, #770, #779)
New Contributors
- @connermanuel made their first contribution in #704
- @llllvvuu made their first contribution in #708
- @jue-jue-zi made their first contribution in #762
- @YangKai0616 made their first contribution in #772
- @Dexterai made their first contribution in #780
- @vvvdwbvvv made their first contribution in #785
Full Changelog: v0.5.10...v0.6.0
v0.5.10: Qwen3 MoE support, Sparsemax kernel, bug fixes
What's Changed
- fix zip bug by @KareemMusleh in #702
- [dpo] set default average_log_prob to False by @cyr0930 in #693
- Rank build status lower by @momochen in #707
- Add support for Qwen3 MoE models by @chiwanpark in #706
- Fix qwen3_moe flaky convergence test by @vaibhavjindal in #710
- Fix empty Medusa head tensors by @chiwanpark in #698
- Sparsemax by @AndreSlavescu in #687 (see the reference sketch after this list)
- fix: remove docstring imports in transformer patches by @NanoCode012 in #712
- Increase tests timeout to 45 mins by @vaibhavjindal in #718
- fix modal tests by @shivam15s in #719
- Visualizer Update by @AndreSlavescu in #717
- Sparsemax Documentation by @AndreSlavescu in #716
- Element-wise DyT, faster than the original LigerDyT by @mdy666 in #673
- GRPO loss kernel written fully in Triton, reducing memory by 46 GB by @mdy666 in #672
- Make FLCE compatible with FSDP and PEFT by @astefanutti in #674
- Fix incorrect module patching when using LoRA with modules_to_save by @BenasdTW in #632
- [XPU] Changed how XPU discovery works during `setup.py` by @Egor-Krivov in #720
- Fix to publish docs on pushes to main branch by @shimizust in #722
- Release 0.5.10 by @shimizust in #725
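For #687 above, a minimal pure-PyTorch reference of the sparsemax projection (Martins & Astudillo, 2016) that the Triton kernel accelerates; this sketch shows the semantics only and is not the library's API:

```python
import torch

def sparsemax_reference(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Euclidean projection of z onto the probability simplex."""
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = k.view(shape)  # reshape for broadcasting along `dim`
    cumsum = z_sorted.cumsum(dim)
    support = 1 + k * z_sorted > cumsum                   # is z_(k) in the support?
    k_z = support.to(z.dtype).sum(dim=dim, keepdim=True)  # support size k(z)
    tau = (cumsum.gather(dim, k_z.long() - 1) - 1) / k_z  # threshold tau(z)
    return torch.clamp(z - tau, min=0.0)
```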
New Contributors
- @KareemMusleh made their first contribution in #702
- @cyr0930 made their first contribution in #693
- @NanoCode012 made their first contribution in #712
- @mdy666 made their first contribution in #673
- @astefanutti made their first contribution in #674
- @Egor-Krivov made their first contribution in #720
Full Changelog: v0.5.9...v0.5.10
v0.5.9: Adds XPU Setup, GLM-4 & Qwen3 Model Support, Key Bugfixes
What's Changed
- update setup.py for installation on xpu by @faaany in #668
- update XPU CI yaml file to use docker container by @faaany in #669
- Add average_log_prob as an init param for LigerFusedLinearDPOLoss by @vaibhavjindal in #676 (see the sketch after this list)
- add shift label change by @shivam15s in #683
- remove tests that can pass on XPU by @faaany in #686
- Update mkdocs.yml by @shivam15s in #691
- Fix LigerCrossEntropy reduction='none' by @Tcc0403 in #680
- Support GLM-4 models by @intervitens in #685
- Import glm4_lce_forward locally in function by @vaibhavjindal in #695
- Qwen3 model support by @vaibhavjindal in #692
- Use logits_to_keep logic for training runs by @vaibhavjindal in #696
- increase gemma3 multimodal convergence test loss atol by @shivam15s in #697
- Update pyproject.toml by @shivam15s in #700
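The `average_log_prob` flag from #676 above is passed at construction time (its default was later flipped to `False` by #693 in v0.5.10). A minimal construction sketch; the `beta` argument and import path follow the chunked-loss family's usual pattern and are assumptions here:

```python
from liger_kernel.chunked_loss import LigerFusedLinearDPOLoss

# average_log_prob=True averages per-token log-probs over the sequence
# length instead of summing them when forming the DPO log-ratios.
dpo_loss = LigerFusedLinearDPOLoss(beta=0.1, average_log_prob=True)
```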
New Contributors
- @intervitens made their first contribution in #685
Full Changelog: v0.5.8...v0.5.9
v0.5.8: Backward-Compatible Fix
What's Changed
- backward compatible initialization by @shivam15s in #666
- Update pyproject.toml by @shivam15s in #667
Full Changelog: v0.5.7...v0.5.8
v0.5.7: Gemma3 Support, XPU Tuning Enhancements, GRPO Improvements, and API Compatibility Fixes
What's Changed
- Gemma3 (Text and Multimodal) by @eljandoubi in #621
- Make FLCE compatible with latest `XXXForCausalLM.forward()` APIs by @Tcc0403 in #596
- Do bias addition in tests in float32 to make testing code similar to torch compile by @shivam15s in #655
- [CI] fix siglip dummy config by @yundai424 in #658
- add XPU tuning to JSD by @rmukhopa in #649
- add XPU tuning to Rmsnorm and Layernorm by @Tarakarevu1 in #653
- Fix imports without transformers by @vaibhavjindal in #659
- Use TYPE_CHECKING to fix static-only imports in IDEs etc by @vaibhavjindal in #660
- [kl_div] Modified block and warp sizes for improved performance by @jgtong in #654
- [GRPO] add support for different loss types by @kashif in #662
- Remove unexpected kwargs passing to flce by @Tcc0403 in #651
- reduce number of tests for grpo by @shivam15s in #663
- Update pyproject.toml by @shivam15s in #665
New Contributors
- @rmukhopa made their first contribution in #649
- @Tarakarevu1 made their first contribution in #653
- @jgtong made their first contribution in #654
Full Changelog: v0.5.6...v0.5.7
v0.5.6: Enhancements, Fixes, and Expanded Support (Paligemma, DyT, XPU, Llava, GRPO, and More!)
What's Changed
- [JSD] JSD fixes by @kashif in #609
- Paligemma support by @eljandoubi in #608
- Fix hidden size by @eljandoubi in #612
- Add loss_utils for rewriting lce_forward methods by @Tcc0403 in #614
- Update Star History URL by @ryankert01 in #616
- Update README.md by @shivam15s in #617
- Language model of PaliGemma 1 is Gemma 1 by @eljandoubi in #613
- Update README to reflect recent changes by @helloworld1 in #619
- Support Dynamic Tanh (DyT) by @Tcc0403 in #618 (see the reference sketch after this list)
- Fix incorrect module name when monkey_patch applied to instantiated model by @vaibhavjindal in #629
- [chunked loss] align teacher and student logit shape by @yundai424 in #634
- Fix incorrect condition comment in log_target calculation by @p81sunshine in #633
- Add huggingface llava by @jp1924 in #524
- fix Llava test-bwd failure by @jp1924 in #639
- Fix GRPO to conform with TRL: Fix loss, make tests accurate, correct metrics computation by @shivam15s and @mRSun15 in #628
- add xpu tuning to CE by @mgrabban in #645
- add xpu tuning to FLJSD by @mgrabban in #647
- Change tests to use rocm 6.3 version and tol changes to make liger run on amd by @shivam15s in #646
- Update pyproject.toml by @shivam15s in #648
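Dynamic Tanh (#618 above) comes from the "Transformers without Normalization" line of work: it replaces a normalization layer with an elementwise tanh plus affine transform. A reference sketch of the computation (not the Liger module's API):

```python
import torch

def dyt_reference(x, alpha, gamma, beta):
    # DyT(x) = gamma * tanh(alpha * x) + beta, with a scalar alpha and
    # per-channel affine parameters gamma and beta.
    return gamma * torch.tanh(alpha * x) + beta
```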
New Contributors
- @eljandoubi made their first contribution in #608
- @p81sunshine made their first contribution in #633
Full Changelog: v0.5.5...v0.5.6