Releases: linkedin/Liger-Kernel
v0.6.4 release
Highlights
- New model architectures: Qwen3-VL, hunyuanv1, Olmo3
- New algorithm: DAPO loss
- Optimizations: LayerNorm backward, Tiled MLP
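The new architectures are reachable through the library's standard entry points once support is patched in. A minimal sketch using the documented `AutoLigerKernelForCausalLM` auto-patching class; the checkpoint ID below is an illustrative placeholder, not a verified model name:

```python
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# AutoLigerKernelForCausalLM patches the model with the matching Liger
# kernels at load time for supported architectures.
# "org/olmo3-model" is a placeholder checkpoint ID.
model = AutoLigerKernelForCausalLM.from_pretrained("org/olmo3-model")
```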
What's Changed
- Option to return hard and soft loss when using distillation by @h-aurelien-lac in #895
- Fix CE patch and add layernorm support for InternVL by @MilkClouds in #921
- fix(ci): modify Glm4vMoe config for convergence test by @Tcc0403 in #918
- Support for Qwen3-VL models by @mayankagarwals in #911
- style: fix main branch format by @Tcc0403 in #929
- fix: initialize grad_weight and grad_bias on flce no_grad path by @keatonelvins in #931
- Fix qwen3 related tests by @vaibhavjindal in #933
- [Cross-entropy-loss] return mean token accuracy metric with CE loss by @kashif in #910
- Handle aux_loss for different transformer versions by @vaibhavjindal in #934
- Add TiledMLP Implementation by @upskyy in #935
- [Qwen3]: If qwen3 is used along with peft config, peft adds opcl obj no… by @yeshsurya in #926
- Increase time limit for modal tests by @vaibhavjindal in #947
- add hunyuanv1 dense and moe model by @Kingsleyandher in #940
- Olmo3 model support by @tyler-romero in #946
- [GRPO] add support for dapo loss by @kashif in #939
- [Perf] Optimize LayerNorm Backward: Replace Atomics with Persistent Reduction by @niyunsheng in #945
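#945 speeds up the backward pass of the existing LayerNorm kernel rather than changing its interface, so current call sites benefit as-is. A minimal sketch, assuming `LigerLayerNorm` keeps its `nn.LayerNorm`-style constructor taking the normalized size:

```python
import torch
from liger_kernel.transformers import LigerLayerNorm

norm = LigerLayerNorm(4096).cuda()  # drop-in for torch.nn.LayerNorm(4096)
x = torch.randn(8, 2048, 4096, device="cuda", requires_grad=True)

# The backward pass now uses a persistent reduction instead of atomics (#945).
norm(x).sum().backward()
```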
New Contributors
- @h-aurelien-lac made their first contribution in #895
- @mayankagarwals made their first contribution in #911
- @keatonelvins made their first contribution in #931
- @upskyy made their first contribution in #935
- @yeshsurya made their first contribution in #926
- @Kingsleyandher made their first contribution in #940
- @niyunsheng made their first contribution in #945
Full Changelog: v0.6.3...v0.6.4
v0.6.3 release
Highlights in this release:
- New model architecture support: SmolVLM2, GLM4.5V, InternVL3, Falcon-H1, Qwen3-Next
- New algorithm: GSPO
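For context on the new algorithm: GSPO (Group Sequence Policy Optimization, added in #845 below) replaces GRPO's per-token importance ratios with a single length-normalized ratio per sequence. The objective as defined in the GSPO paper is shown below; how #845 maps it onto Liger's chunked loss kernels is described in the PR itself.

```math
s_i(\theta) = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\text{old}}}(y_i \mid x)} \right)^{1/|y_i|},
\qquad
\mathcal{J}_{\text{GSPO}}(\theta) = \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G} \min\!\left( s_i(\theta)\, \hat{A}_i,\; \mathrm{clip}\!\left(s_i(\theta),\, 1-\varepsilon,\, 1+\varepsilon\right) \hat{A}_i \right) \right]
```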
What's Changed
- [cross-entropy-loss] Added support for DFT flag by @kashif in #860
- fix(test): update assertions in GLM4 instance patching tests by @vvvdwbvvv in #859
- Fix nan loss error for LigerFusedLinearJSDLoss by @ParagEkbote in #862
- [Cross-entropy] get valid predicted probabilities by @kashif in #864
- Enhance Docs by @ParagEkbote in #867
- Add Classifiers for Liger-Kernel by @ParagEkbote in #869
- docs(mta): suppress invalid sequence syntax warning by @Tcc0403 in #870
- Add GSPO by @BjarniHaukur in #845
- Add GLM4.5V support by @vvvdwbvvv in #863
- A Fix for Issue #872 by @yshenaw in #879
- Add pytest coverage for liger-kernel by @ParagEkbote in #876
- Replace all torch_dtype with dtype by @Tcc0403 in #881
- Update Dev Dependencies by @ParagEkbote in #886
- Fixed AMD CI issue #793 by @DevManpreet5 in #887
- fix(layernorm): remove `n_cols` upcasting for torch.compile by @Tcc0403 in #884
- Fix tests and CI by @vaibhavjindal in #882
- Remove daily test cron job by @vaibhavjindal in #890
- [UT] [XPU] Modify the test cases of XPU for triton3.5 by @YangKai0616 in #889
- Add InternVL3 support by @MilkClouds in #878
- fix(flce): add `shift_labels` as eval mode loss condition by @Tcc0403 in #888
- Add support of Falcon-H1 models for liger kernels by @puneeshkhanna in #900
- Don't deploy mkdocs to fix benchmarking by @vaibhavjindal in #904
- Disable mllama multimodal test in transformers<4.51.0 by @Tcc0403 in #899
- Add flce forward for FalconH1ForCausalLM and missing tests by @Tcc0403 in #903
- feat(ce,flce): decouple gradients computation for no_grad mode by @Tcc0403 in #894
- fix(llama4): Get correct swiglu patch target for llama4 moe layer by @alenawang in #907
- Add PolyNorm operator by @0xtoward in #901
- Copy and paste benchmarks before and after gh-pages deployment by @vaibhavjindal in #909
- Filter out redundant ops/allocations in no_grad mode by @Tcc0403 in #906
- Add support for Qwen3Next model with Liger kernels by @vvvdwbvvv in #912
- refactor(convergence-test): remove unnecessary print by @Tcc0403 in #913
- Enabled the tests glm4v/glm4v_moe for XPU and Fixed the monkey patch error by @YangKai0616 in #914
- [Test][XPU] Added gpu cache cleaning for XPU devices by @Egor-Krivov in #917
- Add SmolVLM2 support by @MilkClouds in #919
New Contributors
- @BjarniHaukur made their first contribution in #845
- @yshenaw made their first contribution in #879
- @DevManpreet5 made their first contribution in #887
- @MilkClouds made their first contribution in #878
- @puneeshkhanna made their first contribution in #900
- @alenawang made their first contribution in #907
- @0xtoward made their first contribution in #901
Full Changelog: v0.6.2...v0.6.3
v0.6.2
What's Changed
- Automate Benchmarking - fixing issue. by @Manan17 in #836
- Make path variable global by @Manan17 in #840
- Adding support for apo losses, sppo_hard and nca_pair by @Manan17 in #841
- Add `accum_dtype` option for `FusedLinearCrossEntropy` by @Tcc0403 in #830
- CI tests fix by @Manan17 in #847
- docs(README): fix intel ci link by @Tcc0403 in #842
- Llama4 rope implementation by @Manan17 in #843
- fix(phi3): update monkey patch for `Phi3ForCausalLM` by @Tcc0403 in #837
- feat(FLCE): expose `accum_dtype` for hf model monkey patch by @Tcc0403 in #851
- Fix ci by @Manan17 in #853
- Fix missing low-level api imports by @Kirill-Kravtsov in #856
- Add glm4.1v model support by @vvvdwbvvv in #858
- Update pyproject.toml version to 0.6.2 by @vaibhavjindal in #861
New Contributors
- @Kirill-Kravtsov made their first contribution in #856
Full Changelog: v0.6.1...v0.6.2
v0.6.1
What's Changed
- Fix gemma3 forward with skip_logits by @BitPhinix in #795
- Update README.md by @PKUWZP in #808
- Fix minor typo by @hugoabonizio in #809
- Update README.md by @PKUWZP in #811
- Fix embedding benchmarks for backward pass by @Manan17 in #799
- Giving an option to update benchmark results for previous commits. by @Manan17 in #791
- [Model] Liger support for SmolLM3 by @edbeeching in #798
- FusedAddRMSNorm: Fused residual addition and RMS Norm by @vaibhavjindal in #812 (see the sketch after this list)
- Skip smollm3 tests in tests-bwd by @vaibhavjindal in #821
- Layernorm enhancement by @Manan17 in #815
- Update README.md by @PKUWZP in #823
- Update index.md by @PKUWZP in #824
- Remove smollm3 import at top of file by @vaibhavjindal in #825
- Fix illegal memory access in Triton RMSNorm kernel by casting program_id to int64 by @vvvdwbvvv in #804
- fix(benchmark): move chunked loss module init out of measurements by @Tcc0403 in #643
- [XPU] Fixed the issue with multiple num_warps parameters being passed in by @YangKai0616 in #831
- Automate benchmarking - for every release by @Manan17 in #828
- Revert "Bug Fix: name patching for modules" by @vaibhavjindal in #833
- Bug fixes in patching module by @vaibhavjindal in #834
- docs(README): fix gpumode discord badge by @Tcc0403 in #835
- Update pyproject.toml version to 0.6.1 by @shimizust in #838
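For #812 above: the fused kernel folds the residual addition into the RMSNorm pass, saving one full read/write of the hidden states. A pure-PyTorch sketch of the intended semantics (not the module's actual API):

```python
import torch

def fused_add_rms_norm_reference(x, residual, weight, eps=1e-6):
    # Semantics the fused kernel implements in one pass:
    # (1) add the residual stream, (2) RMS-normalize the sum.
    hidden = x + residual
    rms = torch.rsqrt(hidden.pow(2).mean(dim=-1, keepdim=True) + eps)
    return weight * (hidden * rms), hidden  # (normalized output, new residual)
```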
New Contributors
- @BitPhinix made their first contribution in #795
- @PKUWZP made their first contribution in #808
- @hugoabonizio made their first contribution in #809
- @edbeeching made their first contribution in #798
Full Changelog: v0.6.0...v0.6.1
v0.6.0: New Attention Operators, Cosine Similarity Loss, Llama 4, and VLM Patching Updates
Highlights
This release introduces significant improvements to Liger-Kernel, including new operators, support for Llama 4 models, more robust benchmarking automation, and key fixes for patching of vision-language models (VLMs) due to recent transformers refactoring.
Key Changes
New Features & Improvements
- Multi-Token Attention by @AndreSlavescu (#689)
- Fused Neighborhood Attention by @AndreSlavescu (#732)
- Cosine Similarity Loss for Distillation by @Dexterai (#780)
- Support for Llama 4 by @Manan17 (#740)
- Option to choose fused LCE/CE loss by @connermanuel (#704; see the sketch after this list)
- Add block_rms_norm for QK norm by @mdy666 (#731)
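#704's fused LCE/CE choice surfaces through the per-kernel flags the `apply_liger_kernel_to_*` functions already expose. A sketch with the documented Llama flags (the same pattern should apply to other patched models):

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Pick the plain Liger cross entropy instead of the fused
# linear + cross-entropy path; flip the two flags to choose the other.
apply_liger_kernel_to_llama(
    cross_entropy=True,
    fused_linear_cross_entropy=False,
)
```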
Bug Fixes
- Vision-language model patching in recent transformers versions (>=4.52.0):
- RMS Norm patching by @vaibhavjindal, @BenasdTW (#741, #765)
- Hugging Face forward kwargs fix by @llllvvuu (#708)
- Fix import tanh by @jue-jue-zi (#762)
- Apply monkey patch to instances by @YangKai0616 (#772)
Documentation & CI Fixes
- Deploy MkDocs to GitHub Pages by @ParagEkbote (#724)
- Robust doc updates by @ParagEkbote (#726, #727)
- .idea ignored by @Tcc0403 (#784)
- ReadMe, MTA + softmax docs by @AndreSlavescu (#730)
- Relax DyT tol, XPU skip MTA by @Tcc0403 (#778)
- Paligemma test fixes by @vvvdwbvvv (#785)
- Style & test fixes by @Tcc0403, @vaibhavjindal (#736, #794)
- Add torchvision for multimodal test by @Tcc0403 (#755)
Benchmarking & Automation
- Automated benchmarking and visualization UI in GitHub pages by @Manan17 (#744, #747, #749, #752, #753, #756, #759, #760, #770, #779)
New Contributors
- @connermanuel made their first contribution in #704
- @llllvvuu made their first contribution in #708
- @jue-jue-zi made their first contribution in #762
- @YangKai0616 made their first contribution in #772
- @Dexterai made their first contribution in #780
- @vvvdwbvvv made their first contribution in #785
Full Changelog: v0.5.10...v0.6.0
v0.5.10: Qwen3 MoE support, Sparsemax kernel, bug fixes
What's Changed
- fix zip bug by @KareemMusleh in #702
- [dpo] set default average_log_prob to False by @cyr0930 in #693
- Rank build status lower by @momochen in #707
- Add support for Qwen3 MoE models by @chiwanpark in #706
- Fix qwen3_moe flaky convergence test by @vaibhavjindal in #710
- Fix empty Medusa head tensors by @chiwanpark in #698
- Sparsemax by @AndreSlavescu in #687 (see the reference sketch after this list)
- fix: remove docstring imports in transformer patches by @NanoCode012 in #712
- Increase tests timeout to 45 mins by @vaibhavjindal in #718
- fix modal tests by @shivam15s in #719
- Visualizer Update by @AndreSlavescu in #717
- Sparsemax Documentation by @AndreSlavescu in #716
- Element-wise DyT, faster than the original LigerDyT by @mdy666 in #673
- GRPO loss kernel written fully in Triton, reducing memory by 46 GB by @mdy666 in #672
- Make FLCE compatible with FSDP and PEFT by @astefanutti in #674
- Fix incorrect module patching when using LoRA with modules_to_save by @BenasdTW in #632
- [XPU] Changed how XPU discovery works during `setup.py` by @Egor-Krivov in #720
- Fix to publish docs on pushes to main branch by @shimizust in #722
- Release 0.5.10 by @shimizust in #725
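For #687 above, a minimal pure-PyTorch reference of the sparsemax projection (Martins & Astudillo, 2016) that the Triton kernel accelerates; this sketch shows the semantics only and is not the library's API:

```python
import torch

def sparsemax_reference(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Euclidean projection of z onto the probability simplex."""
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = k.view(shape)  # reshape for broadcasting along `dim`
    cumsum = z_sorted.cumsum(dim)
    support = 1 + k * z_sorted > cumsum                   # is z_(k) in the support?
    k_z = support.to(z.dtype).sum(dim=dim, keepdim=True)  # support size k(z)
    tau = (cumsum.gather(dim, k_z.long() - 1) - 1) / k_z  # threshold tau(z)
    return torch.clamp(z - tau, min=0.0)
```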
New Contributors
- @KareemMusleh made their first contribution in #702
- @cyr0930 made their first contribution in #693
- @NanoCode012 made their first contribution in #712
- @mdy666 made their first contribution in #673
- @astefanutti made their first contribution in #674
- @Egor-Krivov made their first contribution in #720
Full Changelog: v0.5.9...v0.5.10
v0.5.9: Adds XPU Setup, GLM-4 & Qwen3 Model Support, Key Bugfixes
What's Changed
- update setup.py for installation on xpu by @faaany in #668
- update XPU CI yaml file to use docker container by @faaany in #669
- Add average_log_prob as an init param for LigerFusedLinearDPOLoss by @vaibhavjindal in #676 (see the sketch after this list)
- add shift label change by @shivam15s in #683
- remove tests that can pass on XPU by @faaany in #686
- Update mkdocs.yml by @shivam15s in #691
- Fix LigerCrossEntropy reduction='none' by @Tcc0403 in #680
- Support GLM-4 models by @intervitens in #685
- Import glm4_lce_forward locally in function by @vaibhavjindal in #695
- Qwen3 model support by @vaibhavjindal in #692
- Use logits_to_keep logic for training runs by @vaibhavjindal in #696
- increase gemma3 multimodal convergence test loss atol by @shivam15s in #697
- Update pyproject.toml by @shivam15s in #700
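The `average_log_prob` flag from #676 above is passed at construction time (its default was later flipped to `False` by #693 in v0.5.10). A minimal construction sketch; the `beta` argument and import path follow the chunked-loss family's usual pattern and are assumptions here:

```python
from liger_kernel.chunked_loss import LigerFusedLinearDPOLoss

# average_log_prob=True averages per-token log-probs over the sequence
# length instead of summing them when forming the DPO log-ratios.
dpo_loss = LigerFusedLinearDPOLoss(beta=0.1, average_log_prob=True)
```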
New Contributors
- @intervitens made their first contribution in #685
Full Changelog: v0.5.8...v0.5.9
v0.5.8: Backward-Compatible Fix
What's Changed
- backward compatible initialization by @shivam15s in #666
- Update pyproject.toml by @shivam15s in #667
Full Changelog: v0.5.7...v0.5.8
v0.5.7: Gemma3 Support, XPU Tuning Enhancements, GRPO Improvements, and API Compatibility Fixes
What's Changed
- Gemma3 (Text and Multimodal) by @eljandoubi in #621
- Make FLCE compatible with latest `XXXForCausalLM.forward()` APIs by @Tcc0403 in #596
- Do bias addition in tests in float32 to make testing code similar to torch compile by @shivam15s in #655
- [CI] fix siglip dummy config by @yundai424 in #658
- add XPU tuning to JSD by @rmukhopa in #649
- add XPU tuning to Rmsnorm and Layernorm by @Tarakarevu1 in #653
- Fix imports without transformers by @vaibhavjindal in #659
- Use TYPE_CHECKING to fix static-only imports in IDEs etc by @vaibhavjindal in #660
- [kl_div] Modified block and warp sizes for improved performance by @jgtong in #654
- [GRPO] add support for different loss types by @kashif in #662
- Remove unexpected kwargs passing to flce by @Tcc0403 in #651
- reduce number of tests for grpo by @shivam15s in #663
- Update pyproject.toml by @shivam15s in #665
New Contributors
- @rmukhopa made their first contribution in #649
- @Tarakarevu1 made their first contribution in #653
- @jgtong made their first contribution in #654
Full Changelog: v0.5.6...v0.5.7
v0.5.6: Enhancements, Fixes, and Expanded Support (Paligemma, DyT, XPU, Llava, GRPO, and More!)
What's Changed
- [JSD] JSD fixes by @kashif in #609
- Paligemma support by @eljandoubi in #608
- Fix hidden size by @eljandoubi in #612
- Add loss_utils for rewriting lce_forward methods by @Tcc0403 in #614
- Update Star History URL by @ryankert01 in #616
- Update README.md by @shivam15s in #617
- Language model of PaliGemma 1 is Gemma 1 by @eljandoubi in #613
- Update README to reflect recent changes by @helloworld1 in #619
- Support Dynamic Tanh (DyT) by @Tcc0403 in #618 (see the reference sketch after this list)
- Fix incorrect module name when monkey_patch applied to instantiated model by @vaibhavjindal in #629
- [chunked loss] align teacher and student logit shape by @yundai424 in #634
- Fix incorrect condition comment in log_target calculation by @p81sunshine in #633
- Add huggingface llava by @jp1924 in #524
- fix Llava test-bwd failure by @jp1924 in #639
- Fix GRPO to conform with TRL: Fix loss, make tests accurate, correct metrics computation by @shivam15s and @mRSun15 in #628
- add xpu tuning to CE by @mgrabban in #645
- add xpu tuning to FLJSD by @mgrabban in #647
- Change tests to use rocm 6.3 version and tol changes to make liger run on amd by @shivam15s in #646
- Update pyproject.toml by @shivam15s in #648
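Dynamic Tanh (#618 above) comes from the "Transformers without Normalization" line of work: it replaces a normalization layer with an elementwise tanh plus affine transform. A reference sketch of the computation (not the Liger module's API):

```python
import torch

def dyt_reference(x, alpha, gamma, beta):
    # DyT(x) = gamma * tanh(alpha * x) + beta, with a scalar alpha and
    # per-channel affine parameters gamma and beta.
    return gamma * torch.tanh(alpha * x) + beta
```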
New Contributors
- @eljandoubi made their first contribution in #608
- @p81sunshine made their first contribution in #633
Full Changelog: v0.5.5...v0.5.6