v0.5.7: Gemma3 Support, XPU Tuning Enhancements, GRPO Improvements, and API Compatibility Fixes

shivam15s released this 12 Apr 00:49


cdd8e74

What's Changed

Gemma3 (Text and Multimodal) by @eljandoubi in #621
Make FLCE compatible with latest XXXForCausalLM.forward() APIs by @Tcc0403 in #596
Do bias addition in tests in float32 to make testing code similar to torch.compile by @shivam15s in #655
[CI] fix siglip dummy config by @yundai424 in #658
add XPU tuning to JSD by @rmukhopa in #649
add XPU tuning to RMSNorm and LayerNorm by @Tarakarevu1 in #653
Fix imports without transformers by @vaibhavjindal in #659
Use TYPE_CHECKING to fix static-only imports in IDEs etc by @vaibhavjindal in #660
[kl_div] Modified block and warp sizes for improved performance by @jgtong in #654
[GRPO] add support for different loss types by @kashif in #662
Remove unexpected kwargs passing to flce by @Tcc0403 in #651
reduce number of tests for grpo by @shivam15s in #663
Update pyproject.toml by @shivam15s in #665
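
For readers unfamiliar with the technique named in #659 and #660, the general Python pattern for keeping an optional dependency out of the runtime import path while still exposing its types to IDEs can be sketched as below. This is a minimal illustration of `typing.TYPE_CHECKING`, not the code from those PRs; `transformers_available` and `describe_model` are hypothetical helper names introduced only for this example.

```python
import importlib.util
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers and IDEs, never at runtime,
    # so this module still imports when `transformers` is not installed.
    from transformers import PreTrainedModel


def transformers_available() -> bool:
    """Hypothetical helper: report whether the optional package can be found."""
    return importlib.util.find_spec("transformers") is not None


def describe_model(model: "PreTrainedModel") -> str:
    """Hypothetical helper: the quoted annotation is not resolved at runtime."""
    return type(model).__name__
```

Because the annotation is a string, calling `describe_model` never triggers the guarded import; only tooling that analyzes the file resolves `PreTrainedModel`.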

New Contributors

@rmukhopa made their first contribution in #649
@Tarakarevu1 made their first contribution in #653
@jgtong made their first contribution in #654

Full Changelog: v0.5.6...v0.5.7

