v0.4.0: AWS Neuron SDK 2.6, Trainium 2 support, Qwen3-MoE, Llama4 (text)
## What's Changed
### Inference
- Add Flux Inpaint support by @JingyaHuang in #909
- chore: Bump diffusers to 0.35.* by @JingyaHuang in #935
- fix: flux neuron cache detection by @JingyaHuang in #937
- Add flux inpaint to supported models by @JingyaHuang in #932
- feat: Support flux kontext for text2img by @jlonge4 in #916
- allow safetensors to be downloaded for flux by @Abdennacer-Badaoui in #939
- Add support for SmolLM3 models by @dacorvo in #934
- Add support for Qwen3Moe models by @dacorvo in #945
- Cleanup inference backend modules by @dacorvo in #948
- Remove unsupported modeling flags by @dacorvo in #950
- Remove optimized model dependency in LLM models by @dacorvo in #955
- Add tests for modules used in inference by @tengomucho in #957
- Add support for text generation in Llama4 models by @dacorvo in #959
- test(inference): add tests to check decoder layer accuracy by @tengomucho in #962
- Add vLLM docker image by @dacorvo in #967
- Add trn1 vllm llama benchmark by @dacorvo in #970
- Enable CPU compilation by @Abdennacer-Badaoui in #961
- Enable `instance_type` tag to export by @JingyaHuang in #974
- Add trn2 benchmarks for a few big models by @dacorvo in #991
- Improve DX when exporting and deploying LLM neuron models by @dacorvo in #986
- ECR Image URI retrieval by @tengomucho in #985
- Add support for Trainium 2 for decoder models by @dacorvo in #988
- Automatically detect platform when serving models by @dacorvo in #989
- Add trn1 qwen3 and llama4 moe vLLM benchmark by @dacorvo in #973
### Training
- Trainers refactor by @michaelbenayoun in #918
- fix: uses processing_class instead of tokenizer in base trainer by @michaelbenayoun in #927
- fix: fixes barrier issue at the end of training with hub sync by @michaelbenayoun in #925
- Sync training custom modeling to transformers=4.55.4 by @michaelbenayoun in #954
- Trainer simplification by @michaelbenayoun in #938
- Fix attention implementation argument in custom modeling by @michaelbenayoun in #963
- ZeRO-1 and mixed-precision by @michaelbenayoun in #956
- PEFT and PP by @michaelbenayoun in #964
- Fix async save by @michaelbenayoun in #976
### Documentation
- docs: remove finetune with AWS guide by @michaelbenayoun in #905
- Contribute custom modeling by @michaelbenayoun in #908
- docs: remove sagemaker guide by @michaelbenayoun in #906
- Supported architectures page by @michaelbenayoun in #907
- Update cache system guide by @michaelbenayoun in #910
- [docs] Getting started page by @michaelbenayoun in #911
- [docs] Move inference API section by @michaelbenayoun in #913
- [docs] Tutorial sections by @michaelbenayoun in #914
- [docs] Update the link for the card images by @michaelbenayoun in #915
- [docs] Quickstart page by @michaelbenayoun in #912
- chore: remove doc-builder dependency from quality extra by @tengomucho in #917
- [docs] Trainers api by @michaelbenayoun in #922
- [docs] Distributed training guide by @michaelbenayoun in #921
- [docs] Transformations specs api ref by @michaelbenayoun in #923
- [docs] Lora API reference page by @michaelbenayoun in #924
- Update pipelines.mdx by @Abdennacer-Badaoui in #942
- [docs] Llama tutorial: adapt the Llama tutorial to the new format by @michaelbenayoun in #919
- Add whitepaper by @pagezyhf in #958
- Add vllm install instructions to documentation by @jimburtoft in #952
## New Contributors
- @Abdennacer-Badaoui made their first contribution in #939
**Full Changelog**: v0.3.0...v0.4.0