v0.4.0: AWS Neuron SDK 2.6, Trainium 2 support, Qwen3-MoE, Llama4 (text)
## What's Changed
### Inference
- Add Flux Inpaint support by @JingyaHuang in #909
- chore: Bump diffusers to 0.35.* by @JingyaHuang in #935
- fix: flux neuron cache detection by @JingyaHuang in #937
- Add flux inpaint to supported models by @JingyaHuang in #932
- feat: Support flux kontext for text2img by @jlonge4 in #916
- allow safetensors to be downloaded for flux by @Abdennacer-Badaoui in #939
- Add support for SmolLM3 models by @dacorvo in #934
- Add support for Qwen3Moe models by @dacorvo in #945
- Cleanup inference backend modules by @dacorvo in #948
- Remove unsupported modeling flags by @dacorvo in #950
- Remove optimized model dependency in LLM models by @dacorvo in #955
- Add tests for modules used in inference by @tengomucho in #957
- Add support for text generation in Llama4 models by @dacorvo in #959
- test(inference): add tests to check decoder layer accuracy by @tengomucho in #962
- Add vLLM docker image by @dacorvo in #967
- Add trn1 vllm llama benchmark by @dacorvo in #970
- Enable CPU compilation by @Abdennacer-Badaoui in #961
- Enable `instance_type` tag to export by @JingyaHuang in #974
- Add trn2 benchmarks for a few big models by @dacorvo in #991
- Improve DX when exporting and deploying LLM neuron models by @dacorvo in #986
- ECR Image URI retrieval by @tengomucho in #985
- Add support for Trainium 2 for decoder models by @dacorvo in #988
- Automatically detect platform when serving models by @dacorvo in #989
- Add trn1 qwen3 and llama4 moe vLLM benchmark by @dacorvo in #973
### Training
- Trainers refactor by @michaelbenayoun in #918
- fix: uses processing_class instead of tokenizer in base trainer by @michaelbenayoun in #927
- fix: fixes barrier issue at the end of training with hub sync by @michaelbenayoun in #925
- Sync training custom modeling to transformers=4.55.4 by @michaelbenayoun in #954
- Trainer simplification by @michaelbenayoun in #938
- Fix attention implementation argument in custom modeling by @michaelbenayoun in #963
- ZeRO-1 and mixed-precision by @michaelbenayoun in #956
- PEFT and PP by @michaelbenayoun in #964
- Fix async save by @michaelbenayoun in #976
### Documentation
- docs: remove finetune with AWS guide by @michaelbenayoun in #905
- Contribute custom modeling by @michaelbenayoun in #908
- docs: remove sagemaker guide by @michaelbenayoun in #906
- Supported architectures page by @michaelbenayoun in #907
- Update cache system guide by @michaelbenayoun in #910
- [docs] Getting started page by @michaelbenayoun in #911
- [docs] Move inference API section by @michaelbenayoun in #913
- [docs] Tutorial sections by @michaelbenayoun in #914
- [docs] Update the link for the card images by @michaelbenayoun in #915
- [docs] Quickstart page by @michaelbenayoun in #912
- chore: remove doc-builder dependency from quality extra by @tengomucho in #917
- [docs] Trainers api by @michaelbenayoun in #922
- [docs] Distributed training guide by @michaelbenayoun in #921
- [docs] Transformations specs api ref by @michaelbenayoun in #923
- [docs] Lora API reference page by @michaelbenayoun in #924
- Update pipelines.mdx by @Abdennacer-Badaoui in #942
- [docs] Llama tutorial: adapt the Llama tutorial to the new format by @michaelbenayoun in #919
- Add whitepaper by @pagezyhf in #958
- Add vllm install instructions to documentation by @jimburtoft in #952
## New Contributors
- @Abdennacer-Badaoui made their first contribution in #939
**Full Changelog**: v0.3.0...v0.4.0