v0.3.0: vLLM plugin, FLUX support, SDK 2.24
What's Changed
- chore: bump aws neuron sdk version to 2.24.0 by @JingyaHuang in #856
- Add BlackForest Flux Support by @JingyaHuang in #815
Inference
- [LLM] Reenable on device sampling for (almost) all configurations by @dacorvo in #886
- Add vLLM plugin by @dacorvo in #888
- Move `NEURON_FUSE_SOFTMAX` and `NEURON_CUSTOM_SILU` env vars to diffusers model loading by @JingyaHuang in #889
- Update LLM benchmarks by @dacorvo in #895
- Bump accelerate to 1.3.0, peft to 0.15.2, and diffusers to >=0.31.0 by @JingyaHuang in #901
- chore: move inference modeling code by @JingyaHuang in #902
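Since #889 moves the `NEURON_FUSE_SOFTMAX` and `NEURON_CUSTOM_SILU` environment variables to diffusers model loading, they should be set before the pipeline is instantiated. A minimal sketch of that ordering; the pipeline class and checkpoint name in the comment are illustrative placeholders, not taken from these release notes:

```python
import os

# These Neuron flags are now read when the diffusers model is loaded,
# so they must be in the environment before the pipeline is created.
os.environ["NEURON_FUSE_SOFTMAX"] = "1"
os.environ["NEURON_CUSTOM_SILU"] = "1"

# Hypothetical usage sketch (placeholder class and model id):
# from optimum.neuron import NeuronFluxPipeline
# pipe = NeuronFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev")
```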
Training
- Few inference fixes by @tengomucho in #880
- Auto model classes for custom modeling by @michaelbenayoun in #883
- Finetune llm example by @michaelbenayoun in #894
General
- Remove `is_torch_xla_available` and `is_neuronx_available` by @michaelbenayoun in #884
- Type hint cleaning by @michaelbenayoun in #887
Documentation
- doc(vllm): change reco for models that are not cached by @dacorvo in #899
- Remove example scripts by @michaelbenayoun in #893
- ci: align doc workflow on doc-pr by @dacorvo in #896
- Update README by @michaelbenayoun in #900
- Benchmark on TGI + optimum-neuron by @jlonge4 in #904
Full Changelog: v0.2.2...v0.3.0