v0.3.0: vLLM plugin, FLUX support, SDK 2.24
What's Changed
- chore: bump aws neuron sdk version to 2.24.0 by @JingyaHuang in #856
- Add BlackForest Flux Support by @JingyaHuang in #815
Inference
- [LLM] Reenable on device sampling for (almost) all configurations by @dacorvo in #886
- Add vLLM plugin by @dacorvo in #888
- Move `NEURON_FUSE_SOFTMAX` and `NEURON_CUSTOM_SILU` env vars to diffusers model loading by @JingyaHuang in #889
- Update LLM benchmarks by @dacorvo in #895
- Bump accelerate to 1.3.0, peft to 0.15.2, and diffusers to >=0.31.0 by @JingyaHuang in #901
- chore: move inference modeling code by @JingyaHuang in #902
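Since #889 moves the `NEURON_FUSE_SOFTMAX` and `NEURON_CUSTOM_SILU` environment variables to diffusers model loading, they should be set before the pipeline is instantiated. A minimal sketch of that ordering; the pipeline class and checkpoint name in the comment are illustrative placeholders, not taken from these release notes:

```python
import os

# These Neuron flags are now read when the diffusers model is loaded,
# so they must be in the environment before the pipeline is created.
os.environ["NEURON_FUSE_SOFTMAX"] = "1"
os.environ["NEURON_CUSTOM_SILU"] = "1"

# Hypothetical usage sketch (placeholder class and model id):
# from optimum.neuron import NeuronFluxPipeline
# pipe = NeuronFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev")
```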
Training
- Few inference fixes by @tengomucho in #880
- Auto model classes for custom modeling by @michaelbenayoun in #883
- Finetune llm example by @michaelbenayoun in #894
General
- Remove `is_torch_xla_available` and `is_neuronx_available` by @michaelbenayoun in #884
- Type hint cleaning by @michaelbenayoun in #887
Documentation
- doc(vllm): change reco for models that are not cached by @dacorvo in #899
- Remove example scripts by @michaelbenayoun in #893
- ci: align doc workflow on doc-pr by @dacorvo in #896
- Update README by @michaelbenayoun in #900
- Benchmark on TGI + optimum-neuron by @jlonge4 in #904
Full Changelog: v0.2.2...v0.3.0