Releases: huggingface/optimum-neuron
v0.0.19: AWS Neuron SDK 2.17.0, training cache system, TGI improved batching
What's Changed
Training
- Integrate new cache system for training by @michaelbenayoun in #472
TGI
- Support higher batch sizes using transformers-neuronx continuous batching by @dacorvo in #488
- Lift max-concurrent-request limitation using TGI 1.4.1 by @dacorvo in #488
AMI
- Add packer support for building AWS AMI by @shub-kris in #441
- [AMI] Updates base ami to new id by @philschmid in #482
Major bugfixes
- Fix sdxl inpaint pipeline for diffusers 0.26.* by @JingyaHuang in #458
- TGI: update to controller version 1.4.0 & bug fixes by @dacorvo in #470
- Fix optimum-cli export for inf1 by @JingyaHuang in #474
Other changes
- Add TGI tests and CI workflow by @dacorvo in #355
- Bump to optimum 1.17 - Adapt to optimum exporter refactoring by @JingyaHuang in #414
- [Training] Support for Transformers 4.37 by @michaelbenayoun in #459
- Add contribution guide for Neuron exporter by @JingyaHuang in #461
- Fix path, update versions by @shub-kris in #462
- Add issue and PR templates & build optimum env cli for Neuron by @JingyaHuang in #463
- Fix trigger for actions by @philschmid in #468
- TGI: bump rust version by @dacorvo in #477
- [documentation] Add Container overview page. by @philschmid in #481
- Bump to Neuron sdk 2.17.0 by @JingyaHuang in #487
New Contributors
- @shub-kris made their first contribution in #441
Full Changelog: v0.0.18...v0.0.19
v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training
What's Changed
AWS SDK
- Use AWS Neuron SDK 2.16.1 (#449)
Inference
- Preliminary support for neff/weights decoupling by @JingyaHuang (#402)
- Allow exporting decoder models using optimum-cli by @dacorvo (#422)
- Add Neuron X cache registry by @dacorvo (#442)
- Add StoppingCriteria to generate() of NeuronModelForCausalLM by @dacorvo (#454)
Training
- Initial support for pipeline parallelism by @michaelbenayoun (#279)
TGI
Tutorials and doc improvement
- Various fixes by @jimburtoft @michaelbenayoun @JingyaHuang (#428 #429 #432)
- Improve Stable Diffusion Notebooks by @JingyaHuang (#431)
- Add Sentence Transformers Guide and Notebook by @philschmid (#434)
- Add benchmark section by @dacorvo (#435)
Major bugfixes
- TGI: correctly identify special tokens during generation by @dacorvo (#438)
- TGI: do not include the input_text in generated text by @dacorvo (#454)
Other changes
- API change to be compatible to Optimum by @JingyaHuang (#421)
New Contributors
- @jimburtoft made their first contribution in #432
Full Changelog: v0.0.17...v0.0.18
v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache
What's Changed
AWS SDK
- Use AWS Neuron SDK 2.16 (#398)
- Use official serialization API for transformers_neuronx models instead of beta by @aws-yishanm (#387, #393)
Inference
- Improve the support of sentence transformers by @JingyaHuang (#408)
- Add Neuronx compile cache Hub proxy and use it for LLM decoder models by @dacorvo (#410)
- Add support for Mistral models by @dacorvo (#411)
- Do not upload Neuron LLM weights when they can be fetched from the hub by @dacorvo (#413)
Training
- Add general support for generation on TRN with NxD by @aws-tianquaw (#370)
Tutorials and doc improvement
- Add llama 2 fine tuning tutorial by @philschmid (#390)
Major bugfixes
- Skip pushing if the user does not have write access to the cache repo by @michaelbenayoun (#405)
Other changes
- Bump Hugging Face library versions by @JingyaHuang (#403)
New Contributors
- @aws-tianquaw made their first contribution in #370
- @aws-yishanm made their first contribution in #387
Full Changelog: v0.0.16...v0.0.17
v0.0.16: T5 export and inference, general training fixes
What's Changed
Training
A few fixes related to precompilation and checkpointing. These fixes enable training LLMs on AWS Trainium instances without friction.
- Skip model saving during precompilation and provide option to skip cache push (#365)
- Fixes checkpoint saving and consolidation for TP (#378)
- A torch_xla compatible version of safetensors.torch.save_file is now used in the NeuronTrainer (#329)
Inference
v0.0.15: Mistral training, Tensor parallelism improvement, better integration with the AWS SDK
What's Changed
Training
Distributed Training
- parallel_cross_entropy loss support for tensor parallelism (#246)
- Support for training the Mistral architecture with tensor parallelism (#303)
AWS SDK
- Fix: neuron_parallel_compile is compatible with the cache system (#352)
- Full support for neuron_parallel_compile with the cache system: compilation files produced by neuron_parallel_compile will be pushed to the remote cache repo on the Hugging Face Hub at the beginning of the next training job (#354)
Documentation
Inference
- Data parallelism option for Stable Diffusion - LCM allowing multi-device inference (#346)
- Support decoding sequences of byte tokens in TGI (#350)
Documentation
- Updated the documentation on LCM (#351)
v0.0.14: LCM support
What's Changed
LCM support
- [Stable Diffusion] Add LCM (Latent Consistency Models) support by @JingyaHuang in #323
Tutorials and doc improvement
- notebooks: add llama2 chatbot example by @dacorvo in #300
- Add llama 2 tutorial by @dacorvo in #321
- Migrate documentation of Stable Diffusion and add notebooks by @JingyaHuang in #312
Major bugfixes
- Noisy loss fix by @bocchris-aws in #293
- Fix neuron cache starting compilation before fetching by @michaelbenayoun in #280
- fix(pipelines): support passing decoder model + tokenizer by @dacorvo in #319
Other changes
- chore: update dev version by @dacorvo in #276
- Explicitly mention aws repo extra url in documentation by @dacorvo in #277
- Update supported architecture in the doc by @JingyaHuang in #281
- Fix doc build source code broken links by @JingyaHuang in #282
- Add revision to push_to_hub by @philschmid in #292
- Set default device id for SD and SDXL by @JingyaHuang in #297
- Add missing decoder model architectures by @dacorvo in #298
- Official support for AWS inferentia2 TGI container by @dacorvo in #302
- Transformers fix by @dacorvo in #320
- Add sagemaker compatible image by @dacorvo in #322
- Fix broken tests by @michaelbenayoun in #274
- chore: align with AWS Neuron SDK 2.15.1 by @dacorvo in #325
- Deleted the 'maybe_free_model_hooks()' from Diffusers Pipelines by @Cerrix in #330
- Bump diffusers version by @JingyaHuang in #335
New Contributors
Full Changelog: v0.0.13...v0.0.14
v0.0.13: AWS Neuron SDK 2.15
What's Changed
The main change in this release is the alignment with AWS Neuron SDK 2.15.
Text-generation
Other changes
- Use attention masks for TGI generation by @dacorvo in #264
- Various fixes for TP by @michaelbenayoun in #260
- Fix neuron pipelines by @dacorvo in #265
- Fix #241 by @michaelbenayoun in #268
- Fixes generation during the evaluation step by @michaelbenayoun in #266
- Save / load from checkpoint TP by @michaelbenayoun in #269
Full Changelog: v0.0.12...v0.0.13
v0.0.12.1: Patch release for training with Neuron SDK 2.14
v0.0.12: SDXL refiner, Sequence parallelism training
What's Changed
Stable Diffusion: SDXL Refiner, Stable Diffusion Img2Img, Inpaint support
- [Stable Diffusion] Image2image and inpaint pipeline support by @JingyaHuang in #161
- [SDXL] Add SDXL image to image support by @JingyaHuang in #239
Distributed Training:
- Sequence parallelism by @michaelbenayoun in #233
- Parallelism support for GPTNeoX by @michaelbenayoun in #244
Text generation updates
Other changes
- TGI stability fixes by @dacorvo in #226
- Remove experimental compilation flag for text-generation models by @dacorvo in #228
- Patch for diffusers 0.21.0 release by @JingyaHuang in #229
- test_examples uses ExampleRunner by @michaelbenayoun in #227
- Using the real model name instead of hard code "model" by @davidshtian in #231
- Replace transformers list of logits warpers by a fused logic warper by @dacorvo in #234
- Use AWS Neuron SDK 2.14 by @dacorvo in #236
- Weight loading after lazy loading fix by @michaelbenayoun in #238
- Add debug attribute to NeuronPartialState by @michaelbenayoun in #240
- Update tests/test_examples.py for AWS team by @michaelbenayoun in #242
- Rework text-generation example by @dacorvo in #245
- Fix evaluation recompilation issue by @michaelbenayoun in #248
- test(generation): specify revision for hub test model by @dacorvo in #250
- Add sequence length for generative models and llama tests by @dacorvo in #251
- Fix noisy loss for T5 when doing TP by @michaelbenayoun in #257
- Fix bug with transformers 4.34 by @michaelbenayoun in #259
New Contributors
- @davidshtian made their first contribution in #231
Full Changelog: v0.0.11...v0.0.12
v0.0.11: SDXL, LLama v2 training and inference, Inf2 powered TGI
SDXL Export and Inference
Optimum CLI now supports compiling components in the SDXL pipeline for inference on neuron devices (inf2/trn1).
Below is an example of compiling SDXL models. You can compile either on an inf2 instance (inf2.8xlarge or larger recommended) or on a CPU-only instance (in that case, disable the validation with --disable-validation):
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --auto_cast matmul --auto_cast_type bf16 sdxl_neuron/
And then run inference with the class NeuronStableDiffusionXLPipeline:
from optimum.neuron import NeuronStableDiffusionXLPipeline
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)
image = stable_diffusion_xl(prompt).images[0]
- Add sdxl exporter support by @JingyaHuang in #203
- Add Stable Diffusion XL inference support by @JingyaHuang in #212
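When the export has to be scripted (for example, to compile several resolutions in a row), the command from this section can be assembled as an argument list. The following is a minimal sketch, not part of optimum-neuron: the helper name build_sdxl_export_command and its parameters are hypothetical, and it only reuses the flags that appear in the release note above.

```python
import shlex


def build_sdxl_export_command(
    model_id,
    output_dir,
    batch_size=1,
    height=1024,
    width=1024,
    auto_cast="matmul",
    auto_cast_type="bf16",
    disable_validation=False,
):
    """Return the optimum-cli argument list for an SDXL Neuron export.

    Hypothetical helper: it only mirrors the flags shown in the release note.
    """
    cmd = [
        "optimum-cli", "export", "neuron",
        "--model", model_id,
        "--task", "stable-diffusion-xl",
        "--batch_size", str(batch_size),
        "--height", str(height),
        "--width", str(width),
        "--auto_cast", auto_cast,
        "--auto_cast_type", auto_cast_type,
    ]
    if disable_validation:
        # Needed when compiling on a CPU-only instance, as noted above.
        cmd.append("--disable-validation")
    # The output directory is the final positional argument.
    cmd.append(output_dir)
    return cmd


command = build_sdxl_export_command(
    "stabilityai/stable-diffusion-xl-base-1.0",
    "sdxl_neuron/",
    disable_validation=True,
)
# Print a shell-ready string; nothing is executed here.
print(shlex.join(command))
```

On a machine where optimum-neuron is installed, the returned list could be passed to subprocess.run(command, check=True). Building a list rather than a single string avoids shell quoting issues for model ids or paths.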
Llama v1, v2 Inference
Llama v2 Training
- Llama V2 training support by @michaelbenayoun in #211
- LLama V1 training fix by @michaelbenayoun in #211
TGI
Major bugfixes
- neuron_parallel_compile, ParallelLoader and Zero-1 fixes for torchneuron 8+ by @michaelbenayoun in #200
- flan-t5 fix: T5Parallelizer, NeuronCacheCallback and NeuronHash refactors by @michaelbenayoun in #207
- Fix optimum-cli broke by optimum 1.13.0 release by @JingyaHuang in #217
Other changes
- Bump Inference APIs to Neuron 2.13 by @JingyaHuang in #206
- Add log for SD when applying optim attn & pipelines lazy loading by @JingyaHuang in #208
- Cancel concurrency CIs for inference by @JingyaHuang in #218
- fix(tgi): typer does not support Union types by @dacorvo in #219
- Bump neuron-cc version to 1.18.* by @JingyaHuang in #224
Full Changelog: v0.0.10...v0.0.11