Releases · huggingface/optimum-neuron

19 Feb 15:48

JingyaHuang

v0.0.19

e908192

v0.0.19: AWS Neuron SDK 2.17.0, training cache system, TGI improved batching

What's Changed

Training

Integrate new cache system for training by @michaelbenayoun in #472

TGI

Support higher batch sizes using transformers-neuronx continuous batching by @dacorvo in #488
Lift max-concurrent-request limitation using TGI 1.4.1 by @dacorvo in #488
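
As a rough illustration of why continuous batching allows higher effective batch sizes, the stand-alone sketch below counts decode steps for a toy scheduler. It is not the transformers-neuronx or TGI implementation: the function names and the request lengths are made up for the example, and each request is assumed to need a known number of decode steps.

```python
from collections import deque

def static_batching_steps(request_lengths, batch_size):
    """Decode steps when each batch must finish before the next one starts."""
    steps = 0
    for i in range(0, len(request_lengths), batch_size):
        # A static batch runs as long as its longest request.
        steps += max(request_lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(request_lengths, batch_size):
    """Decode steps when a freed slot is refilled before the next step."""
    pending = deque(request_lengths)
    slots = []  # remaining tokens for each active request
    steps = 0
    while pending or slots:
        # Refill free slots with waiting requests.
        while pending and len(slots) < batch_size:
            slots.append(pending.popleft())
        # One decode step yields one token for every active request.
        slots = [remaining - 1 for remaining in slots]
        # Finished requests release their slot.
        slots = [remaining for remaining in slots if remaining > 0]
        steps += 1
    return steps

lengths = [2, 8, 2, 2]  # hypothetical tokens to generate per request
static_steps = static_batching_steps(lengths, batch_size=2)
continuous_steps = continuous_batching_steps(lengths, batch_size=2)
```

With these made-up lengths, the short requests no longer wait for the long one to finish, so the continuous schedule needs fewer steps than the static one.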

AMI

Add packer support for building AWS AMI by @shub-kris in #441
[AMI] Updates base ami to new id by @philschmid in #482

Major bugfixes

Fix sdxl inpaint pipeline for diffusers 0.26.* by @JingyaHuang in #458
TGI: update to controller version 1.4.0 & bug fixes by @dacorvo in #470
Fix optimum-cli export for inf1 by @JingyaHuang in #474

Other changes

Add TGI tests and CI workflow by @dacorvo in #355
Bump to optimum 1.17 - Adapt to optimum exporter refactoring by @JingyaHuang in #414
[Training] Support for Transformers 4.37 by @michaelbenayoun in #459
Add contribution guide for Neuron exporter by @JingyaHuang in #461
Fix path, update versions by @shub-kris in #462
Add issue and PR templates & build optimum env cli for Neuron by @JingyaHuang in #463
Fix trigger for actions by @philschmid in #468
TGI: bump rust version by @dacorvo in #477
[documentation] Add Container overview page. by @philschmid in #481
Bump to Neuron sdk 2.17.0 by @JingyaHuang in #487

New Contributors

@shub-kris made their first contribution in #441

Full Changelog: v0.0.18...v0.0.19

Contributors

dacorvo, shub-kris, and 3 other contributors

01 Feb 10:18

dacorvo

v0.0.18

7b18de9

v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training

What's Changed

AWS SDK

Use AWS Neuron SDK 2.16.1 (#449)

Inference

Preliminary support for neff/weights decoupling by @JingyaHuang (#402)
Allow exporting decoder models using optimum-cli by @dacorvo (#422)
Add Neuron X cache registry by @dacorvo (#442)
Add StoppingCriteria to generate() of NeuronModelForCausalLM by @dacorvo (#454)

Training

Initial support for pipeline parallelism by @michaelbenayoun (#279)

TGI

TGI: support vanilla transformer models whose configuration is cached by @dacorvo (#445)

Tutorials and doc improvement

Various fixes by @jimburtoft @michaelbenayoun @JingyaHuang (#428 #429 #432)
Improve Stable Diffusion Notebooks by @JingyaHuang (#431)
Add Sentence Transformers Guide and Notebook by @philschmid (#434)
Add benchmark section by @dacorvo (#435)

Major bugfixes

TGI: correctly identify special tokens during generation by @dacorvo (#438)
TGI: do not include the input_text in generated text by @dacorvo (#454)
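
The second fix can be pictured with a small stand-alone sketch. It assumes, as is usual for decoder-only generation, that the returned sequence is the prompt followed by the continuation. `strip_prompt` and the token ids are hypothetical and are not taken from the TGI server code.

```python
def strip_prompt(input_ids, output_ids):
    """Return only the newly generated token ids."""
    prompt_length = len(input_ids)
    if output_ids[:prompt_length] != input_ids:
        raise ValueError("output does not start with the prompt")
    # Everything after the prompt is the generated continuation.
    return output_ids[prompt_length:]

prompt = [101, 7592, 2088]              # made-up prompt token ids
sequence = prompt + [2003, 2307, 102]   # prompt followed by generated ids
new_tokens = strip_prompt(prompt, sequence)
```

Decoding only `new_tokens` keeps the input text out of the generated text.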

Other changes

API change to be compatible with Optimum by @JingyaHuang (#421)

New Contributors

@jimburtoft made their first contribution in #432

Full Changelog: v0.0.17...v0.0.18

Contributors

dacorvo, michaelbenayoun, and 3 other contributors

19 Jan 07:19

dacorvo

v0.0.17

8d4b6dc

v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache

What's Changed

AWS SDK

Use AWS Neuron SDK 2.16 (#398)
Use official serialization API for transformers_neuronx models instead of beta by @aws-yishanm (#387, #393)

Inference

Improve the support of sentence transformers by @JingyaHuang (#408)
Add Neuronx compile cache Hub proxy and use it for LLM decoder models by @dacorvo (#410)
Add support for Mistral models by @dacorvo (#411)
Do not upload Neuron LLM weights when they can be fetched from the hub by @dacorvo (#413)

Training

Add general support for generation on TRN with NxD by @aws-tianquaw (#370)

Tutorials and doc improvement

Add llama 2 fine tuning tutorial by @philschmid (#390)

Major bugfixes

Skip pushing if the user does not have write access to the cache repo by @michaelbenayoun (#405)

Other changes

Bump Hugging Face library versions by @JingyaHuang (#403)

New Contributors

@aws-tianquaw made their first contribution in #370
@aws-yishanm made their first contribution in #387

Full Changelog: v0.0.16...v0.0.17

Contributors

dacorvo, michaelbenayoun, and 4 other contributors

19 Dec 13:29

michaelbenayoun

v0.0.16

c0c1fc8

v0.0.16: T5 export and inference, general training fixes

What's Changed

Training

A few fixes related to precompilation and checkpointing. These fixes enable training LLMs on AWS Trainium instances without friction.

Skip model saving during precompilation and provide option to skip cache push (#365)
Fixes checkpoint saving and consolidation for TP (#378)
A torch_xla compatible version of safetensors.torch.save_file is now used in the NeuronTrainer (#329)

Inference

Support for the export and inference of T5 (#267)
New documentation for Stable Diffusion XL Turbo (#374)

24 Nov 17:46

michaelbenayoun

v0.0.15

3f88322

v0.0.15: Mistral training, Tensor parallelism improvement, better integration with the AWS SDK

What's Changed

Training

Distributed Training

parallel_cross_entropy loss support for tensor parallelism (#246)
Support for training the Mistral architecture with tensor parallelism (#303)
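
For readers unfamiliar with cross-entropy under tensor parallelism, here is a minimal sketch of the general vocabulary-parallel technique, assuming the logits are split along the vocabulary dimension. It is not the optimum-neuron or AWS implementation. The function names are hypothetical, and plain Python lists stand in for per-device tensors and collective operations.

```python
import math

def full_cross_entropy(logits, target):
    """Reference loss: log-sum-exp over the whole vocabulary minus the target logit."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits)) - logits[target]

def sharded_cross_entropy(shards, target):
    """Same loss computed from vocabulary shards, one list per simulated device."""
    # Stands in for an all-reduce(max) across devices.
    global_max = max(max(shard) for shard in shards)
    # Each device sums its own exponentials; the outer sum stands in for all-reduce(sum).
    sum_exp = sum(sum(math.exp(x - global_max) for x in shard) for shard in shards)
    # Only the shard that owns the target index contributes the target logit.
    target_logit, offset = 0.0, 0
    for shard in shards:
        if offset <= target < offset + len(shard):
            target_logit = shard[target - offset]
        offset += len(shard)
    return global_max + math.log(sum_exp) - target_logit

logits = [2.0, -1.0, 0.5, 3.0, 0.0, 1.5]  # made-up logits for a 6-token vocabulary
shards = [logits[:3], logits[3:]]          # vocabulary split across two simulated devices
```

The point of the sharded form is that no device ever needs the full row of logits: only a maximum, a sum and one target logit are exchanged.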

AWS SDK

Fix: neuron_parallel_compile is compatible with the cache system (#352)
Full support for neuron_parallel_compile with the cache system: compilation files produced by neuron_parallel_compile will be pushed to the remote cache repo on the Hugging Face Hub at the beginning of the next training job (#354)
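
The deferred push described above can be pictured with a small stand-alone sketch. It does not use the optimum-neuron cache API: `sync_pending_artifacts`, the directory layout and the in-memory `remote_repo` dictionary are hypothetical stand-ins for the local compile cache and the cache repo on the Hugging Face Hub.

```python
import tempfile
from pathlib import Path

def sync_pending_artifacts(local_cache: Path, remote: dict) -> list:
    """Push every local file missing from `remote`; return the pushed names."""
    pushed = []
    for path in sorted(local_cache.rglob("*")):
        if path.is_file():
            key = path.relative_to(local_cache).as_posix()
            if key not in remote:
                remote[key] = path.read_bytes()
                pushed.append(key)
    return pushed

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    # Pretend a previous neuron_parallel_compile run left a compiled file behind.
    (cache / "module_a").mkdir()
    (cache / "module_a" / "graph.neff").write_bytes(b"compiled")
    remote_repo = {}
    first = sync_pending_artifacts(cache, remote_repo)   # start of the next training job
    second = sync_pending_artifacts(cache, remote_repo)  # nothing left to push
```

In this toy version the first call uploads the leftover file and the second call finds nothing new, which mirrors the "push at the beginning of the next training job" behaviour.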

Documentation

Guide explaining how distributed training works in optimum-neuron (#339)

Inference

Data parallelism option for Stable Diffusion - LCM allowing multi-device inference (#346)
Support decoding sequences of byte tokens in TGI (#350)

Documentation

Updated the documentation on LCM (#351)

17 Nov 16:38

JingyaHuang

v0.0.14

d65449e

v0.0.14: LCM support

What's Changed

LCM support

[Stable Diffusion] Add LCM (Latent Consistency Models) support by @JingyaHuang in #323

Tutorials and doc improvement

notebooks: add llama2 chatbot example by @dacorvo in #300
Add llama 2 tutorial by @dacorvo in #321
Migrate documentation of Stable Diffusion and add notebooks by @JingyaHuang in #312

Major bugfixes

Noisy loss fix by @bocchris-aws in #293
Fix neuron cache starting compilation before fetching by @michaelbenayoun in #280
fix(pipelines): support passing decoder model + tokenizer by @dacorvo in #319

Other changes

chore: update dev version by @dacorvo in #276
Explicitly mention aws repo extra url in documentation by @dacorvo in #277
Update supported architecture in the doc by @JingyaHuang in #281
Fix doc build source code broken links by @JingyaHuang in #282
Add revision to push_to_hub by @philschmid in #292
Set default device id for SD and SDXL by @JingyaHuang in #297
Add missing decoder model architectures by @dacorvo in #298
Official support for AWS inferentia2 TGI container by @dacorvo in #302
Transformers fix by @dacorvo in #320
Add sagemaker compatible image by @dacorvo in #322
Fix broken tests by @michaelbenayoun in #274
chore: align with AWS Neuron SDK 2.15.1 by @dacorvo in #325
Deleted the 'maybe_free_model_hooks()' from Diffusers Pipelines by @Cerrix in #330
Bump diffusers version by @JingyaHuang in #335

New Contributors

@Cerrix made their first contribution in #330

Full Changelog: v0.0.13...v0.0.14

Contributors

dacorvo, Cerrix, and 4 other contributors

27 Oct 09:08

dacorvo

v0.0.13

cf97838

v0.0.13: AWS Neuron SDK 2.15

What's Changed

The main change in this release is the alignment with AWS Neuron SDK 2.15.

Text-generation

Add support for BLOOM and OPT models by @dacorvo in #275

Other changes

Use attention masks for TGI generation by @dacorvo in #264
Various fixes for TP by @michaelbenayoun in #260
Fix neuron pipelines by @dacorvo in #265
Fix #241 by @michaelbenayoun in #268
Fixes generation during the evaluation step by @michaelbenayoun in #266
Save / load from checkpoint TP by @michaelbenayoun in #269

Full Changelog: v0.0.12...v0.0.13

Contributors

dacorvo and michaelbenayoun

27 Oct 14:08

JingyaHuang

v0.0.12.1

fe11ccf

v0.0.12.1: Patch release for training with Neuron SDK 2.14

Major bugfixes

Fix #241 by @michaelbenayoun in #268

Full Changelog: v0.0.12...v0.0.12.1

Contributors

michaelbenayoun

16 Oct 08:42

JingyaHuang

v0.0.12

78c2c12

v0.0.12: SDXL refiner, Sequence parallelism training

What's Changed

Stable Diffusion: SDXL Refiner, Stable Diffusion Img2Img, Inpaint support

[Stable Diffusion] Image2image and inpaint pipeline support by @JingyaHuang in #161
[SDXL] Add SDXL image to image support by @JingyaHuang in #239

Distributed Training:

Sequence parallelism by @michaelbenayoun in #233
Parallelism support for GPTNeoX by @michaelbenayoun in #244

Text generation updates

Add text generation pipeline by @dacorvo in #258

Other changes

TGI stability fixes by @dacorvo in #226
Remove experimental compilation flag for text-generation models by @dacorvo in #228
Patch for diffusers 0.21.0 release by @JingyaHuang in #229
test_examples uses ExampleRunner by @michaelbenayoun in #227
Using the real model name instead of hard code "model" by @davidshtian in #231
Replace transformers list of logits warpers by a fused logits warper by @dacorvo in #234
Use AWS Neuron SDK 2.14 by @dacorvo in #236
Weight loading after lazy loading fix by @michaelbenayoun in #238
Add debug attribute to NeuronPartialState by @michaelbenayoun in #240
Update tests/test_examples.py for AWS team by @michaelbenayoun in #242
Rework text-generation example by @dacorvo in #245
Fix evaluation recompilation issue by @michaelbenayoun in #248
test(generation): specify revision for hub test model by @dacorvo in #250
Add sequence length for generative models and llama tests by @dacorvo in #251
Fix noisy loss for T5 when doing TP by @michaelbenayoun in #257
Fix bug with transformers 4.34 by @michaelbenayoun in #259

New Contributors

@davidshtian made their first contribution in #231

Full Changelog: v0.0.11...v0.0.12

Contributors

dacorvo, davidshtian, and 2 other contributors

12 Sep 13:50

JingyaHuang

v0.0.11

608f869

v0.0.11: SDXL, Llama v2 training and inference, Inf2 powered TGI

SDXL Export and Inference

Optimum CLI now supports compiling the components of the SDXL pipeline for inference on Neuron devices (inf2/trn1).

Below is an example of compiling SDXL models. You can compile either on an inf2 instance (inf2.8xlarge or larger recommended) or on a CPU-only instance (in that case, disable the validation with --disable-validation):

optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --auto_cast matmul --auto_cast_type bf16 sdxl_neuron/

Then run inference with the NeuronStableDiffusionXLPipeline class:

from optimum.neuron import NeuronStableDiffusionXLPipeline

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)
image = stable_diffusion_xl(prompt).images[0]

Add sdxl exporter support by @JingyaHuang in #203
Add Stable Diffusion XL inference support by @JingyaHuang in #212

Llama v1, v2 Inference

Add support for Llama inference through NeuronModelForCausalLM by @dacorvo in #223

Llama v2 Training

Llama V2 training support by @michaelbenayoun in #211
Llama V1 training fix by @michaelbenayoun in #211

TGI

AWS Inferentia2 TGI server by @dacorvo in #214

Major bugfixes

neuron_parallel_compile, ParallelLoader and Zero-1 fixes for torchneuron 8+ by @michaelbenayoun in #200
flan-t5 fix: T5Parallelizer, NeuronCacheCallback and NeuronHash refactors by @michaelbenayoun in #207
Fix optimum-cli broken by optimum 1.13.0 release by @JingyaHuang in #217

Other changes

Bump Inference APIs to Neuron 2.13 by @JingyaHuang in #206
Add log for SD when applying optim attn & pipelines lazy loading by @JingyaHuang in #208
Cancel concurrent CIs for inference by @JingyaHuang in #218
fix(tgi): typer does not support Union types by @dacorvo in #219
Bump neuron-cc version to 1.18.* by @JingyaHuang in #224

Full Changelog: v0.0.10...v0.0.11

Contributors

dacorvo, michaelbenayoun, and JingyaHuang
