Releases: huggingface/optimum-neuron

v0.4.5: serving LLM embedding models

11 Feb 10:28

What's Changed

  • doc: add a guide explaining vLLM deployment on Inference Endpoints by @tengomucho in #1057
  • Add Qwen embedding guide and notebook by @pinak-p in #1045
  • Serve embedding models using vLLM by @dacorvo in #1072

Other changes

New Contributors

Full Changelog: v0.4.4...v0.4.5

v0.4.4: improved vLLM perf with on-device-sampling disable, fix speculation algo, PEFT update for GRPO

12 Jan 10:14

What's Changed

Inference

Training

Other

Full Changelog: v0.4.3...v0.4.4

v0.4.3: fix for Llama4, device memory usage details, vLLM container accepts params

10 Dec 16:23

What's Changed

Inference

Other

Full Changelog: v0.4.2...v0.4.3

v0.4.2: Training cache fixes, Qwen3 Embedding support added, vLLM v1 API

20 Nov 10:30

What's Changed

Inference

Training

Other

Full Changelog: v0.4.1...v0.4.2

v0.4.1: Xet High Performance transfers, vLLM served model name

23 Oct 16:06

What's Changed

  • chore: bump huggingface_hub version, set HF_XET_HIGH_PERFORMANCE by @tengomucho in #998

Inference

Training

Documentation

Full Changelog: v0.4.0...v0.4.1

v0.4.0: AWS Neuron SDK 2.26, Trainium 2 support, Qwen3-MoE, Llama4 (text)

10 Oct 10:05

What's Changed

Inference

Training

Documentation

New Contributors

Full Changelog: v0.3.0...v0.4.0

v0.3.0: vLLM plugin, FLUX support, SDK 2.24

18 Jul 12:20

What's Changed

Inference

Training

General

Documentation

Full Changelog: v0.2.2...v0.3.0

release: 0.2.2 - Fix LLM inference modeling

01 Jul 15:57

What's Changed

The LLM inference code caused a compilation error for models whose head_dim is not equal to hidden_size // num_attention_heads, such as Qwen3-0.6B and Qwen3-32B.

  • [Inference] Fix head_dim usage in modeling by @dacorvo in #879
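
To illustrate the class of issue described above, here is a minimal sketch of resolving the per-head dimension from a model config. It is not the code from #879: the helper name `resolve_head_dim` is hypothetical and the config values are illustrative only. The idea is to prefer an explicit `head_dim` when the config provides one, and to fall back to `hidden_size // num_attention_heads` otherwise.

```python
from types import SimpleNamespace


def resolve_head_dim(config):
    """Return the attention head dimension for a config-like object.

    Hypothetical helper: prefers an explicit `head_dim` attribute and only
    derives the value from hidden_size and num_attention_heads as a fallback.
    """
    explicit = getattr(config, "head_dim", None)
    if explicit is not None:
        return explicit
    return config.hidden_size // config.num_attention_heads


# Illustrative config where head_dim differs from the derived value
# (assumed numbers, chosen to mimic a Qwen3-style model).
explicit_cfg = SimpleNamespace(hidden_size=1024, num_attention_heads=16, head_dim=128)

# Illustrative config with no head_dim attribute, so the fallback applies.
derived_cfg = SimpleNamespace(hidden_size=4096, num_attention_heads=32)

print(resolve_head_dim(explicit_cfg))  # explicit value wins over 1024 // 16
print(resolve_head_dim(derived_cfg))   # falls back to 4096 // 32
```

Code that always computes `hidden_size // num_attention_heads` would get 64 for the first config instead of 128, which is the kind of shape mismatch that can surface as a compilation error.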

Full Changelog: v0.2.1...v0.2.2

v0.2.1: NxD refactoring

27 Jun 07:58

What's Changed

Inference

Training

General

  • feat: add a tool to decode binary HLOs by @dacorvo in #870

Documentation

New Contributors

Full Changelog: v0.2.0...v0.2.1

v0.2.0

06 Jun 15:03

What's Changed

Inference

Training

Documentation

Bug fixes

New Contributors

Full Changelog: v0.1.0...v0.2.0