Releases: aws-neuron/upstreaming-to-vllm
Neuron 2.26.1
Release: 2.26.1
Neuron SDK 2.26.1 Inference + vLLM 0.9.0 Integration (V0 Architecture)
NxD Inference (NxDI) v2.26.1 is now supported on branch neuron-2.26.1 in this fork.
What's New
- Minor bug fixes
Neuron 2.26.0
Release: 2.26.0
Neuron SDK 2.26.0 Inference + vLLM 0.9.0 Integration (V0 Architecture)
NxD Inference (NxDI) v2.26.0 is now supported on branch neuron-2.26 in this fork.
What's New
- Beta support for Llama 4, including Scout and Maverick models. Currently, users must compile the model outside of vLLM and specify the compiled model path using the NEURON_COMPILED_ARTIFACTS environment variable (see the sketch after this list); this limitation will be addressed in a future release.
- Other minor fixes and improvements.
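A minimal offline-inference sketch of this flow. The artifacts path, model ID, and sizing parameters below are illustrative placeholders; only the NEURON_COMPILED_ARTIFACTS environment variable itself comes from the release notes.

```python
import os

# Point vLLM at model artifacts compiled ahead of time, outside of vLLM
# (the path and model ID below are placeholders).
os.environ["NEURON_COMPILED_ARTIFACTS"] = "/path/to/compiled/llama4-scout"
os.environ.setdefault("VLLM_USE_V1", "0")  # this integration targets the V0 engine

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed model ID
    tensor_parallel_size=32,  # choose to match the compiled artifacts
    max_model_len=8192,
    max_num_seqs=4,
)
outputs = llm.generate(["Hello, Neuron!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```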
Contributors
@aws-bowencc @aws-yishanm @sssrijan-amazon @aarondou @elaineyz @aws-satyajith @aws-aymahg @kannakAWS @feiwx-cloud @aws-luof @yahavb @rohis06-aws
Neuron 2.25.0
Release: 2.25.0
Neuron SDK 2.25.0 Inference + vLLM 0.9.0 Integration (V0 Architecture)
NxD Inference (NxDI) v2.25.0 is now supported on branch neuron-2.25 in this fork.
What's New
- Added support for Qwen3 dense models (0.6B to 32B parameters)
- Added Disaggregated Inference support for multiple prefill and multiple decode workers (xPyD)
- Other minor fixes and improvements.
Contributors
@aws-bowencc @aws-yishanm @sssrijan-amazon @aarondou @aws-navyadhara @elaineyz @aws-satyajith @chongmni-aws @ethanqh-aws @rohis06-aws @aws-aymahg @shawnzxf
Neuron 2.24.0
Release: 2.24.0
Neuron SDK 2.24.0 Inference + vLLM 0.7.2 Integration (V0 Architecture)
NxD Inference (NxDI) v2.24.0 is now supported on branch neuron-2.24-vllm-v0.7.2 in this fork.
What's New
- Expanded model support for Qwen2.5 text models
- Automatic Prefix Caching (APC) support (see the sketch after this list). For more information, see the NxDI Prefix Caching feature guide and tutorial
- Disaggregated inference (DI) support (Beta).
- Other minor fixes and improvements.
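A minimal sketch of enabling APC through vLLM's standard prefix-caching flag. The model ID and sizing values are placeholders, and any Neuron-specific configuration should follow the NxDI Prefix Caching feature guide.

```python
from vllm import LLM, SamplingParams

# Enable Automatic Prefix Caching so requests that share a prompt prefix
# (for example, a common system prompt) reuse previously computed KV cache.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model ID
    enable_prefix_caching=True,
    max_model_len=4096,
    max_num_seqs=8,
)

shared_prefix = "You are a concise assistant. Answer in one sentence.\n"
prompts = [shared_prefix + "What is AWS Trainium?",
           shared_prefix + "What is AWS Inferentia?"]
for out in llm.generate(prompts, SamplingParams(max_tokens=32)):
    print(out.outputs[0].text)
```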
Other changes
- Starting in release 2.24, the Neuron initialization code in vLLM no longer enables sequence parallelism by default. This ensures better compatibility with models and configurations where sequence parallelism is not well supported. If you previously relied on the Neuron vLLM code to enable sequence parallelism, you may now see increased TTFT. To re-enable it, pass
--override-neuron-config "{\"sequence_parallel_enabled\": true}" (see the example below).
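The same override expressed through the offline Python API, as a sketch: the override_neuron_config engine argument is assumed to be available as in recent vLLM releases, and the model ID is a placeholder.

```python
from vllm import LLM

# Re-enable sequence parallelism for the Neuron backend; equivalent to passing
# --override-neuron-config '{"sequence_parallel_enabled": true}' on the CLI.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model ID
    override_neuron_config={"sequence_parallel_enabled": True},
)
```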
Contributors
@shubhamchandak94 @aws-bowencc @AakashShetty-aws @shawnzxf @rohis06-aws @aws-yishanm @sssrijan-amazon @aws-luof @aarondou @aws-navyadhara @elaineyz @aws-satyajith @aws-cph @Zha0q1 (emeritus)
Neuron 2.23.0
Release: 2.23.0
Neuron SDK 2.23.0 Inference + vLLM 0.9.0 Integration (V0 Architecture)
This release marks full support for the Neuron SDK 2.23.0 inference libraries with vLLM 0.9.0 (V0 Architecture). Neuronx Distributed (NxD) Inference is the recommended path for multi-chip inference on AWS Trainium and Inferentia.
Highlights
- NxD Inference (NxDI) v2.23.0 is now fully compatible with vLLM 0.9.0 when the environment variable VLLM_USE_V1=0 is set (see the sketch after this list).
- Support for speculative decoding and dynamic on-device sampling for latency-optimized generation.
- Expanded model support, including Llama 3.2 multi-modal models and Multi-LoRA inference.
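A minimal sketch of selecting the V0 engine before constructing the LLM. The model ID is a placeholder; speculative-decoding and on-device-sampling options should be configured per the NxDI documentation and are not shown here.

```python
import os

# Force the vLLM V0 engine, which is what this release targets.
os.environ["VLLM_USE_V1"] = "0"

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.2-3B-Instruct")  # placeholder model ID
outputs = llm.generate(["Hello from Neuron"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```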
Contributors
@aarondou @AakashShetty-aws @aws-navyadhara @aws-satyajith @aws-tailinpa (emeritus) @aws-yishanm @chongmni-aws @elaineyz @liangfu @mrinalks @sssrijan-amazon
nxd-v0.1.0
This release adds support for the DBRX model as a new feature for NxD + vLLM.
It is based on vLLM v0.3.2.
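A minimal offline-inference sketch against the vLLM 0.3.2-era Python API, assuming the fork exposes DBRX through the standard LLM entry point; the model ID and parallelism degree are placeholders.

```python
from vllm import LLM, SamplingParams

# DBRX is a large MoE model, so a multi-chip tensor-parallel setup is assumed.
llm = LLM(
    model="databricks/dbrx-instruct",  # placeholder model ID
    tensor_parallel_size=8,            # adjust to your Neuron device count
)
outputs = llm.generate(["Write a haiku about inference."],
                       SamplingParams(temperature=0.7, max_tokens=48))
print(outputs[0].outputs[0].text)
```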
Full Changelog: https://github.com/aws-neuron/upstreaming-to-vllm/commits/nxd-v0.1.0