Releases: aws-neuron/upstreaming-to-vllm

Neuron 2.26.1

29 Oct 20:16
26a43b5

Release: 2.26.1

Neuron SDK 2.26.1 Inference + vLLM 0.9.0 Integration (V0 Architecture)

NxD Inference (NxDI) v2.26.1 is now supported on branch neuron-2.26.1 in this fork.


What's New

  • Minor bug fixes

Contributors

@aws-navyadhara @jimburtoft

Neuron 2.26.0

18 Sep 23:29

Release: 2.26.0

Neuron SDK 2.26.0 Inference + vLLM 0.9.0 Integration (V0 Architecture)

NxD Inference (NxDI) v2.26.0 is now supported on branch neuron-2.26 in this fork.


What's New

  • Beta support for Llama 4, including the Scout and Maverick models. Currently, the model must be compiled outside of vLLM, and the compiled model path must be specified using the NEURON_COMPILED_ARTIFACTS environment variable. This limitation will be addressed in a future release.
  • Other minor fixes and improvements.
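
As a minimal sketch of the Llama 4 workflow above, the snippet below prepares an environment in which NEURON_COMPILED_ARTIFACTS points at a precompiled model. The directory path and the helper name neuron_env are hypothetical placeholders, not part of this fork; only the environment variable name comes from the release notes.

```python
import os


def neuron_env(compiled_model_dir, base_env=None):
    """Return a copy of the environment with NEURON_COMPILED_ARTIFACTS set.

    compiled_model_dir is assumed to hold artifacts compiled outside of vLLM.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NEURON_COMPILED_ARTIFACTS"] = compiled_model_dir
    return env


# Hypothetical path: replace with the directory that holds your compiled model.
env = neuron_env("/path/to/compiled/llama4", base_env={})
print(env["NEURON_COMPILED_ARTIFACTS"])
```

The returned dictionary can be passed as the env argument of subprocess.run when launching the server, which keeps the setting scoped to that one process.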

Contributors

@aws-bowencc @aws-yishanm @sssrijan-amazon @aarondou @elaineyz @aws-satyajith @aws-aymahg @kannakAWS @feiwx-cloud @aws-luof @yahavb @rohis06-aws

Neuron 2.25.0

31 Jul 19:05

Release: 2.25.0

Neuron SDK 2.25.0 Inference + vLLM 0.9.0 Integration (V0 Architecture)

NxD Inference (NxDI) v2.25.0 is now supported on branch neuron-2.25 in this fork.


What's New

  • Qwen3 dense models (0.6B to 32B parameters)
  • Added Disaggregated Inference support for multiple prefill and multiple decode workers (xPyD)
  • Other minor fixes and improvements.

Contributors

@aws-bowencc @aws-yishanm @sssrijan-amazon @aarondou @aws-navyadhara @elaineyz @aws-satyajith @chongmni-aws @ethanqh-aws @rohis06-aws @aws-aymahg @shawnzxf

Neuron 2.24.0

26 Jun 23:32

Release: 2.24.0

Neuron SDK 2.24.0 Inference + vLLM 0.7.2 Integration (V0 Architecture)

NxD Inference (NxDI) v2.24.0 is now supported on branch neuron-2.24-vllm-v0.7.2 in this fork.


What's New

  • Expanded model support for Qwen2.5 text models
  • Automatic Prefix Caching (APC) support. For more information, see the NxDI Prefix Caching feature guide and tutorial.
  • Disaggregated inference (DI) support (Beta).
  • Other minor fixes and improvements.

Other changes

  • Starting in release 2.24, the Neuron initialization code in vLLM no longer enables sequence parallelism by default. This ensures better compatibility with models and configurations where sequence parallelism is not well supported. If you previously relied on the Neuron vLLM code to enable sequence parallelism, you may now see an increased time to first token (TTFT). To re-enable sequence parallelism, pass --override-neuron-config "{\"sequence_parallel_enabled\":true}".
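
Hand-escaping the JSON value for --override-neuron-config is easy to get wrong. The sketch below shows one way to generate the argument from a Python dictionary using only the standard library. The helper name override_neuron_config_args is hypothetical; the flag and the sequence_parallel_enabled key are taken from the note above.

```python
import json
import shlex


def override_neuron_config_args(overrides):
    """Return the two argv entries for --override-neuron-config."""
    # Compact separators produce JSON without spaces, matching the note above.
    return ["--override-neuron-config", json.dumps(overrides, separators=(",", ":"))]


args = override_neuron_config_args({"sequence_parallel_enabled": True})

# shlex.join applies shell quoting, so the output can be pasted into a terminal.
print(shlex.join(args))
# prints: --override-neuron-config '{"sequence_parallel_enabled":true}'
```

When the server is started from Python with an argument list (for example through subprocess.run), the list can be appended as-is and no shell quoting is needed.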

Contributors

@shubhamchandak94 @aws-bowencc @AakashShetty-aws @shawnzxf @rohis06-aws @aws-yishanm @sssrijan-amazon @aws-luof @aarondou @aws-navyadhara @elaineyz @aws-satyajith @aws-cph @Zha0q1 (emeritus)

Neuron 2.23.0

02 Jun 20:02
ca2f6b9

Release: 2.23.0

Neuron SDK 2.23.0 Inference + vLLM 0.9.0 Integration (V0 Architecture)

This release marks full support for the Neuron SDK 2.23.0 inference libraries with vLLM 0.9.0 (V0 Architecture). Neuronx Distributed (NxD) Inference is the recommended path for multi-chip inference on AWS Trainium and Inferentia.


Highlights

  • NxD Inference (NxDI) v2.23.0 is now fully compatible with vLLM 0.9.0 when the environment variable VLLM_USE_V1=0 is set.
  • Support for speculative decoding and dynamic on-device sampling for latency-optimized generation.
  • Expanded model support including LLaMA 3.2 multi-modal models and Multi-LoRA inference.
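
For scripts that use vLLM from Python, the VLLM_USE_V1=0 requirement above can be met by setting the variable at the top of the script. This is a minimal sketch; placing the assignment before any vLLM import is a conservative assumption rather than a documented requirement of this fork.

```python
import os

# Select the V0 architecture, as required by this release.
os.environ["VLLM_USE_V1"] = "0"

# Assumption: vLLM is imported only after this point, so the setting is
# already in place whenever the library reads it.
print(os.environ["VLLM_USE_V1"])
```

Exporting the variable in the shell before launching the process has the same effect and also covers command-line entry points.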

Contributors

@aarondou @AakashShetty-aws @aws-navyadhara @aws-satyajith @aws-tailinpa (emeritus) @aws-yishanm @chongmni-aws @elaineyz @liangfu @mrinalks @sssrijan-amazon

nxd-v0.1.0

21 Aug 20:59
5da48c0

nxd-v0.1.0 Pre-release

This release adds support for the DBRX model in the NxD + vLLM integration.

It is based on vLLM v0.3.2.

What's Changed

  • Support DBRX for NxD integration with vLLM by @shawnzxf in #1

Full Changelog: https://github.com/aws-neuron/upstreaming-to-vllm/commits/nxd-v0.1.0