v0.12.0

@RobMulla released this 12 Dec 20:28
· 86 commits to main since this release

This release brings several new features and improvements for vLLM TPU Inference.

Highlights

Async Scheduler: Enabled the async scheduler in tpu-inference, improving performance on smaller models.
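The async scheduler is typically toggled at server launch. A minimal sketch, assuming the upstream vLLM `--async-scheduling` flag is also exposed by tpu-inference (check `vllm serve --help` on your build); the model name is illustrative:

```shell
# Launch a small model with the async scheduler enabled.
# --async-scheduling is an assumption carried over from upstream vLLM.
vllm serve meta-llama/Llama-3.1-8B-Instruct --async-scheduling
```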

Spec Decode EAGLE-3: Added support for the EAGLE-3 speculative decoding variant, with verified performance for Llama 3.1-8B.
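As a hedged sketch, EAGLE-3 is usually enabled through vLLM's speculative decoding configuration; the exact `--speculative-config` JSON schema and the draft-model checkpoint below are assumptions, so consult the vLLM speculative decoding docs for your version:

```shell
# Serve Llama 3.1-8B with an EAGLE-3 draft model for speculative
# decoding. Draft-model repo and token count are illustrative.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --speculative-config '{"method": "eagle3",
                         "model": "yuhuili/EAGLE3-LLaMA3.1-Instruct-8B",
                         "num_speculative_tokens": 4}'
```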

Out-of-Tree Model Support: Load custom JAX models as plugins, enabling users to serve custom model architectures without forking or modifying vLLM internals.
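Upstream vLLM discovers out-of-tree models through installable plugin packages that register an architecture via Python entry points. Assuming tpu-inference follows the same mechanism (the package name and checkpoint path below are hypothetical), serving a custom architecture looks like:

```shell
# Install a plugin package that registers a custom JAX architecture
# with the model registry via its entry points (name hypothetical).
pip install my-jax-model-plugin

# Serve a checkpoint whose config.json names the custom architecture;
# no fork or modification of vLLM internals is required.
vllm serve /path/to/my-custom-model
```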

Automated CI/CD and Pre-merge Checks: Improved the testing and validation pipeline with automated CI/CD and pre-merge checks to enhance stability and accelerate iteration. More improvements to come.

What's Changed

New Contributors

Full Changelog: v0.11.1...v0.12.0