v0.6.1

Released by @JimHsiung on 31 Oct at 02:41 (commit a0ca5b4)

Highlights

Bug Fixes

  • Skip cancelled requests when processing stream output.
  • Resolve a segmentation fault during Qwen3 quantized inference.
  • Fix the alignment of the monitoring metrics format for Prometheus.
  • Clear outdated tensors to save memory when loading model weights.

Release Images

x86 image

quay.io/jd_xllm/xllm-ai:xllm-0.6.1-release-hb-rc2-x86

ARM a2 device image

quay.io/jd_xllm/xllm-ai:xllm-0.6.1-release-hb-rc2-arm

ARM a3 device image

quay.io/jd_xllm/xllm-ai:xllm-0.6.1-release-hc-rc2-arm