Highlights
Bugfix
- Skip cancelled requests when processing stream output.
- Resolve a segmentation fault during Qwen3 quantized inference.
- Fix the monitoring metrics format so that it aligns with Prometheus.
- Clear stale tensors when loading model weights to reduce memory usage.
Release Images
x86 image
quay.io/jd_xllm/xllm-ai:xllm-0.6.1-release-hb-rc2-x86
ARM a2 device image
quay.io/jd_xllm/xllm-ai:xllm-0.6.1-release-hb-rc2-arm
ARM a3 device image
quay.io/jd_xllm/xllm-ai:xllm-0.6.1-release-hc-rc2-arm