# Release xllm 0.7.0

## **Major Features and Improvements**

### Model Support

- Support GLM-4.5.
- Support Qwen3-Embedding.
- Support Qwen3-VL.
- Support FluxFill.

### Feature
- Support the MLU backend; currently supports Qwen3-series models.
- Support dynamic disaggregated PD, with strategy-driven switching between the prefill (P) and decode (D) phases.
- Support multi-stream parallel overlap optimization.
- Support beam search for generative models.
- Support a contiguous KV cache backed by virtual memory.
- Support the ACL graph executor.
- Support unified online-offline co-location scheduling in disaggregated PD scenarios.
- Support the PrefillOnly scheduler.
- Support the v1/rerank model service interface (see the request sketch after this list).
- Support communication between devices on a single machine via shared memory instead of RPC.
- Support function calling.
- Support reasoning output in the chat interface (see the sketch after this list).
- Support top-k+add fusion in the router component of MoE models.
- Support offline inference for LLM, VLM, and Embedding models.
- Various runtime performance optimizations.
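
Two hedged sketches follow for the new service interfaces mentioned above. Both assume an xllm server listening on `localhost:8000`; the request and response field names are illustrative guesses in the style of common OpenAI/Cohere-compatible APIs, not confirmed xllm schema, and the model names are placeholders.

First, a minimal rerank request against the v1/rerank interface using Python's `requests` library:

```python
# Minimal sketch of calling the new v1/rerank endpoint.
# Assumptions: server on localhost:8000, Cohere/Jina-style payload;
# field names and model name are illustrative, not confirmed xllm schema.
import requests

payload = {
    "model": "Qwen3-Reranker",  # hypothetical deployed model name
    "query": "What is PD disaggregation?",
    "documents": [
        "PD disaggregation splits prefill and decode onto separate instances.",
        "Beam search keeps the top-k partial hypotheses at every step.",
    ],
}

resp = requests.post("http://localhost:8000/v1/rerank", json=payload, timeout=30)
resp.raise_for_status()

# Assumed response shape: a list of results with an index and a relevance score.
for item in resp.json().get("results", []):
    print(item.get("index"), item.get("relevance_score"))
```

Second, a sketch of reading reasoning output from the chat interface, assuming an OpenAI-compatible v1/chat/completions endpoint that returns the model's reasoning in a `reasoning_content` field alongside the final `content`; the endpoint path and field name are assumptions:

```python
# Sketch of requesting a chat completion and separating reasoning output
# from the final answer. Endpoint path, model name, and the
# "reasoning_content" field are assumptions, not confirmed xllm schema.
import requests

payload = {
    "model": "GLM-4.5",  # hypothetical deployed model name
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60)
resp.raise_for_status()

message = resp.json()["choices"][0]["message"]
print("reasoning:", message.get("reasoning_content"))  # assumed field for reasoning output
print("answer:", message.get("content"))
```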

### Bugfix
- Skip cancelled requests when processing streaming output.
- Fix a segmentation fault during Qwen3 quantized inference.
- Align the format of monitoring metrics with Prometheus conventions.
- Clear outdated tensors to save memory when loading model weights.
- Fix the attention mask to support long-sequence requests.
- Fix bugs caused by enabling scheduler overlap.
# Release xllm 0.6.0

## **Major Features and Improvements**