Commit f36aa71

bump version to v0.10.2 (#4062)

* bump version to v0.10.2
* fix side effect

1 parent 1931670 · commit f36aa71

6 files changed: +15 -17 lines changed

README.md (1 addition, 1 deletion)

````diff
@@ -212,7 +212,7 @@ The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
 For the GeForce RTX 50 series, please install the LMDeploy prebuilt package complied with **CUDA 12.8**
 
 ```shell
-export LMDEPLOY_VERSION=0.10.1
+export LMDEPLOY_VERSION=0.10.2
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````

README_zh-CN.md (1 addition, 1 deletion)

````diff
@@ -213,7 +213,7 @@ pip install lmdeploy
 若使用 GeForce RTX 50 系列显卡,请安装基于 **CUDA 12.8** 编译的 LMDeploy 预编译包。
 
 ```shell
-export LMDEPLOY_VERSION=0.10.1
+export LMDEPLOY_VERSION=0.10.2
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````

benchmark/profile_throughput.py (8 additions, 10 deletions)

```diff
@@ -163,7 +163,7 @@ async def _inference(self, req_queue: Queue, session_id: int, temperature: float
 
         state = DetokenizeState(len(input_ids))
 
-        prev_len = 0
+        n_token = 0
         token_ids = input_ids.copy()
 
         generator = model_inst.async_stream_infer(session_id,
@@ -178,15 +178,13 @@ async def _inference(self, req_queue: Queue, session_id: int, temperature: float
                                                   stream_output=stream_output)
         try:
             async for outputs in generator:
-                n_token = outputs.num_token
-                if n_token > prev_len:
-                    token_ids += outputs.token_ids[prev_len - n_token:]
-                    if not skip_detokenize:
-                        _, state = self.tokenizer.detokenize_incrementally(token_ids, state)
-                    sess.tick(n_token)
-                prev_len = n_token
-                if n_token > cancel_after:
-                    break
+                n_token += outputs.num_token
+                token_ids += outputs.token_ids
+                if not skip_detokenize:
+                    _, state = self.tokenizer.detokenize_incrementally(token_ids, state)
+                sess.tick(n_token)
+                if n_token > cancel_after:
+                    break
             sess.finish(Session.SUCCESS)
         finally:
             await generator.aclose()
```
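The "fix side effect" part of this commit changes how the benchmark accounts for streamed tokens: the old loop treated `outputs.num_token` as a cumulative total and sliced out only the new tail of `outputs.token_ids`, while the new loop assumes each chunk reports just its own increment. A minimal sketch of the new incremental accounting style, outside any LMDeploy API (the `Chunk` class and sample data here are hypothetical illustrations, not LMDeploy types):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    num_token: int        # tokens produced in this chunk only (incremental)
    token_ids: List[int]  # the new token ids carried by this chunk


def consume_incremental(chunks: List[Chunk],
                        cancel_after: int = 10**9) -> Tuple[int, List[int]]:
    """Sum per-chunk counts into a running total and concatenate ids,
    stopping once the total exceeds the cancellation threshold."""
    n_token: int = 0
    token_ids: List[int] = []
    for c in chunks:
        n_token += c.num_token
        token_ids += c.token_ids
        if n_token > cancel_after:
            break
    return n_token, token_ids


chunks = [Chunk(2, [5, 7]), Chunk(1, [9]), Chunk(3, [2, 4, 6])]
total, ids = consume_incremental(chunks)
# total == 6, ids == [5, 7, 9, 2, 4, 6]
```

With `cancel_after=2` the loop stops after the second chunk, mirroring the benchmark's early-cancel path: `consume_incremental(chunks, cancel_after=2)` yields `(3, [5, 7, 9])`.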

docs/en/get_started/installation.md (2 additions, 2 deletions)

````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:
 
 ```shell
-export LMDEPLOY_VERSION=0.10.1
+export LMDEPLOY_VERSION=0.10.2
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
@@ -51,7 +51,7 @@ DISABLE_TURBOMIND=1 pip install git+https://github.com/InternLM/lmdeploy.git
 If you prefer a specific version instead of the `main` branch of LMDeploy, you can specify it in your command:
 
 ```shell
-pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.1.zip
+pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.2.zip
 ```
 
 If you want to build LMDeploy with support for Ascend, Cambricon, or MACA, install LMDeploy with the corresponding `LMDEPLOY_TARGET_DEVICE` environment variable.
````

docs/zh_cn/get_started/installation.md (2 additions, 2 deletions)

````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 默认的预构建包是在 **CUDA 12** 上编译的。如果需要 CUDA 11+ (>=11.3),你可以使用以下命令安装 lmdeploy:
 
 ```shell
-export LMDEPLOY_VERSION=0.10.1
+export LMDEPLOY_VERSION=0.10.2
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
@@ -51,7 +51,7 @@ DISABLE_TURBOMIND=1 pip install git+https://github.com/InternLM/lmdeploy.git
 如果您希望使用特定版本,而不是 LMDeploy 的 `main` 分支,可以在命令行中指定:
 
 ```shell
-pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.1.zip
+pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.2.zip
 ```
 
 如果您希望构建支持昇腾、寒武纪或沐熙的 LMDeploy,请使用相应的 `LMDEPLOY_TARGET_DEVICE` 环境变量进行安装。
````

lmdeploy/version.py (1 addition, 1 deletion)

```diff
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple
 
-__version__ = '0.10.1'
+__version__ = '0.10.2'
 short_version = __version__
 
 
```
