
Commit cf489fe

Merge pull request #4105 from opendatalab/release-2.6.6
Release 2.6.6
2 parents 0e7b480 + db666bf commit cf489fe

File tree

14 files changed: +524 −87 lines

README.md

Lines changed: 9 additions & 0 deletions
@@ -44,6 +44,13 @@
 </div>

 # Changelog
+
+- 2025/12/02 2.6.6 Release
+  - `mineru-api` tool optimizations
+    - Added descriptive text to `mineru-api` interface parameters to improve API documentation readability.
+    - You can use the environment variable `MINERU_API_ENABLE_FASTAPI_DOCS` to control whether the auto-generated interface documentation page is enabled (enabled by default).
+    - Added concurrency configuration options for the `vlm-vllm-async-engine`, `vlm-lmdeploy-engine`, and `vlm-http-client` backends. Users can use the environment variable `MINERU_API_MAX_CONCURRENT_REQUESTS` to set the maximum number of concurrent API requests (unlimited by default).
+
 - 2025/11/26 2.6.5 Release
   - Added support for a new backend vlm-lmdeploy-engine. Its usage is similar to vlm-vllm-(async)engine, but it uses lmdeploy as the inference engine and additionally supports native inference acceleration on Windows platforms compared to vllm.

@@ -797,6 +804,8 @@ Currently, some models in this project are trained based on YOLO. However, since
 - [pdfminer.six](https://github.com/pdfminer/pdfminer.six)
 - [pypdf](https://github.com/py-pdf/pypdf)
 - [magika](https://github.com/google/magika)
+- [vLLM](https://github.com/vllm-project/vllm)
+- [LMDeploy](https://github.com/InternLM/lmdeploy)

 # Citation

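The 2.6.6 changelog entry in this README diff introduces two `mineru-api` environment variables. A minimal launch sketch follows; the variable names come from the changelog above, while the boolean value format and the `--host`/`--port` flags are assumed from typical MinerU usage and may differ in your install:

```bash
# Illustrative only: disable the auto-generated FastAPI docs page (enabled by default)
export MINERU_API_ENABLE_FASTAPI_DOCS=false
# Cap the API at 4 concurrent requests (unlimited by default)
export MINERU_API_MAX_CONCURRENT_REQUESTS=4
# Start the API service; host/port flags are illustrative
mineru-api --host 0.0.0.0 --port 8000
```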
README_zh-CN.md

Lines changed: 11 additions & 0 deletions
@@ -45,6 +45,15 @@
 
 # Changelog
 
+- 2025/12/02 2.6.6 Release
+  - `Ascend` adaptation optimizations
+    - Optimized the command-line tool initialization flow so that the `vlm-vllm-engine` backend of the Ascend adaptation can be used from the command-line tool.
+    - Updated the adaptation documentation for Atlas 300I Duo (310p) devices.
+  - `mineru-api` tool optimizations
+    - Added descriptive text to `mineru-api` interface parameters to improve API documentation readability.
+    - The environment variable `MINERU_API_ENABLE_FASTAPI_DOCS` controls whether the auto-generated interface documentation page is enabled (enabled by default).
+    - Added concurrency configuration options for the `vlm-vllm-async-engine`, `vlm-lmdeploy-engine`, and `vlm-http-client` backends; the environment variable `MINERU_API_MAX_CONCURRENT_REQUESTS` controls the maximum number of concurrent API requests (unlimited by default).
+
 - 2025/11/26 2.6.5 Release
   - Added support for a new backend `vlm-lmdeploy-engine`; its usage is similar to `vlm-vllm-(async)engine`, but it uses `lmdeploy` as the inference engine and, compared to `vllm`, additionally supports native inference acceleration on Windows.
   - Added adaptation support for the domestic compute platforms `Ascend/npu`, `T-Head/ppu`, and `MetaX/maca`. Users can run the `pipeline` and `vlm` models on these platforms and use the `vllm`/`lmdeploy` engines to accelerate vlm model inference; for detailed usage, see [Other accelerator card adaptation](https://opendatalab.github.io/MinerU/zh/usage/).

@@ -791,6 +800,8 @@ mineru -p <input_path> -o <output_path>
 - [pdfminer.six](https://github.com/pdfminer/pdfminer.six)
 - [pypdf](https://github.com/py-pdf/pypdf)
 - [magika](https://github.com/google/magika)
+- [vLLM](https://github.com/vllm-project/vllm)
+- [LMDeploy](https://github.com/InternLM/lmdeploy)

 # Citation

docker/china/npu.Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # The base image provides either vLLM or LMDeploy; choose one according to your actual needs. Requires ARM (AArch64) CPU + Ascend NPU.
 # Base image containing the vLLM inference environment, requiring ARM(AArch64) CPU + Ascend NPU.
-FROM quay.io/ascend/vllm-ascend:v0.11.0rc2
+FROM quay.m.daocloud.io/ascend/vllm-ascend:v0.11.0rc2
 # Base image containing the LMDeploy inference environment, requiring ARM(AArch64) CPU + Ascend NPU.
 # FROM crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:mineru-a2

docs/zh/usage/acceleration_cards/Ascend.md

Lines changed: 31 additions & 4 deletions
@@ -14,14 +14,36 @@ docker: 20.10.12
 >Ascend accelerator cards support using `vllm` or `lmdeploy` to accelerate VLM model inference. Please choose one of the two to install and use according to your actual needs:

 ### 2.1 Build the image using the Dockerfile (vllm)
+> [!TIP]
+> ascend-vllm supports the following devices:
+>
+> - Atlas A2 training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
+> - Atlas 800I A2 inference series (Atlas 800I A2)
+> - Atlas A3 training series (Atlas 800T A3, Atlas 900 A3 SuperPoD, Atlas 9000 A3 SuperPoD)
+> - Atlas 800I A3 inference series (Atlas 800I A3)
+> - [Experimental] Atlas 300I inference series (Atlas 300I Duo)
+>
+> The third line of the Dockerfile specifies the ascend-vllm base image; the default tag is the A2-adapted version, e.g. `v0.11.0rc2`.
+>
+> - To use the A3-adapted version, change the tag on the third line to `v0.11.0rc2-a3` before running the build.
+> - To use the Atlas 300I Duo-adapted version, change the tag on the third line to `v0.10.0rc1-310p` before running the build.
+

 ```bash
 wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/npu.Dockerfile
 docker build --network=host -t mineru:npu-vllm-latest -f npu.Dockerfile .
-```
+```

 ### 2.2 Build the image using the Dockerfile (lmdeploy)

+> [!TIP]
+> ascend-lmdeploy supports the following devices:
+>
+> - Atlas A2 training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
+> - Atlas 800I A2 inference series (Atlas 800I A2)
+>
+> If your device is an Atlas A3 series or Atlas 300I Duo series device, please use the vllm image instead.
+
 ```bash
 wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/npu.Dockerfile
 # Switch the base image from vllm to lmdeploy
@@ -51,6 +73,7 @@ docker run -u root --name mineru_docker --privileged=true \
     -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
     -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
     -e MINERU_MODEL_SOURCE=local \
+    -e MINERU_VIRTUAL_VRAM_SIZE=16 \
     -e MINERU_LMDEPLOY_DEVICE=ascend \
     -it mineru:npu-vllm-latest \
     /bin/bash
@@ -62,6 +85,9 @@ docker run -u root --name mineru_docker --privileged=true \
 After running this command, you will enter the Docker container's interactive terminal, where you can run MinerU commands directly inside the container to use MinerU's features.
 You can also start the MinerU service directly by replacing `/bin/bash` with the service startup command; for details, see [Start the service via commands](https://opendatalab.github.io/MinerU/zh/usage/quick_usage/#apiwebuihttp-clientserver).

+>[!NOTE]
+> Because the 310p accelerator card does not support bf16 precision, when using this card you must append the `--enforce-eager --dtype float16` parameters to any `vllm`-related command.
+
 ## 4. Notes

 The table below shows MinerU's support for Ascend accelerator cards in different environments:
@@ -91,7 +117,7 @@ docker run -u root --name mineru_docker --privileged=true \
     </tr>
     <tr>
         <td>vlm-&lt;engine_name&gt;-engine</td>
-        <td>🔴</td>
+        <td>🟢</td>
         <td>🟢</td>
     </tr>
     <tr>
@@ -160,8 +186,9 @@ docker run -u root --name mineru_docker --privileged=true \
 🔴: Not supported, unable to run, or large accuracy differences

 >[!NOTE]
->Due to the particularities of NPU cards, after a single service startup, switching the inference backend type (pipeline/vlm) at runtime may cause exceptions; please choose a suitable inference backend according to your actual needs.
->If you hit errors or exceptions when switching the inference backend type within the service, simply restart the service.
+>When starting the mineru-api service with the vllm image, if you first parse with the pipeline backend and then switch to the vlm-vllm-async-engine backend, vllm engine initialization will fail.
+>If you need to use both the pipeline and vlm-vllm-async-engine backends within one mineru-api service, parse once with the vlm-vllm-async-engine backend first; after that you can switch freely.
+>If you hit errors or exceptions when switching the inference backend type within the service, simply restart the service.

 >[!TIP]
 >Specifying which NPU accelerator cards are available works the same way as for NVIDIA GPUs; see [ASCEND_RT_VISIBLE_DEVICES](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/850alpha001/maintenref/envvar/envref_07_0028.html).

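The Ascend documentation changes above add Atlas 300I Duo (310p) guidance: select the 310p-adapted base image tag before building, and append `--enforce-eager --dtype float16` to vllm-related commands because the card lacks bf16 support. A hedged sketch of both steps; the `sed` edit and the `-b`/`--backend` flag are illustrative assumptions, while the tag name, the flag pair, and the `mineru -p/-o` form come from the docs in this commit:

```bash
# Point the Dockerfile's third line at the 310p-adapted ascend-vllm tag, then build
sed -i 's|vllm-ascend:v0.11.0rc2|vllm-ascend:v0.10.0rc1-310p|' npu.Dockerfile
docker build --network=host -t mineru:npu-vllm-310p -f npu.Dockerfile .

# On a 310p card, append the fp16/eager flags to any vllm-related command,
# e.g. a CLI parse with the vlm-vllm-engine backend (backend flag assumed):
mineru -p <input_path> -o <output_path> -b vlm-vllm-engine --enforce-eager --dtype float16
```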
docs/zh/usage/acceleration_cards/METAX.md

Lines changed: 4 additions & 3 deletions
@@ -15,15 +15,16 @@ docker: 28.1.1
 ### 2.1 Build a vllm environment image using the official metax image as the base

-- 1. Pull the base image from the official metax repository
-  - 1.1 Image source: https://developer.metax-tech.com/softnova/docker
+1. Pull the base image from the official metax repository
+   - 1.1 Image source: [https://developer.metax-tech.com/softnova/docker](https://developer.metax-tech.com/softnova/docker)
    - 1.2 On the image site, select the `AI` category, choose `vllm` as the package type, and `ubuntu` as the operating system
    - 1.3 Find the `vllm:maca.ai3.1.0.7-torch2.6-py310-ubuntu22.04-amd64` image, copy the pull command, and run it in a local terminal
-- 2. Build the image using the Dockerfile (vllm)
+2. Build the image using the Dockerfile (vllm)
 ```bash
 wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/docker/china/maca.Dockerfile
 docker build --network=host -t mineru:maca-vllm-latest -f maca.Dockerfile .
 ```
+

 ### 2.2 Build the image using the Dockerfile (lmdeploy)

mineru/backend/pipeline/pipeline_analyze.py

Lines changed: 15 additions & 20 deletions
@@ -159,7 +159,6 @@ def batch_image_analyze(
 
     model_manager = ModelSingleton()
 
-    batch_ratio = 1
     device = get_device()
 
     if str(device).startswith('npu'):
@@ -173,25 +172,21 @@ def batch_image_analyze(
                 "Please ensure that the torch_npu package is installed correctly."
             ) from e

-    if str(device).startswith('npu') or str(device).startswith('cuda'):
-        vram = get_vram(device)
-        if vram is not None:
-            gpu_memory = int(os.getenv('MINERU_VIRTUAL_VRAM_SIZE', round(vram)))
-            if gpu_memory >= 16:
-                batch_ratio = 16
-            elif gpu_memory >= 12:
-                batch_ratio = 8
-            elif gpu_memory >= 8:
-                batch_ratio = 4
-            elif gpu_memory >= 6:
-                batch_ratio = 2
-            else:
-                batch_ratio = 1
-            logger.info(f'gpu_memory: {gpu_memory} GB, batch_ratio: {batch_ratio}')
-        else:
-            # Default batch_ratio when VRAM can't be determined
-            batch_ratio = 1
-            logger.info(f'Could not determine GPU memory, using default batch_ratio: {batch_ratio}')
+    gpu_memory = get_vram(device)
+    if gpu_memory >= 16:
+        batch_ratio = 16
+    elif gpu_memory >= 12:
+        batch_ratio = 8
+    elif gpu_memory >= 8:
+        batch_ratio = 4
+    elif gpu_memory >= 6:
+        batch_ratio = 2
+    else:
+        batch_ratio = 1
+    logger.info(
+        f'GPU Memory: {gpu_memory} GB, Batch Ratio: {batch_ratio}. '
+        f'You can set MINERU_VIRTUAL_VRAM_SIZE environment variable to adjust GPU memory allocation.'
+    )

     # Check the torch version
     import torch

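The simplified tiering above maps detected GPU memory straight to a batch_ratio (>=16 GB → 16, >=12 → 8, >=8 → 4, >=6 → 2, otherwise 1), and the new log line points to `MINERU_VIRTUAL_VRAM_SIZE` as the override knob. A hedged usage sketch; the override behaviour is inferred from that log message and the `-e MINERU_VIRTUAL_VRAM_SIZE=16` Ascend docker example above, not from this hunk alone:

```bash
# Pretend the card only has 8 GB so the pipeline picks batch_ratio = 4,
# e.g. to leave VRAM headroom for other processes on the same GPU/NPU.
export MINERU_VIRTUAL_VRAM_SIZE=8
mineru -p <input_path> -o <output_path>
```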
mineru/backend/vlm/utils.py

Lines changed: 8 additions & 12 deletions
@@ -81,20 +81,16 @@ def set_default_gpu_memory_utilization() -> float:
 def set_default_batch_size() -> int:
     try:
         device = get_device()
-        vram = get_vram(device)
-        if vram is not None:
-            gpu_memory = int(os.getenv('MINERU_VIRTUAL_VRAM_SIZE', round(vram)))
-            if gpu_memory >= 16:
-                batch_size = 8
-            elif gpu_memory >= 8:
-                batch_size = 4
-            else:
-                batch_size = 1
-            logger.info(f'gpu_memory: {gpu_memory} GB, batch_size: {batch_size}')
+        gpu_memory = get_vram(device)
+
+        if gpu_memory >= 16:
+            batch_size = 8
+        elif gpu_memory >= 8:
+            batch_size = 4
         else:
-            # Default batch_ratio when VRAM can't be determined
             batch_size = 1
-            logger.info(f'Could not determine GPU memory, using default batch_ratio: {batch_size}')
+        logger.info(f'gpu_memory: {gpu_memory} GB, batch_size: {batch_size}')
+
     except Exception as e:
         logger.warning(f'Error determining VRAM: {e}, using default batch_ratio: 1')
         batch_size = 1

mineru/cli/client.py

Lines changed: 4 additions & 5 deletions
@@ -113,15 +113,15 @@
     '--formula',
     'formula_enable',
     type=bool,
-    help='Enable formula parsing. Default is True. Adapted only for the case where the backend is set to "pipeline".',
+    help='Enable formula parsing. Default is True. ',
     default=True,
 )
 @click.option(
     '-t',
     '--table',
     'table_enable',
     type=bool,
-    help='Enable table parsing. Default is True. Adapted only for the case where the backend is set to "pipeline".',
+    help='Enable table parsing. Default is True. ',
     default=True,
 )
 @click.option(
@@ -172,9 +172,8 @@ def get_device_mode() -> str:
     def get_virtual_vram_size() -> int:
         if virtual_vram is not None:
             return virtual_vram
-        if get_device_mode().startswith("cuda") or get_device_mode().startswith("npu"):
-            return round(get_vram(get_device_mode()))
-        return 1
+        else:
+            return get_vram(get_device_mode())
     if os.getenv('MINERU_VIRTUAL_VRAM_SIZE', None) is None:
         os.environ['MINERU_VIRTUAL_VRAM_SIZE']= str(get_virtual_vram_size())

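Combined with the simplified `get_virtual_vram_size()` above, the CLI now resolves `MINERU_VIRTUAL_VRAM_SIZE` once at startup: a value already exported in the environment is left untouched; otherwise it is filled from the CLI's `virtual_vram` option when given, or from `get_vram()` on the detected device. A hedged sketch of the environment-variable path only; the CLI flag that backs `virtual_vram` is not shown in this diff, so it is omitted:

```bash
# A pre-set value wins: the CLI does not overwrite it, and per the new log
# message in pipeline_analyze.py it also steers the batch-size tiers.
export MINERU_VIRTUAL_VRAM_SIZE=16
mineru -p <input_path> -o <output_path>

# Without it, the CLI exports the variable itself from the detected VRAM
# before parsing, so backend code run later sees a consistent value.
unset MINERU_VIRTUAL_VRAM_SIZE
mineru -p <input_path> -o <output_path>
```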
0 commit comments