Commit d1ed047

Merge branch 'main' into feat/intel-amd
2 parents 1111119 + fed730d

File tree

100 files changed: +5170 −243 lines changed
Lines changed: 36 additions & 0 deletions (new file)

```yaml
name: Docs Anchor Link Check

on:
  pull_request:
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - 'mkdocs-ci.yml'
      - 'overrides/**'

jobs:
  check-anchor-links:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 1

      - uses: actions/setup-python@v6
        with:
          python-version: '3.x'

      - uses: actions/cache@v5
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-mkdocs-${{ hashFiles('mkdocs.yml') }}
          restore-keys: |
            ${{ runner.os }}-pip-mkdocs-

      - name: Install dependencies
        run: pip install mike mkdocs-material jieba mkdocs-git-revision-date-localized-plugin mkdocs-git-committers-plugin-2 mkdocs-static-i18n markdown-callouts

      - name: Check for broken anchor links
        env:
          ENABLE_GIT_PLUGINS: 'false'
        run: mkdocs build -f mkdocs-ci.yml
```
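The workflow delegates anchor validation to the MkDocs build itself (`mkdocs-ci.yml` is assumed to configure strict link checking). As a rough standalone illustration of the idea, a script could scan the rendered HTML for fragment links and verify that each target `id` exists on the same page — a minimal sketch of the concept, not the mechanism the workflow actually uses:

```python
import re
import sys
from pathlib import Path


def check_anchors(site_dir: str) -> list[str]:
    """Return a list of broken '#fragment' links found in rendered HTML.

    Minimal illustration: only checks same-page fragments and ignores
    cross-page links, which a real checker would also resolve.
    """
    broken = []
    for page in Path(site_dir).rglob("*.html"):
        html = page.read_text(encoding="utf-8", errors="ignore")
        # Anchor targets can be declared via id="..." or (legacy) name="...".
        ids = set(re.findall(r'(?:id|name)="([^"]+)"', html))
        for frag in re.findall(r'href="#([^"]+)"', html):
            if frag not in ids:
                broken.append(f"{page}#{frag}")
    return broken


if __name__ == "__main__":
    problems = check_anchors(sys.argv[1] if len(sys.argv) > 1 else "site")
    for link in problems:
        print("broken:", link)
    sys.exit(1 if problems else 0)
```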

.github/workflows/python-publish.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -33,7 +33,7 @@ jobs:
       - name: Build package
         run: python -m build
       - name: Publish package
-        uses: pypa/gh-action-pypi-publish@ec4db0b4ddc65acdf4bff5fa45ac92d78b56bdf0 # v1.13.0
+        uses: pypa/gh-action-pypi-publish@ed0c53931b1dc9bd32cbe73a98c7f6766f8a527e # v1.13.0
         with:
           user: __token__
           password: ${{ secrets.PYPI_API_TOKEN }}
```

configs/rec/multi_language/generate_multi_language_configs.py

Lines changed: 7 additions & 7 deletions

```diff
@@ -17,7 +17,7 @@
 import os.path
 import logging
 
-logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger("ppocr")
 
 support_list = {
     "it": "italian",
@@ -281,27 +281,27 @@ def loss_file(path):
 
     with open(save_file_path, "w") as f:
         yaml.dump(dict(global_config), f, default_flow_style=False, sort_keys=False)
-    logging.info("Project path is :{}".format(project_path))
-    logging.info(
+    logger.info("Project path is :{}".format(project_path))
+    logger.info(
         "Train list path set to :{}".format(
             global_config["Train"]["dataset"]["label_file_list"][0]
         )
     )
-    logging.info(
+    logger.info(
         "Eval list path set to :{}".format(
             global_config["Eval"]["dataset"]["label_file_list"][0]
         )
     )
-    logging.info(
+    logger.info(
         "Dataset root path set to :{}".format(
             global_config["Eval"]["dataset"]["data_dir"]
         )
     )
-    logging.info(
+    logger.info(
         "Dict path set to :{}".format(
             global_config["Global"]["character_dict_path"]
         )
     )
-    logging.info(
+    logger.info(
         "Config file set to :configs/rec/multi_language/{}".format(save_file_path)
     )
```
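The switch from `logging.basicConfig(...)` at import time to a named `ppocr` logger matters because `basicConfig` mutates the process-wide root logger for every importer of the module. A minimal sketch of the resulting division of responsibility (the function here is illustrative, not the file's actual code):

```python
import logging

# Library side: obtain a named logger, configure nothing.
# logging.getLogger() returns the same object for a given name, so
# every module asking for "ppocr" shares one logger.
logger = logging.getLogger("ppocr")


def generate_config(project_path: str) -> None:
    # Lazy %-style formatting defers string building until the
    # record is actually emitted.
    logger.info("Project path is :%s", project_path)


# Application side: the *caller* decides handlers and level, once.
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    generate_config("/tmp/demo")
```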

deploy/cpp_infer/src/utils/args.cc

Lines changed: 1 addition & 1 deletion

```diff
@@ -54,7 +54,7 @@ DEFINE_string(text_det_limit_side_len, "64",
               "text detection model.");
 DEFINE_string(text_det_limit_type, "min",
               "This determines how the side length limit is applied to the "
-              "input image before feeding it into the text deteciton model.");
+              "input image before feeding it into the text detection model.");
 DEFINE_string(text_det_thresh, "0.3",
               "Detection pixel threshold for the text detection model. Pixels "
               "with scores greater than this threshold in the output "
```
Lines changed: 42 additions & 0 deletions (new file)

```
# PaddleOCR-VL-1.5 HPS Configuration
# Copy this file to .env and modify as needed

# =============================================================================
# Gateway Configuration
# =============================================================================

# Maximum concurrent inference requests (layout-parsing / infer)
# Increase for higher throughput, decrease if device is overloaded
HPS_MAX_CONCURRENT_INFERENCE_REQUESTS=16

# Maximum concurrent non-inference requests (restructure-pages)
# Can be set higher since non-inference operations are lighter
HPS_MAX_CONCURRENT_NON_INFERENCE_REQUESTS=64

# Inference timeout in seconds
# Increase for complex documents, decrease for faster failure detection
HPS_INFERENCE_TIMEOUT=600

# Health check timeout in seconds
HPS_HEALTH_CHECK_TIMEOUT=5

# VLM server URL for health checks
# HPS_VLM_URL=http://paddleocr-vlm-server:8080

# Log level: DEBUG, INFO, WARNING, ERROR
HPS_LOG_LEVEL=INFO

# Filter health check endpoints from access logs (true/false)
# Set to false to see all access logs including health checks
HPS_FILTER_HEALTH_ACCESS_LOG=true

# Number of Uvicorn worker processes
# Recommended: 2-4 workers per CPU core
UVICORN_WORKERS=4

# =============================================================================
# Device Configuration
# =============================================================================

# Inference device ID to use (0, 1, 2, etc.)
DEVICE_ID=0
```
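Settings like these are typically read by the gateway process at startup with environment-variable fallbacks. A hypothetical sketch of such loading — the variable names and defaults mirror the file above, but the dataclass and function are illustrative, not the gateway's actual code:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HpsSettings:
    max_concurrent_inference: int
    max_concurrent_non_inference: int
    inference_timeout: float
    log_level: str


def load_settings(env=os.environ) -> HpsSettings:
    """Read HPS_* variables, falling back to the documented defaults."""
    return HpsSettings(
        max_concurrent_inference=int(
            env.get("HPS_MAX_CONCURRENT_INFERENCE_REQUESTS", "16")),
        max_concurrent_non_inference=int(
            env.get("HPS_MAX_CONCURRENT_NON_INFERENCE_REQUESTS", "64")),
        inference_timeout=float(env.get("HPS_INFERENCE_TIMEOUT", "600")),
        log_level=env.get("HPS_LOG_LEVEL", "INFO"),
    )
```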
Lines changed: 206 additions & 0 deletions (new file)

# PaddleOCR-VL-1.5 High-Performance Serving Deployment

[English](README_en.md)

This directory provides a high-performance serving deployment solution for PaddleOCR-VL-1.5 with support for concurrent request handling.

> This solution currently supports NVIDIA GPUs only; support for other inference devices is still in progress.

## Architecture

```
Client → FastAPI gateway → Triton server → vLLM server
```

| Component | Description |
|----------------|----------------------------------------|
| FastAPI gateway | Unified entry point, simplified client access, concurrency control |
| Triton server | Layout detection model (PP-DocLayoutV3) and pipeline orchestration; handles model management, dynamic batching, and inference scheduling |
| vLLM server | VLM (PaddleOCR-VL-1.5) with continuous-batching inference |

**Triton models:**

| Model | Device | Description |
|------|------|------|
| `layout-parsing` | Inference device (e.g. GPU) | Layout-parsing inference |
| `restructure-pages` | CPU | Multi-page post-processing (cross-page table merging, heading-level reassignment) |
## Requirements

- x64 CPU
- NVIDIA GPU with Compute Capability >= 8.0 and < 12.0
- NVIDIA driver with CUDA 12.6 support
- Docker >= 19.03
- Docker Compose >= 2.0
## Quick Start

1. Clone the PaddleOCR source code and switch to this directory:

    ```bash
    git clone https://github.com/PaddlePaddle/PaddleOCR.git
    cd PaddleOCR/deploy/paddleocr_vl_docker/hps
    ```

2. Prepare the required files:

    ```bash
    bash prepare.sh
    ```

3. Start the services:

    ```bash
    docker compose up
    ```

The command above starts three containers in turn:

| Service | Description | Port |
|------|------|------|
| `paddleocr-vl-api` | FastAPI gateway (external entry point) | 8080 |
| `paddleocr-vl-tritonserver` | Triton inference server | 8000 (internal) |
| `paddleocr-vlm-server` | vLLM-based VLM inference service | 8080 (internal) |

> The first start downloads and builds the images automatically, which takes a while; subsequent starts use the local images and are much faster.
## Configuration

### Environment variables

Copy `.env.example` to `.env` and modify as needed.

```bash
cp .env.example .env
```

Besides the `.env` file, variables can also be set directly in the environment, e.g.:

```bash
export HPS_MAX_CONCURRENT_INFERENCE_REQUESTS=8
```

| Variable | Default | Description |
|------|--------|------|
| `HPS_MAX_CONCURRENT_INFERENCE_REQUESTS` | 16 | Maximum concurrent requests for inference operations (layout parsing) |
| `HPS_MAX_CONCURRENT_NON_INFERENCE_REQUESTS` | 64 | Maximum concurrent requests for non-inference operations (multi-page restructuring) |
| `HPS_INFERENCE_TIMEOUT` | 600 | Request timeout in seconds |
| `HPS_HEALTH_CHECK_TIMEOUT` | 5 | Health check timeout in seconds |
| `HPS_VLM_URL` | http://paddleocr-vlm-server:8080 | VLM server URL (used for health checks) |
| `HPS_LOG_LEVEL` | INFO | Log level (DEBUG, INFO, WARNING, ERROR) |
| `HPS_FILTER_HEALTH_ACCESS_LOG` | true | Whether to filter health-check requests out of the access log |
| `UVICORN_WORKERS` | 4 | Number of gateway worker processes |
| `DEVICE_ID` | 0 | Inference device ID to use |

### Adjusting the pipeline configuration

To adjust pipeline-level settings (model paths, batch sizes, deployment device, etc.), see the pipeline-configuration section of the [PaddleOCR-VL tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/version3.x/pipeline_usage/PaddleOCR-VL.md).
## API Usage

### Document parsing

See the client-usage sections of the [PaddleOCR-VL tutorial](https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/version3.x/pipeline_usage/PaddleOCR-VL.md).

### Health checks

```bash
# Liveness check
curl http://localhost:8080/health

# Readiness check (verifies that the Triton and VLM services are ready to handle requests)
curl http://localhost:8080/health/ready
```
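From a client's perspective, readiness can be polled over plain HTTP before submitting work. A small sketch using only the Python standard library — the endpoint path is the one listed above; the helper function itself is illustrative:

```python
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the gateway's readiness endpoint answers 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health/ready",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, non-2xx response, etc.
        return False


if __name__ == "__main__":
    print("ready" if is_ready("http://localhost:8080") else "not ready")
```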
## Performance Tuning

### Concurrency settings

The gateway limits concurrency independently for inference and non-inference operations:

- **`HPS_MAX_CONCURRENT_INFERENCE_REQUESTS`** (default 16): caps concurrency for inference operations such as `layout-parsing`
    - Too low (4): the inference device is underutilized and requests queue unnecessarily
    - Too high (64): Triton may be overloaded, leading to OOM or timeouts
    - The default of 16 keeps enough requests queued to form the next batch while the current one is processing
    - If inference-device resources are limited, consider lowering this value
- **`HPS_MAX_CONCURRENT_NON_INFERENCE_REQUESTS`** (default 64): caps concurrency for non-inference operations such as `restructure-pages`
    - Non-inference operations do not occupy the inference device, so a higher limit is safe
    - Tune according to CPU core count and available memory
**Example high-throughput configuration:**

```bash
# .env
HPS_MAX_CONCURRENT_INFERENCE_REQUESTS=32
HPS_MAX_CONCURRENT_NON_INFERENCE_REQUESTS=128
UVICORN_WORKERS=8
```

**Example low-latency configuration:**

```bash
# .env
HPS_MAX_CONCURRENT_INFERENCE_REQUESTS=8
HPS_MAX_CONCURRENT_NON_INFERENCE_REQUESTS=32
HPS_INFERENCE_TIMEOUT=300
UVICORN_WORKERS=2
```
### Worker processes

Each Uvicorn worker is a separate process with its own event loop:

- **1 worker**: simple, but limited to a single process
- **4 workers**: a good fit for most scenarios
- **8+ workers**: suited to high concurrency with many small requests
### Triton dynamic batching

Triton batches requests automatically to improve inference-device utilization. The maximum batch size is controlled by the `max_batch_size` parameter (default: 8) in each model's `config.pbtxt` under the model repository directory (e.g. `model_repo/layout-parsing/config.pbtxt`).

### Triton instance count

The number of parallel inference instances per Triton model is set via `instance_group` in `config.pbtxt` (default: 1). More instances increase parallelism but consume more device resources.

```
# model_repo/layout-parsing/config.pbtxt
instance_group [
  {
    count: 1  # instance count; raise to increase parallelism
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

There is a trade-off between instance count and dynamic batching:

- **Single instance (`count: 1`)**: dynamic batching merges multiple requests into one batch executed in parallel, but requests in the same batch only return once the slowest one finishes, which can raise latency for some requests. A single instance also processes only one batch at a time, so later requests must queue until the current batch completes. Suitable when GPU memory is limited or per-request cost is fairly uniform
- **Multiple instances (`count: 2+`)**: instances process different batches concurrently, so more requests are handled at once, queueing is reduced, and per-request latency improves. Note that batches within one instance still follow dynamic-batching behavior (requests in a batch start and finish together). Each additional instance holds another copy of the layout detection model in GPU memory and adds load on the VLM inference service as well as CPU and memory usage, so size it to the inference device's resources

Non-inference models (such as `restructure-pages`) run on the CPU; their instance count can be raised according to the number of CPU cores.
## Troubleshooting

### Services fail to start

Check each service's logs to locate the problem:

```bash
docker compose logs paddleocr-vl-api
docker compose logs paddleocr-vl-tritonserver
docker compose logs paddleocr-vlm-server
```

Common causes include occupied ports, an unavailable inference device, or failed image pulls.

### Timeout errors

- Increase `HPS_INFERENCE_TIMEOUT` (for complex documents)
- Decrease `HPS_MAX_CONCURRENT_INFERENCE_REQUESTS` if the inference device is overloaded

### Out of memory (RAM/VRAM)

- Decrease `HPS_MAX_CONCURRENT_INFERENCE_REQUESTS`
- Make sure only one service runs per inference device
- Check `shm_size` in compose.yaml (default: 4GB)
