add http way for calling the service (#4325) (#4330)
* add http serve way
* fix copy error
* update http code
* fix en code
* update content
* update content again
* update content again and again
* update http content
* update serving content
* Polish doc
* update en version
* update en version
* update en version
* update en version
---------
Co-authored-by: Bobholamovic <[email protected]>
`docs/pipeline_deploy/serving.en.md` (70 additions, 2 deletions)
@@ -350,7 +350,13 @@ I1216 11:37:21.643494 35 http_server.cc:167] Started Metrics Service at 0.0.0.0:
### 2.4 Invoke the Service

Users can call the pipeline service through the Python client provided by the SDK or by manually constructing HTTP requests (with no restriction on the client programming language).

The services deployed using the high-stability serving solution offer the same primary operations as the basic serving solution. For each primary operation, the endpoint names and the request and response data fields are consistent with the basic serving solution. Please refer to the "Development Integration/Deployment" section of the tutorial for each pipeline; the pipeline tutorials can be found [here](../pipeline_usage/pipeline_develop_guide.en.md).
#### 2.4.1 Use the Python Client

The Python client currently supports Python versions 3.8 to 3.12.

Navigate to the `client` directory of the high-stability serving SDK, and run the following command to install dependencies:
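(The exact install command ships with the SDK; the following is a minimal sketch, assuming the dependencies are listed in a `requirements.txt` under `client`.)

```bash
# Assumption: the client directory provides a requirements.txt;
# substitute the SDK's actual install command if it differs.
python -m pip install -r requirements.txt
```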
The `client.py` script in the `client` directory contains examples of how to call the service and provides a command-line interface.
#### 2.4.2 Manually Construct HTTP Requests

In scenarios where the Python client is not applicable, the service can be called by constructing HTTP requests directly, as demonstrated below.

First, manually construct the HTTP request body. The request body must be in JSON format and contain the following fields:

- `inputs`: Input tensor information. The input tensor name `name` is uniformly set to `input`, the shape is `[1, 1]`, and the data type `datatype` is `BYTES`. The tensor data `data` contains a single JSON string, and the content of this JSON string should follow the pipeline-specific format (consistent with the basic serving solution).
- `outputs`: Output tensor information. The output tensor name `name` is uniformly set to `output`.

Taking the general OCR pipeline as an example, the constructed request body is as follows:
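(A sketch; the inner payload fields `file` and `fileType` follow the general OCR pipeline's request schema in the basic serving solution, so verify them against the OCR pipeline tutorial.)

```json
{
  "inputs": [
    {
      "name": "input",
      "shape": [1, 1],
      "datatype": "BYTES",
      "data": [
        "{\"file\": \"<image URL or Base64-encoded image data>\", \"fileType\": 1}"
      ]
    }
  ],
  "outputs": [
    {
      "name": "output"
    }
  ]
}
```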
Send the constructed request body to the service's HTTP inference endpoint. By default, the service listens on HTTP port `8000`, and the inference request URL follows the format `http://{hostname}:8000/v2/models/{endpoint name}/infer`.

Using the general OCR pipeline as an example, the following `curl` command sends the request:

```bash
# Assuming `REQUEST_JSON` holds the request body constructed in the previous step
curl -s -X POST http://localhost:8000/v2/models/ocr/infer \
    -H 'Content-Type: application/json' \
    -d "${REQUEST_JSON}"
```
Finally, the response from the service needs to be parsed. The raw response body has the following structure:
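(A sketch of that structure, assuming the KServe v2 inference protocol response returned by the underlying Triton server; exact top-level fields may vary with the server version.)

```json
{
  "model_name": "ocr",
  "model_version": "1",
  "outputs": [
    {
      "name": "output",
      "shape": [1, 1],
      "datatype": "BYTES",
      "data": [
        "<JSON string containing the pipeline-specific result>"
      ]
    }
  ]
}
```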
In this response, `outputs[0].data[0]` is a JSON string whose internal fields follow the same format as the response body in the basic serving solution. For detailed parsing rules, please refer to the usage guide for each specific pipeline.
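For example, assuming the raw response was saved to `response.json` and `jq` is available (any JSON tooling works equally well), the embedded result can be unwrapped like this:

```bash
# Pull the JSON string embedded in outputs[0].data[0] (-r emits it without
# quoting), then pretty-print the decoded pipeline result with a second pass.
jq -r '.outputs[0].data[0]' response.json | jq .
```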