[Feature]: Add support for GLM-OCR model #12726

@Huaweidev

Description

Issue Checklist

  • I understand that issues are for reporting problems and requesting features, not for off-topic comments, and I will provide as much detail as possible to help resolve the issue.
  • I have checked the pinned issues and searched through the existing open issues, closed issues, and discussions and did not find a similar suggestion.
  • I have provided a short and descriptive title so that developers can quickly understand the issue when browsing the issue list, rather than vague titles like "A suggestion" or "Stuck."
  • The latest version of Cherry Studio does not include the feature I am suggesting.

Platform

Windows

Version

v1.7.15

Is your feature request related to an existing issue?

N/A

Desired Solution

Currently, the GLM-OCR model, when added manually under the Zhipu Open Platform provider, returns an error in both text and image conversations, even though the "Model Type: Visual" option is enabled in the model settings. Please add support for this model. The response received is:

Response headers: {
  "date": "Thu, 05 Feb 2026 01:42:27 GMT",
  "document-policy": "include-js-call-stacks-in-crash-reports",
  "strict-transport-security": "max-age=31536000; includeSubDomains",
  "vary": "Origin, Access-Control-Request-Method, Access-Control-Request-Headers, Origin, Access-Control-Request-Method, Access-Control-Request-Headers",
  "x-log-id": "20260205094226184d5f25e9264aa9"
}
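Response body: {
  "error": {
    "code": "1214",
    "message": "OCR仅支持PDF、JPG、PNG格式;文件大小限制:图片≤10MB、PDF≤50MB;PDF最大100页"

(Error message in English: "OCR supports only PDF, JPG, and PNG formats; file size limits: images ≤ 10 MB, PDFs ≤ 50 MB; PDFs may have at most 100 pages.")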
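
The limits named in error 1214 (PDF, JPG, or PNG only; images up to 10 MB; PDFs up to 50 MB and 100 pages) could be checked on the client before anything is sent to the model. The following is a minimal sketch of such a check, not Cherry Studio code: the function name `check_ocr_input` is hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the error message does not define the unit.

```python
from pathlib import Path
from typing import Optional

MB = 1024 * 1024                 # assumption: the API's "MB" unit is not specified
IMAGE_EXTS = {".jpg", ".png"}    # formats named in the error message
MAX_IMAGE_BYTES = 10 * MB
MAX_PDF_BYTES = 50 * MB
MAX_PDF_PAGES = 100


def check_ocr_input(filename: str, size_bytes: int,
                    pdf_pages: Optional[int] = None) -> Optional[str]:
    """Return None if the file fits the limits from error 1214, else a reason."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        if size_bytes > MAX_IMAGE_BYTES:
            return "image larger than 10 MB"
        return None
    if ext == ".pdf":
        if size_bytes > MAX_PDF_BYTES:
            return "PDF larger than 50 MB"
        # The page count is only checked when the caller knows it.
        if pdf_pages is not None and pdf_pages > MAX_PDF_PAGES:
            return "PDF has more than 100 pages"
        return None
    return f"unsupported format {ext!r}; expected PDF, JPG, or PNG"
```

A call such as `check_ocr_input("notes.txt", 10)` returns a reason string, while a small PNG passes and yields `None`.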

Alternative Solutions

N/A

Additional Information

Calling the layout_parsing endpoint directly with the following command returns normal output.

curl --location --request POST 'https://open.bigmodel.cn/api/paas/v4/layout_parsing' \
--header 'Authorization: key' \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "glm-ocr",
  "file": "https://cdn.bigmodel.cn/static/logo/introduction.png"
}'
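
For anyone reproducing this outside a shell, the same request can be built with Python's standard library. This is a sketch that mirrors the curl command above and nothing more: the function name `build_layout_parsing_request` is hypothetical, and the `Authorization` value is passed through exactly as supplied, as in the curl example (whether a prefix is required is not stated in this issue).

```python
import json
import urllib.request

# Endpoint taken from the curl command in this issue.
LAYOUT_PARSING_URL = "https://open.bigmodel.cn/api/paas/v4/layout_parsing"


def build_layout_parsing_request(api_key: str, file_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl command."""
    payload = {"model": "glm-ocr", "file": file_url}
    return urllib.request.Request(
        LAYOUT_PARSING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # same form as the curl example
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request (requires network access and a valid key):
#   req = build_layout_parsing_request(key, "https://cdn.bigmodel.cn/static/logo/introduction.png")
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```

The point of the comparison is that this endpoint takes a `file` field in a JSON body, unlike the conversation requests that produced error 1214.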

Output content:

created              : 1770255532
data_info            : @{num_pages=1; pages=System.Object[]}
id                   : 20260205093850e5c25b10db3d44d6
layout_details       : {                                   }
layout_visualization : {}
md_results           : 模型体验
                       文本模型
...
...
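
The fields visible in the output above (`id`, `data_info` with `num_pages`, and `md_results`) suggest how a client might pull the recognized Markdown out of a response. The sketch below assumes the response is a JSON object with those keys and that `md_results` is a string; both are inferred from the PowerShell-formatted output rather than from API documentation, and `summarize_layout_result` is a hypothetical name.

```python
from typing import Any, Dict


def summarize_layout_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the fields shown in the issue's sample output.

    Missing keys fall back to None (or an empty string for the Markdown)
    instead of raising, since the full response schema is not known here.
    """
    data_info = result.get("data_info") or {}
    return {
        "id": result.get("id"),
        "num_pages": data_info.get("num_pages"),
        "markdown": result.get("md_results", ""),
    }
```

A chat client would most likely display only the `markdown` value and ignore `layout_details` and `layout_visualization`.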

Metadata

Assignees

No one assigned

    Labels

      • Blocked: v2: The PR is blocked until v2 released
      • Models: Categorizes an issue or PR as relevant to SIG LLM
      • OCR: Categorizes an issue or PR as relevant to OCR processing
      • RAG: Categorizes an issue or PR as relevant to SIG RAG
      • feature: Categorizes issue or PR as related to a feature enhancement
