Skip to content

在图片描述中使用api方法(PictureDescriptionApiOptions),拿不到返回结果 #2835

@Dithob

Description

@Dithob

Question

在图片描述中使用api方法,PictureDescriptionApiOptions,拿不到返回结果,模型api的token是消耗了的

调用脚本:

from pathlib import Path

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption, ImageFormatOption
from docling.datamodel.pipeline_options import PictureDescriptionVlmOptions, granite_picture_description, PictureDescriptionApiOptions
from docling_core.types.doc.base import ImageRefMode
from docling_core.types.doc.document import PictureDescriptionData
from IPython import display

DOC_SOURCE = 图片
pipeline_options = PdfPipelineOptions(artifacts_path="/a/domains/docling/docling_models")

pipeline_options.do_picture_description = True

""" 本地加载vlm模型 """
# temp_picture_description = PictureDescriptionVlmOptions(
#     repo_id="Qwen/Qwen3-VL-2B-Instruct",
#     prompt="详细描述一下这张图片",
#     generation_config = dict(max_new_tokens=500, do_sample=False)
# )
#
# pipeline_options.picture_description_options = (
#     temp_picture_description  # <-- the model choice
# )

""" 使用vlm模型api """
pipeline_options.enable_remote_services=True  # 运行远程服务
pipeline_options.picture_description_options = PictureDescriptionApiOptions(
    url="https://ark.cn-beijing.volces.com/api/v3/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer xxx"
    },
    params=dict(
        model="doubao-seed-1-6-flash-250828",
        seed=42,
        max_completion_tokens=500,
    ),
    prompt="详细描述一下这张图片.",
    timeout=60,
)

pipeline_options.images_scale = 2.0
pipeline_options.generate_picture_images = True

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options,
        ),
        InputFormat.IMAGE: ImageFormatOption(
            pipeline_options=pipeline_options
        )
    }
)
extract_result = converter.convert(DOC_SOURCE)
markdown_output = extract_result.document.export_to_markdown()
print("--- markdown result ---")
print(markdown_output)

annotation = extract_result.document.pictures[0].annotations
print("--- annotation result ---")
print(annotation)

output_dir = Path("save_files")
output_dir.mkdir(parents=True, exist_ok=True)
doc_filename = extract_result.input.file.stem
html_filename = output_dir / f"{doc_filename}-with-images.md"
extract_result.document.save_as_markdown(html_filename, image_mode=ImageRefMode.REFERENCED)

输出结果:

--- markdown result ---
员工发起用户来电申信取消,员工审核为盛假,录入异常系统但

不扣工程师推荐分并也不向工程师发送虚假取消消息

<!-- image -->
/a/domains/docling/docling_master/picture_description.py:62: DeprecationWarning: Field `annotations` is deprecated; use `meta` instead.
  annotation = extract_result.document.pictures[0].annotations
--- annotation result ---
[DescriptionAnnotation(kind='description', text='', provenance='not-implemented')]

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions