Skip to content

视觉推理直接传入文件路径出错 #37

@reeered

Description

@reeered

Windows环境下,执行下面的代码

from dashscope import MultiModalConversation

local_path1 = "test_video_frames/frame_0000.jpg"
local_path2 = "test_video_frames/frame_0001.jpg"
local_path3 = "test_video_frames/frame_0002.jpg"
local_path4 = "test_video_frames/frame_0003.jpg"

image_path1 = f"file://{local_path1}"
image_path2 = f"file://{local_path2}"
image_path3 = f"file://{local_path3}"
image_path4 = f"file://{local_path4}"

messages = [{"role": "system",
                "content": [{"text": "You are a helpful assistant."}]},
                {'role':'user',
                # 若模型属于Qwen2.5-VL系列且传入图像列表时,可设置fps参数,表示图像列表是由原视频每隔 1/fps 秒抽取的,其他模型设置则不生效
                'content': [{'video': [image_path1,image_path2,image_path3,image_path4],"fps":2},
                            {'text': '这段视频描绘的是什么景象?'}]}]

response = MultiModalConversation.call(
    api_key=api_key,
    model='qwen-vl-max-latest', 
    messages=messages)

print(response)

输出结果:

{"status_code": 400, "request_id": "948fecd8-3d82-9e8e-a68b-afb9da049b52", "code": "InvalidParameter.DataInspection", "message": "The media format is not supported or incorrect for the data inspection.", "output": null, "usage": null}

此外,如果模型选择qwen-vl-max,输出的response则为:

{"status_code": 400, "request_id": "100e23e9-6fe9-9d70-92b5-12de2fd50793", "code": "InvalidParameter", "message": "<400> InternalError.Algo.InvalidParameter: The provided URL does not appear to be valid. Ensure it is correctly formatted.", "output": null, "usage": null}

改为使用base64编码则无以上问题。但文档中提到”Base64编码会增加数据体积,以文件路径方式传输时,稳定性更高,建议优先使用该方式“,所以希望能够修复此问题。

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions