ValueError (InvalidArgument) Broadcast dimension mismatch. #14385

cywc23 · 2024-12-14T14:39:41Z

cywc23
Dec 14, 2024

🔎 Search before asking

I have searched the PaddleOCR Docs and found no similar bug report.
I have searched the PaddleOCR Issues and found no similar bug report.
I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug

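I am very sorry: through my own fault, the previous issue was closed after going too long without a reply. I have also replied in that issue. Could you please help me work out how to resolve this?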

ValueError (InvalidArgument) Broadcast dimension mismatch. #13466

🏃‍♂️ Environment

OS: Windows 11
Paddle: paddlepaddle-gpu 2.6.1
PaddleOCR: paddleocr 2.8.1
Python: 3.9.7
CUDA: 11.8

GPU model:
NVIDIA GeForce RTX 3050 Laptop GPU

Driver version: 31.0.15.4630
Driver date: 2023/11/30
DirectX version: 12 (FL 12.1)
Physical location: PCI bus 1, device 0, function 0

Utilization: 31%
Dedicated GPU memory: 1.2/4.0 GB
Shared GPU memory: 0.2/7.9 GB
GPU memory: 1.4/11.9 GB

Model download URLs:
https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar
https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar
https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar

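Because the user name on my C: drive contains Chinese characters, I modified the code for one of the default model paths to support Chinese (I no longer remember which file I changed). This should be unrelated to the bug.
I can reinstall the environment if needed.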

🌰 Minimal Reproducible Example

from paddleocr import PaddleOCR
import cv2

img = cv2.imread(r'C:\Users\Ccu酱\Desktop\c23\临时\1.jpg')

reader = PaddleOCR(use_angle_cls=True, lang='ch')

result = reader.ocr(img)
print(result)

######################################

[2024/12/14 22:24:47] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=True, use_xpu=False, use_npu=False, use_mlu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='C:\Users\Ccu酱/.paddleocr/whl\det\ch\ch_PP-OCRv4_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='C:\Users\Ccu酱/.paddleocr/whl\rec\ch\ch_PP-OCRv4_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='C:\Users\Ccu酱\AppData\Local\Programs\Python\Python39\lib\site-packages\paddleocr\ppocr\utils\ppocr_keys_v1.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='C:\Users\Ccu酱/.paddleocr/whl\cls\ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, 
return_word_box=False, output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='ch', det=True, rec=True, type='ocr', savefile=False, ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
Traceback (most recent call last):
  File "C:\Users\Ccu酱\AppData\Local\Programs\Python\Python39\lib\idlelib\run.py", line 559, in runcode
    exec(code, self.locals)
  File "C:/Users/Ccu酱/Desktop/c23/py_home/ocrtest/1.py", line 12, in <module>
    result = reader.ocr(img)
  File "C:\Users\Ccu酱\AppData\Local\Programs\Python\Python39\lib\site-packages\paddleocr\paddleocr.py", line 698, in ocr
    assert isinstance(img, (np.ndarray, list, str, bytes))
AssertionError

######################################

Test image:

GreatV · 2024-12-14T15:40:38Z

GreatV
Dec 14, 2024
Maintainer

Cannot reproduce.


SWHL · 2024-12-16T00:23:07Z

SWHL
Dec 16, 2024
Maintainer

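I suggest trying the latest CPU build of paddle.
I suggest trying a different computer.

We can only fix a problem that we can reproduce. My guess is that this is most likely an environment issue, which has to be narrowed down one experiment at a time.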


GreatV · 2024-12-16T00:28:23Z

GreatV
Dec 16, 2024
Maintainer

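Based on the error message and environment provided, the problem comes down to two points:

ValueError (InvalidArgument) Broadcast dimension mismatch: a broadcast dimension mismatch raised while running PaddleOCR, which may involve an incorrectly formatted input or a mismatched model configuration.
AssertionError: assert isinstance(img, (np.ndarray, list, str, bytes)) means that the img argument passed in is not one of the types PaddleOCR expects.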
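The assertion can be reproduced without PaddleOCR: None is not an instance of any accepted type, so a failed image read surfaces as a bare AssertionError with no message. Below is a minimal sketch of a pre-check that fails with a clearer error. The helper name check_ocr_input is hypothetical (it is not part of PaddleOCR), and it uses duck typing in place of the np.ndarray check so that it runs without NumPy.

```python
def check_ocr_input(img):
    """Return img unchanged if it looks like a valid OCR input, else raise.

    Hypothetical helper, not part of PaddleOCR. It mirrors the type check
    shown in the traceback: np.ndarray, list, str or bytes.
    """
    if img is None:
        # cv2.imread signals failure by returning None rather than raising.
        raise ValueError("img is None: the image could not be read from disk")
    if isinstance(img, (list, str, bytes)):
        return img
    # Duck-typed stand-in for isinstance(img, np.ndarray).
    if hasattr(img, "shape") and hasattr(img, "dtype"):
        return img
    raise TypeError(f"unsupported input type: {type(img).__name__}")


# None fails the same kind of isinstance check that PaddleOCR asserts on.
print(isinstance(None, (list, str, bytes)))  # prints False
```

Calling check_ocr_input(img) just before reader.ocr(img) would turn the anonymous AssertionError into a message that points at the failed read.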

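Analysis

1. Input image format

The code reads the image with cv2.imread, which returns a numpy.ndarray on success. The error, however, shows that the ocr function did not receive valid input, which may be due to the following:

cv2.imread(r'C:\Users\Ccu酱\Desktop\c23\临时\1.jpg') returned None, possibly because the image path is wrong or the file is corrupted.
Because the path contains Chinese characters, OpenCV may have failed to read the file.

2. PaddleOCR model path

You said that you modified the default model path code to support Chinese paths. Such a change can introduce problems: if the path is not set exactly right, model loading may fail.

3. Runtime environment

You are on Windows with a user name that contains Chinese characters, which may cause path-resolution problems. In addition, some of PaddleOCR's dependencies may not be fully compatible with Chinese paths.

Solutions

1. Make sure the image path is correct and avoid Chinese characters in it

To avoid potential problems caused by Chinese characters in the path, move the image file to a path that contains only English characters. For example: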

img = cv2.imread(r'C:\temp\1.jpg')

cv2.imread does not raise on failure; it returns None. Check for that as follows:

if img is None:
    raise ValueError("Failed to read image. Please check the file path.")
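If moving the file is not an option, a widely used workaround for non-ASCII paths on Windows is to read the raw bytes with NumPy and decode them in memory with cv2.imdecode, so that OpenCV never has to open the path itself. The sketch below assumes OpenCV and NumPy are installed; load_image and path_is_ascii are hypothetical helper names.

```python
from pathlib import Path


def path_is_ascii(path):
    """Return True if every character in the path is ASCII."""
    return str(path).isascii()


def load_image(path):
    """Load an image as a BGR array, tolerating non-ASCII paths.

    Hypothetical helper. ASCII paths go through cv2.imread as usual;
    other paths are read as bytes and decoded from memory.
    """
    import cv2          # imported lazily so path_is_ascii works without OpenCV
    import numpy as np

    if not Path(path).is_file():
        raise FileNotFoundError(str(path))
    if path_is_ascii(path):
        img = cv2.imread(str(path))
    else:
        data = np.fromfile(str(path), dtype=np.uint8)
        img = cv2.imdecode(data, cv2.IMREAD_COLOR)
    if img is None:
        raise ValueError(f"could not decode image: {path}")
    return img
```

With this in place, the original script could call load_image(r'C:\Users\Ccu酱\Desktop\c23\临时\1.jpg') and pass the result to reader.ocr.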

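2. Verify the PaddleOCR model loading path

You mentioned modifying PaddleOCR's default-path code. I recommend reinstalling PaddleOCR so that the paths are known to be correct, and avoiding changes to the library code. If the path really does need to support Chinese characters, you can try the following:

Set the I/O encoding environment variable. Note that Python reads PYTHONIOENCODING only at interpreter startup and applies it to stdin, stdout and stderr, so it should be set in the shell before launching Python; assigning it inside the running script, as below, only affects child processes:

import os
os.environ['PYTHONIOENCODING'] = 'utf-8'

Use a Unicode string for the path (in Python 3 every str is already Unicode, so the u prefix is optional):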

model_path = u"C:\\Users\\Ccu酱\\.paddleocr\\whl\\rec\\ch\\ch_PP-OCRv4_rec_infer"
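Rather than patching the library, the model directories can also be passed in explicitly: the DEBUG Namespace in the report lists det_model_dir, rec_model_dir and cls_model_dir among the parameters. One option is to extract the three archives under an ASCII-only folder and point PaddleOCR at it. This is a sketch under assumptions: the root C:\paddle_models and the helper model_dir_kwargs are hypothetical, and the folder names are assumed to match the tar file names from the download URLs.

```python
from pathlib import PureWindowsPath


def model_dir_kwargs(root):
    """Build the three *_model_dir keyword arguments under one root folder.

    Hypothetical helper; folder names are assumed to match the archives
    listed in the Environment section.
    """
    root = PureWindowsPath(root)
    return {
        "det_model_dir": str(root / "ch_PP-OCRv4_det_infer"),
        "rec_model_dir": str(root / "ch_PP-OCRv4_rec_infer"),
        "cls_model_dir": str(root / "ch_ppocr_mobile_v2.0_cls_infer"),
    }


kwargs = model_dir_kwargs(r"C:\paddle_models")  # hypothetical ASCII-only root
for name, path in kwargs.items():
    print(f"{name} = {path}")

# Intended use (requires paddleocr to be installed):
# reader = PaddleOCR(use_angle_cls=True, lang='ch', **kwargs)
```

This keeps the installed package unmodified, which also makes it easier for others to reproduce the setup.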

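3. Fix the broadcast dimension mismatch

The broadcast dimension mismatch may come from an inconsistency between the PaddleOCR model configuration and the input data. Possible fixes:

Make sure the PaddleOCR models match the version of the PaddleOCR library. For example, with paddleocr==2.8.1, make sure the downloaded model files are compatible with PP-OCRv4.
Check whether the input image size fits what the model expects, in particular the rec_image_shape and cls_image_shape settings (defaults 3, 48, 320 and 3, 48, 192). These shapes describe the cropped text-line images that the pipeline resizes internally, so resizing the whole input as below squashes a full page down to the size of a single text line and is only appropriate when the image already is one cropped line:
```
img = cv2.resize(img, (320, 48))
```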
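To make the role of rec_image_shape concrete, here is a simplified illustration of an aspect-ratio-preserving resize for one cropped text line: the height is fixed at 48 and the width is scaled to match, then capped at 320. This sketches the general idea under that assumption and is not PaddleOCR's actual preprocessing code; target_size is a hypothetical helper.

```python
import math


def target_size(crop_h, crop_w, shape=(3, 48, 320)):
    """Return (width, height) for resizing a crop to the given C, H, W shape.

    Hypothetical helper: keeps the aspect ratio, fixes the height to H and
    caps the width at W. The (width, height) order matches cv2.resize.
    """
    _, img_h, img_w = shape
    scaled_w = math.ceil(img_h * crop_w / crop_h)
    return min(img_w, scaled_w), img_h


print(target_size(96, 400))     # a text-line crop -> (200, 48)
print(target_size(1080, 1920))  # a full page -> (86, 48)
```

The second call shows why the shape applies to single-line crops: a full 1920x1080 page would shrink to 86x48 pixels, far too small for any text to survive.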

4. Use the latest version of PaddleOCR

If none of the above works, try upgrading PaddleOCR and PaddlePaddle to the latest versions to rule out version-compatibility problems:

pip install paddleocr --upgrade
pip install paddlepaddle-gpu --upgrade
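Since the maintainers could not reproduce the problem, it helps to record exactly which versions are installed before and after upgrading. A small sketch using the standard library's importlib.metadata (Python 3.8+) follows; installed_versions is a hypothetical helper, and the package list is an assumption based on the Environment section (opencv-python and numpy are guesses at the relevant dependencies).

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of relevant packages; adjust to the actual environment.
packages = ["paddlepaddle-gpu", "paddlepaddle", "paddleocr", "opencv-python", "numpy"]
for name, version in installed_versions(packages).items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into the thread gives others a precise environment to compare against.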

Example code (revised)

The revised example is as follows:

from paddleocr import PaddleOCR
import cv2

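# Make sure the path contains only English characters
img_path = r'C:\temp\1.jpg'
img = cv2.imread(img_path)

# Check that the image loaded successfully
if img is None:
    raise ValueError("Failed to read image. Please check the file path.")

# Optional: resize only when the image is already a single cropped text line.
# Squashing a full page to 320x48 would leave nothing for detection to find.
# img = cv2.resize(img, (320, 48))

# Initialize PaddleOCR
reader = PaddleOCR(use_angle_cls=True, lang='ch')

# Run OCR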
result = reader.ocr(img)
print(result)

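Summary

Make sure neither the image path nor the model path contains Chinese characters.
Make sure cv2.imread loads the image successfully and does not return None.
Check that the input image size meets the model's requirements.
Use model files that match the PaddleOCR version.

If the problem persists, please provide a more detailed error log, or check for other path or environment configuration problems.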

Response generated by feifei-bot | chatgpt-4o-latest

