
Why doesn't return_word_box work? #400

@mkrzywda

Description

Problem Description

I have a Python script that runs RapidOCR with return_word_box: true set in the config, but the option has no effect: the output still contains one bounding box per line instead of one per word. This is critical for me because RapidOCR is a key component of my project. If word-level bounding boxes don't work, I'll have to look for another solution.

Runtime Environment

WSL + OpenVINO

Reproduction Code

from rapidocr import RapidOCR
import time

# Initialize RapidOCR with parameters
config_path = "default_rapidocr.yaml"
engine = RapidOCR(config_path=config_path)

print(engine.config)
img_path = "test.png"

# Measure start time
start_time = time.time()

# Run OCR
result = engine(img_path, use_det=True, use_cls=False, use_rec=True)

# Measure end time and calculate duration
end_time = time.time()
duration_ms = (end_time - start_time) * 1000  # Convert to milliseconds

print("OCR Result:", result)
print(f"Processing time: {duration_ms:.2f} ms")

# Visualize result
result.vis('vis_result-openvino-rapid-ocr.jpg')

default_config.yaml

Global:
    lang_det: "en_mobile" # ch_server
    lang_rec: "en_mobile"
    text_score: 0.5

    use_det: true
    use_cls: false
    use_rec: true

    min_height: 30
    width_height_ratio: 8
    max_side_len: 2000
    min_side_len: 30

    return_word_box: true

    with_onnx: false
    with_openvino: true
    with_paddle: false
    with_torch: false

    font_path: null

EngineConfig:
    onnxruntime:
        intra_op_num_threads: -1
        inter_op_num_threads: -1
        use_cuda: true
        use_dml: false

    openvino:
        inference_num_threads: -1

    paddlepaddle:
        cpu_math_library_num_threads: -1
        use_cuda: false
        gpu_id: 0
        gpu_mem: 500

    torch:
        use_cuda: false
        gpu_id: 0

Det:
    model_path: null
    model_dir: null

    limit_side_len: 736
    limit_type: min
    std: [ 0.5, 0.5, 0.5 ]
    mean: [ 0.5, 0.5, 0.5 ]

    thresh: 0.3
    box_thresh: 0.5
    max_candidates: 1000
    unclip_ratio: 1.6
    use_dilation: true
    score_mode: fast

Cls:
    model_path: null
    model_dir: null

    cls_image_shape: [3, 48, 192]
    cls_batch_num: 6
    cls_thresh: 0.9
    label_list: ['0', '180']

Rec:
    model_dir: null
    dict_url: null
    rec_keys_path: null
    rec_img_shape: [3, 48, 320]
    rec_batch_num: 6

Possible solutions

  • Handling bounding boxes per word instead of per whole line.
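As a stopgap while the option has no effect, per-word boxes can be roughly approximated from the line-level output. The sketch below is not part of RapidOCR: `split_line_box` is a hypothetical helper, and it assumes an axis-aligned line box given as (x_min, y_min, x_max, y_max), words separated by single spaces, and every character being equally wide. A four-point box would first have to be reduced to its min/max coordinates.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def split_line_box(text: str, box: Box) -> List[Tuple[str, Box]]:
    """Approximate per-word boxes by slicing a line box in proportion
    to character counts (hypothetical helper, not a RapidOCR API)."""
    x_min, y_min, x_max, y_max = box
    if not text.strip():
        return []

    # Uniform-width assumption: every character, including spaces,
    # takes the same share of the line width.
    char_width = (x_max - x_min) / len(text)

    results = []
    cursor = 0  # character offset of the current word within the line
    for word in text.split(" "):
        if word:
            start = x_min + cursor * char_width
            end = start + len(word) * char_width
            results.append((word, (start, y_min, end, y_max)))
        cursor += len(word) + 1  # skip past the word and its trailing space
    return results
```

With proportional fonts the word edges will drift, so this only suits cases where approximate word positions are acceptable.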
