What are the input and output tensors for detection and recognition from a .onnx file? #14659

thebest132 · 2025-02-11T03:48:03Z

What do I pass to the model, what does it return, and how do I interpret the result?

GreatV · 2025-02-11T03:48:21Z

Maintainer

The input and output tensors for detection and recognition from a .onnx file depend on whether you are using the detection model or the recognition model.

1. Input Tensor:

For both detection and recognition models, the input tensor is an image represented as a 4D tensor with the format:

(batch_size, num_channels, height, width)

batch_size: The number of images processed at once. This is typically 1 for single-image inference.
num_channels: The number of channels in the image, usually 3 for RGB images.
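height, width: The spatial dimensions of the preprocessed image. Exported models often declare these as dynamic axes, so check the model's declared input shape rather than assuming fixed values.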

For example, an image input tensor might have a shape like:

(1, 3, H, W)  # Where H and W are dynamically defined image dimensions.
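
Because the exact shapes differ between exported models, it can help to read them from the model itself. The sketch below is only an illustration: `describe_io` is a hypothetical helper name, and it relies on the `get_inputs()` / `get_outputs()` methods of an ONNX Runtime session, whose entries carry `name`, `shape` and `type`. Dynamic axes usually show up in `shape` as strings or `None` instead of integers.

```python
def describe_io(session):
    """Return the declared inputs and outputs of an ONNX Runtime session.

    Works with any object exposing get_inputs()/get_outputs() whose items
    have .name, .shape and .type, as onnxruntime.InferenceSession does.
    """
    def as_rows(nodes):
        # One (name, shape, element type) tuple per declared tensor.
        return [(n.name, list(n.shape), n.type) for n in nodes]

    return {
        "inputs": as_rows(session.get_inputs()),
        "outputs": as_rows(session.get_outputs()),
    }


# Typical use (requires onnxruntime and a real model file):
#   import onnxruntime as ort
#   session = ort.InferenceSession("model.onnx")
#   print(describe_io(session))
```

Running this once for the detection model and once for the recognition model shows which axes are fixed and how many outputs each model returns.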

2. Output Tensors:

Detection Model Output:

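The detection step produces bounding boxes around detected text regions. Note that for DB-style detection models the raw ONNX output is typically a text probability map of shape (batch_size, 1, height, width) rather than box coordinates; the boxes come from post-processing that map (thresholding it, then outlining each text region). After post-processing, the result typically consists of: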

dt_boxes: A list of detected bounding boxes in the format:
```
[[x1, y1, x2, y2, x3, y3, x4, y4], ...]
```
Each bounding box contains the coordinates of the four corners of the detected text region.
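
As a rough illustration of how boxes in this format can be derived when the model returns a probability map instead of coordinates, here is a simplified sketch. Everything in it is an assumption to verify against your model: the map shape `(1, 1, H, W)`, the threshold `0.3`, and the hypothetical helper `boxes_from_prob_map`. It also returns plain axis-aligned rectangles from connected regions, which is a stand-in for the contour and box-expansion steps a full OCR pipeline would normally use.

```python
from collections import deque

import numpy as np


def boxes_from_prob_map(prob_map, thresh=0.3, min_pixels=4):
    """Turn a (1, 1, H, W) text probability map into axis-aligned boxes.

    Returns a list of [x1, y1, x2, y2, x3, y3, x4, y4], with corners listed
    clockwise starting at the top-left.
    """
    mask = np.asarray(prob_map)[0, 0] > thresh
    height, width = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Breadth-first flood fill over 4-connected text pixels.
        queue = deque([(int(y0), int(x0))])
        seen[y0, x0] = True
        ys, xs = [], []
        while queue:
            y, x = queue.popleft()
            ys.append(y)
            xs.append(x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(ys) < min_pixels:
            continue  # Ignore tiny regions, which are likely noise.
        left, right, top, bottom = min(xs), max(xs), min(ys), max(ys)
        boxes.append([left, top, right, top, right, bottom, left, bottom])
    return boxes
```

The coordinates are in the resized input's pixel space, so they still need to be scaled back to the original image size if the image was resized during preprocessing.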

Recognition Model Output:

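The recognition model takes each cropped text image from the detection step and produces the recognized text along with confidence scores. For CTC-based recognition models the raw ONNX output is typically a tensor of shape (batch_size, sequence_length, num_classes) holding per-step character probabilities; decoding it against the character dictionary yields the text and its score.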

rec_res: A list of recognized text strings with their confidence scores:

[("text_1", confidence_1), ("text_2", confidence_2), ...]

For example:

[("Hello", 0.99), ("World", 0.98)]
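
Since recognition runs on crops taken from the detected boxes, a small sketch of that hand-off may be useful. It assumes boxes in the flat 8-value format shown earlier and an image array in (H, W, C) layout; `crop_box` is a hypothetical helper. It cuts out the axis-aligned rectangle that encloses the four corners, which is a simplification: pipelines that handle rotated text usually warp the quadrilateral instead.

```python
import numpy as np


def crop_box(image_hwc, box):
    """Crop the axis-aligned rectangle enclosing one 8-value box.

    image_hwc: array of shape (H, W, C); box: [x1, y1, ..., x4, y4].
    """
    points = np.asarray(box, dtype=np.float32).reshape(4, 2)
    height, width = image_hwc.shape[:2]
    # Clamp the enclosing rectangle to the image bounds.
    left = int(max(np.floor(points[:, 0].min()), 0))
    top = int(max(np.floor(points[:, 1].min()), 0))
    right = int(min(np.ceil(points[:, 0].max()), width - 1))
    bottom = int(min(np.ceil(points[:, 1].max()), height - 1))
    return image_hwc[top:bottom + 1, left:right + 1]
```

Each crop is then preprocessed for the recognition model and run through it separately, or stacked into a batch if the crops share a size.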

3. How to Forward the Input and Interpret the Output

Load the ONNX model using ONNX Runtime:

import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name

Preprocess the input image:
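- Convert the image to a NumPy array.
- Resize to the expected input size.
- Normalize pixel values (if required).
- Convert to float32, transpose from (H, W, 3) to (3, H, W), and add a batch dimension to get (1, 3, H, W). A plain reshape would scramble the pixel data.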
```
# Placeholder input filled with random data; H and W are example values only.
H, W = 640, 640
image_input = np.random.rand(1, 3, H, W).astype(np.float32)
```
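
For a real image rather than random data, the preprocessing steps might look roughly like the sketch below. Treat it as an illustration under assumptions: the image is already loaded as an (H, W, 3) uint8 array (loading it is left out), the sides are rounded to multiples of 32 as detection models commonly expect, and the mean and standard deviation are the usual ImageNet statistics, which may not match your model. `preprocess`, `MEAN` and `STD` are hypothetical names.

```python
import numpy as np

# Assumed normalization constants (ImageNet statistics); confirm them
# against the preprocessing config your model was trained with.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_hwc, multiple=32):
    """Convert an (H, W, 3) uint8 image to a (1, 3, H', W') float32 tensor.

    H' and W' are H and W rounded to the nearest multiple of `multiple`
    (at least one multiple), using nearest-neighbour resampling.
    """
    height, width = image_hwc.shape[:2]
    new_h = max(multiple, int(round(height / multiple)) * multiple)
    new_w = max(multiple, int(round(width / multiple)) * multiple)
    # Nearest-neighbour resize with plain index arithmetic.
    rows = np.arange(new_h) * height // new_h
    cols = np.arange(new_w) * width // new_w
    resized = image_hwc[rows][:, cols]
    # Scale to [0, 1], then normalize each channel.
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - MEAN) / STD
    # Reorder (H, W, C) -> (C, H, W) and add the batch dimension.
    chw = normalized.transpose(2, 0, 1)
    return np.ascontiguousarray(chw[np.newaxis], dtype=np.float32)
```

Recognition models often expect a fixed input height and a different normalization, so a separate variant of this function is usually needed for the cropped text images.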

Run inference:

outputs = session.run(None, {input_name: image_input})

Interpret the output:
- For detection, post-process the model output to extract bounding box coordinates.
- For recognition, map output probabilities to text characters using a character dictionary.
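
The recognition step above can be sketched as a greedy CTC decode. This is a minimal version under assumptions you should check against your model: the output for one crop is a (T, num_classes) matrix of probabilities, class index 0 is the CTC blank, and index i + 1 corresponds to the i-th entry of the character dictionary. `ctc_greedy_decode` is a hypothetical helper, and the confidence is taken as the mean probability of the kept characters.

```python
import numpy as np

BLANK = 0  # Assumed index of the CTC blank class.


def ctc_greedy_decode(probs, charset):
    """Decode a (T, num_classes) probability matrix into (text, confidence).

    charset[i] is the character for class index i + 1.
    """
    probs = np.asarray(probs)
    best = probs.argmax(axis=1)
    best_prob = probs.max(axis=1)
    chars, confs = [], []
    previous = BLANK
    for index, prob in zip(best, best_prob):
        # Keep a class only when it is not blank and not a repeat of the
        # previous time step (the CTC collapse rule).
        if index != BLANK and index != previous:
            chars.append(charset[index - 1])
            confs.append(float(prob))
        previous = index
    confidence = float(np.mean(confs)) if confs else 0.0
    return "".join(chars), confidence
```

For a batched output of shape (N, T, num_classes), call the function once per item, for example `[ctc_greedy_decode(p, charset) for p in outputs[0]]`, to get a list in the same form as rec_res.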

By following this workflow, you can use the .onnx models for text detection and recognition in OCR tasks.

Response generated by feifei-bot | chatgpt-4o-latest
