
Commit 397ba2d

Author: Zihao Mu
Commit message: add OpenCV sample for digit and text recognition, and provide multiple OCR models.
1 parent 8cc2cbf commit 397ba2d

File tree: 7 files changed, +256 −5 lines
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# How to run custom OCR model {#tutorial_dnn_OCR}

@prev_tutorial{tutorial_dnn_custom_layers}

## Introduction

In this tutorial, we first introduce how to obtain a custom OCR model, then how to transform it so that it can be run correctly by the opencv_dnn module, and finally we provide some pre-trained models.

## Train your own OCR model

[This repository](https://github.com/zihaomu/deep-text-recognition-benchmark) is a good starting point for training your own OCR model. In this repository, MJSynth+SynthText is set as the training set by default. In addition, you can configure the model structure and the dataset you want (a sample launch command is sketched below).
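
A minimal launch sketch, assuming the command-line interface of the upstream clovaai/deep-text-recognition-benchmark project (here configured for a CRNN-style CTC model); the exact flags may differ in the fork:

@code{.bash}
$ CUDA_VISIBLE_DEVICES=0 python3 train.py \
      --train_data data_lmdb_release/training \
      --valid_data data_lmdb_release/validation \
      --select_data MJ-ST --batch_ratio 0.5-0.5 \
      --Transformation None --FeatureExtraction VGG \
      --SequenceModeling BiLSTM --Prediction CTC
@endcode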

## Transform OCR model to ONNX format and use it in OpenCV DNN

After completing the model training, please use [transform_to_onnx.py](https://github.com/zihaomu/deep-text-recognition-benchmark/blob/master/transform_to_onnx.py) to convert the model into ONNX format.
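
Once converted, the model can be loaded with cv::dnn::readNet like any other ONNX network. The following is a minimal sketch, assuming a CRNN-style recognizer that takes a 32x100 grayscale blob and outputs T x 1 x 37 scores with the CTC blank at index 0 (the convention used by the text_detection samples); the file names are placeholders:

@code{.cpp}
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <algorithm>
#include <iostream>
#include <string>

// Greedy CTC decoding: take the best class per timestep, drop blanks and repeats.
static std::string ctcGreedyDecode(const cv::Mat &scores)
{
    const std::string alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string text;
    int prev = 0; // index 0 is the CTC blank
    for (int t = 0; t < scores.size[0]; ++t)
    {
        const float *row = scores.ptr<float>(t);
        int maxIdx = (int)(std::max_element(row, row + 37) - row);
        if (maxIdx != 0 && maxIdx != prev)
            text += alphabet[maxIdx - 1];
        prev = maxIdx;
    }
    return text;
}

int main()
{
    cv::dnn::Net recognizer = cv::dnn::readNet("crnn.onnx"); // placeholder path
    cv::Mat crop = cv::imread("word.png", cv::IMREAD_GRAYSCALE); // a cropped text region
    if (crop.empty())
        return 1;
    // Mean subtraction and scaling map pixel values to roughly [-1, 1].
    cv::Mat blob = cv::dnn::blobFromImage(crop, 1.0 / 127.5, cv::Size(100, 32),
                                          cv::Scalar(127.5));
    recognizer.setInput(blob);
    cv::Mat scores = recognizer.forward();
    std::cout << "recognized: " << ctcGreedyDecode(scores) << std::endl;
    return 0;
}
@endcode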

#### Run on webcam input

The Python version of the example code can be found [here](https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.py).

Example:
@code{.bash}
$ text_detection -m=[path_to_text_detect_model] -ocr=[path_to_text_recognition_model]
@endcode

## Pre-trained ONNX models

Some pre-trained models can be found at https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing.

Their performance on different text recognition datasets is shown in the table below:

| Model name          | IIIT5k (%) | SVT (%) | ICDAR03 (%) | ICDAR13 (%) | ICDAR15 (%) | SVTP (%) | CUTE80 (%) | Average acc (%) | Parameters (x10^6) |
| ------------------- | ---------- | ------- | ----------- | ----------- | ----------- | -------- | ---------- | --------------- | ------------------ |
| DenseNet-CTC        | 72.267     | 67.39   | 82.81       | 80          | 48.38       | 49.45    | 42.50      | 63.26           | 0.24               |
| DenseNet-BiLSTM-CTC | 73.76      | 72.33   | 86.15       | 83.15       | 50.67       | 57.984   | 49.826     | 67.69           | 3.63               |
| VGG-CTC             | 75.96      | 75.42   | 85.92       | 83.54       | 54.89       | 57.52    | 50.17      | 69.06           | 5.57               |
| CRNN_VGG-BiLSTM-CTC | 82.63      | 82.07   | 92.96       | 88.867      | 66.28       | 71.01    | 62.37      | 78.03           | 8.45               |
| ResNet-CTC          | 84.00      | 84.08   | 92.39       | 88.96       | 67.74       | 74.73    | 67.60      | 79.93           | 44.28              |

The performance of the text recognition models was tested on OpenCV DNN and does not include the text detection model.

#### Model selection suggestion

The input of the text recognition model is the output of the text detection model, so the performance of text detection greatly affects the performance of text recognition.

DenseNet_CTC has the fewest parameters and the best FPS, so it is suitable for edge devices, which are very sensitive to computation cost. If you have limited computing resources but want to achieve better accuracy, VGG_CTC is a good choice.

CRNN_VGG_BiLSTM_CTC is suitable for scenarios that require high recognition accuracy.
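
To compare these speed/accuracy trade-offs on your own hardware, you can time just the recognizer's forward pass; a minimal sketch, reusing the `recognizer` and `blob` from the loading example above:

@code{.cpp}
cv::TickMeter tm;
tm.start();
recognizer.setInput(blob);
cv::Mat scores = recognizer.forward();
tm.stop();
std::cout << "recognition time: " << tm.getTimeMilli() << " ms" << std::endl;
@endcode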

doc/tutorials/dnn/dnn_custom_layers/dnn_custom_layers.md

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 # Custom deep learning layers support {#tutorial_dnn_custom_layers}

 @prev_tutorial{tutorial_dnn_javascript}
+@next_tutorial{tutorial_dnn_OCR}

 ## Introduction
 Deep learning is a fast growing area. The new approaches to build neural networks

doc/tutorials/dnn/table_of_content_dnn.markdown

Lines changed: 10 additions & 0 deletions
@@ -70,3 +70,13 @@ Deep Neural Networks (dnn module) {#tutorial_table_of_content_dnn}
   *Author:* Dmitry Kurtaev

   How to define custom layers to import networks.
+
+- @subpage tutorial_dnn_OCR
+
+  *Languages:* C++
+
+  *Compatibility:* \> OpenCV 4.3
+
+  *Author:* Zihao Mu
+
+  In this tutorial you will learn how to use the opencv_dnn module with custom OCR models.

samples/cpp/digits_lenet.cpp

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
// This example performs digit recognition based on LeNet-5 and connected component analysis.
// It makes it possible for OpenCV beginners to run dnn models in real time using only a CPU.
// It reads pictures from the camera in real time and displays the recognized digits as overlays on top of the original digits.
//
// To achieve a better display effect, please write the numbers on white paper that fills the entire camera view.
//
// You can follow the guide below to train LeNet-5 on the MNIST dataset yourself:
// https://github.com/intel/caffe/blob/a3d5b022fe026e9092fc7abc7654b1162ab9940d/examples/mnist/readme.md
//
// You can also download an already trained model directly:
// https://github.com/zihaomu/opencv_digit_text_recognition_demo/tree/master/src

#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/dnn.hpp>

#include <iostream>
#include <vector>

using namespace cv;
using namespace cv::dnn;

const char *keys =
    "{ help h   |     | Print help message. }"
    "{ input i  |     | Path to input image or video file. Skip this argument to capture frames from a camera. }"
    "{ device   |  0  | Camera device number. }"
    "{ modelBin |     | Path to a binary .caffemodel file containing the trained network. }"
    "{ modelTxt |     | Path to a .prototxt file containing the model definition of the trained network. }"
    "{ width    | 640 | Set the width of the camera. }"
    "{ height   | 480 | Set the height of the camera. }"
    "{ thr      | 0.7 | Confidence threshold. }";

// Find the best class for the blob (i.e. the class with the maximal probability)
static void getMaxClass(const Mat &probBlob, int &classId, double &classProb);

void predictor(Net net, const Mat &roi, int &classId, double &probability);

int main(int argc, char **argv)
{
    // Parse command line arguments.
    CommandLineParser parser(argc, argv, keys);

    if (argc == 1 || parser.has("help"))
    {
        parser.printMessage();
        return 0;
    }

    int vWidth = parser.get<int>("width");
    int vHeight = parser.get<int>("height");
    float confThreshold = parser.get<float>("thr");
    std::string modelTxt = parser.get<String>("modelTxt");
    std::string modelBin = parser.get<String>("modelBin");

    Net net;
    try
    {
        net = readNet(modelTxt, modelBin);
    }
    catch (cv::Exception &ee)
    {
        std::cerr << "Exception: " << ee.what() << std::endl;
        std::cout << "Can't load the network using the following files:" << std::endl;
        std::cout << "modelTxt: " << modelTxt << std::endl;
        std::cout << "modelBin: " << modelBin << std::endl;
        return 1;
    }

    const std::string resultWinName = "Please write the number on white paper and fill the camera view.";
    const std::string preWinName = "Preprocessing";

    namedWindow(preWinName, WINDOW_AUTOSIZE);
    namedWindow(resultWinName, WINDOW_AUTOSIZE);

    Mat labels, stats, centroids;
    Point position;

    Rect getRectangle;
    bool ifDrawingBox = false;

    int classId = 0;
    double probability = 0;

    Rect basicRect = Rect(0, 0, vWidth, vHeight);
    Mat rawImage;

    double fps = 0;

    // Open a video file, an image file, or a camera stream.
    VideoCapture cap;
    if (parser.has("input"))
        cap.open(parser.get<String>("input"));
    else
        cap.open(parser.get<int>("device"));

    TickMeter tm;

    while (waitKey(1) < 0)
    {
        cap >> rawImage;
        if (rawImage.empty())
        {
            waitKey();
            break;
        }

        tm.reset();
        tm.start();

        Mat image = rawImage.clone();
        // Image preprocessing: binarize and invert so digits are white on black,
        // matching the polarity of the MNIST training data.
        cvtColor(image, image, COLOR_BGR2GRAY);
        GaussianBlur(image, image, Size(3, 3), 2, 2);
        adaptiveThreshold(image, image, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, 25, 10);
        bitwise_not(image, image);

        Mat element = getStructuringElement(MORPH_RECT, Size(3, 3), Point(-1, -1));
        dilate(image, image, element, Point(-1, -1), 1);

        // Find connected components; label 0 is the background.
        int nccomps = cv::connectedComponentsWithStats(image, labels, stats, centroids);

        for (int i = 1; i < nccomps; i++)
        {
            ifDrawingBox = false;

            // Extend the bounding box of the connected component for easier recognition.
            if (stats.at<int>(i, CC_STAT_AREA) > 80 && stats.at<int>(i, CC_STAT_AREA) < 3000)
            {
                ifDrawingBox = true;
                int left = stats.at<int>(i, CC_STAT_HEIGHT) / 4;
                getRectangle = Rect(stats.at<int>(i, CC_STAT_LEFT) - left, stats.at<int>(i, CC_STAT_TOP) - left, stats.at<int>(i, CC_STAT_WIDTH) + 2 * left, stats.at<int>(i, CC_STAT_HEIGHT) + 2 * left);
                // Clip the expanded box to the frame.
                getRectangle &= basicRect;
            }

            if (ifDrawingBox && !getRectangle.empty())
            {
                Mat roi = image(getRectangle);
                predictor(net, roi, classId, probability);

                if (probability < confThreshold)
                    continue;

                rectangle(rawImage, getRectangle, Scalar(128, 255, 128), 2);

                position = Point(getRectangle.br().x - 7, getRectangle.br().y + 25);
                putText(rawImage, std::to_string(classId), position, FONT_HERSHEY_COMPLEX, 1.0, Scalar(128, 128, 255), 2);
            }
        }

        tm.stop();
        fps = 1 / tm.getTimeSec();
        std::string fpsString = format("Inference FPS: %.2f.", fps);
        putText(rawImage, fpsString, Point(5, 20), FONT_HERSHEY_SIMPLEX, 0.6, Scalar(128, 255, 128));

        imshow(resultWinName, rawImage);
        imshow(preWinName, image);
    }

    return 0;
}

static void getMaxClass(const Mat &probBlob, int &classId, double &classProb)
{
    // Reshape the blob to a 1 x N row vector and locate the maximal probability.
    Mat probMat = probBlob.reshape(1, 1);
    Point classNumber;
    minMaxLoc(probMat, NULL, &classProb, NULL, &classNumber);
    classId = classNumber.x;
}

void predictor(Net net, const Mat &roi, int &classId, double &probability)
{
    Mat pred;
    // Convert the ROI to a 1 x 1 x 28 x 28 blob, the input size LeNet-5 expects.
    Mat inputBlob = dnn::blobFromImage(roi, 1.0, Size(28, 28));
    // Set the network input.
    net.setInput(inputBlob);
    // Compute the output.
    pred = net.forward();
    getMaxClass(pred, classId, probability);
}
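
A typical invocation of this sample might look as follows (the executable name depends on how the sample is built; the flags come from the parser above):

@code{.bash}
$ ./digits_lenet --modelTxt=lenet.prototxt --modelBin=lenet.caffemodel --thr=0.7
@endcode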
File renamed without changes.

samples/dnn/text_detection.cpp

Lines changed: 6 additions & 2 deletions
@@ -2,12 +2,16 @@
 Text detection model: https://github.com/argman/EAST
 Download link: https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1

-Text recognition model taken from here: https://github.com/meijieru/crnn.pytorch
+CRNN text recognition model taken from here: https://github.com/meijieru/crnn.pytorch
 How to convert from pb to onnx:
 Using classes from here: https://github.com/meijieru/crnn.pytorch/blob/master/models/crnn.py

+More converted ONNX text recognition models can be downloaded directly here:
+Download link: https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing
+These models were taken from here: https://github.com/clovaai/deep-text-recognition-benchmark
+
 import torch
-import models.crnn as crnn
+from models.crnn import CRNN

 model = CRNN(32, 1, 37, 256)
 model.load_state_dict(torch.load('crnn.pth'))

samples/dnn/text_detection.py

Lines changed: 11 additions & 3 deletions
@@ -1,11 +1,18 @@
 '''
 Text detection model: https://github.com/argman/EAST
 Download link: https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1
-Text recognition model taken from here: https://github.com/meijieru/crnn.pytorch
+
+CRNN text recognition model taken from here: https://github.com/meijieru/crnn.pytorch
 How to convert from pb to onnx:
 Using classes from here: https://github.com/meijieru/crnn.pytorch/blob/master/models/crnn.py
+
+More converted ONNX text recognition models can be downloaded directly here:
+Download link: https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing
+These models were taken from here: https://github.com/clovaai/deep-text-recognition-benchmark
+
 import torch
-import models.crnn as CRNN
+from models.crnn import CRNN
+
 model = CRNN(32, 1, 37, 256)
 model.load_state_dict(torch.load('crnn.pth'))
 dummy_input = torch.randn(1, 1, 32, 100)
@@ -23,7 +30,8 @@
 parser = argparse.ArgumentParser(
     description="Use this script to run TensorFlow implementation (https://github.com/argman/EAST) of "
                 "EAST: An Efficient and Accurate Scene Text Detector (https://arxiv.org/abs/1704.03155v2)"
-                "The OCR model can be obtained from converting the pretrained CRNN model to .onnx format from the github repository https://github.com/meijieru/crnn.pytorch")
+                "The OCR model can be obtained by converting the pretrained CRNN model from the GitHub repository https://github.com/meijieru/crnn.pytorch to .onnx format, "
+                "or you can download a trained OCR model directly from https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing")
 parser.add_argument('--input',
                     help='Path to input image or video file. Skip this argument to capture frames from a camera.')
 parser.add_argument('--model', '-m', required=True,
