Commit 4ada8bd

ov2.0 c++ text detection demo (#3019)
* OV2.0 API for text_detection_demo/cpp and few fixes in docs
* rm blank line
* Update demos/text_detection_demo/cpp/include/text_recognition.hpp

  Co-authored-by: Zlobin Vladimir <[email protected]>

* make message clear

Co-authored-by: Zlobin Vladimir <[email protected]>
1 parent 7032b09 commit 4ada8bd

File tree

11 files changed: +654 -569 lines

demos/common/cpp/utils/include/utils/input_wrappers.hpp

Lines changed: 1 addition & 1 deletion

```diff
@@ -31,7 +31,7 @@ class IInputSource {
     std::mutex sourceLock;
 };
 
-class InputChannel: public std::enable_shared_from_this<InputChannel> { // note: public inheritance
+class InputChannel: public std::enable_shared_from_this<InputChannel> { // note: public inheritance
 public:
     InputChannel(const InputChannel&) = delete;
     InputChannel& operator=(const InputChannel&) = delete;
```
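The `// note: public inheritance` comment on the changed line is load-bearing: `std::enable_shared_from_this` must be an accessible public base for `shared_from_this()` to work, because the owning `shared_ptr` can only wire up the hidden `weak_ptr` through an accessible base. A minimal illustration of that rule, unrelated to the demo sources:

```cpp
// Why the "public inheritance" note matters: shared_from_this() only works
// when enable_shared_from_this is an accessible (public) base, so that the
// owning shared_ptr can initialize the hidden weak_ptr. Illustration only;
// not part of the demo code.
#include <memory>

struct Channel : public std::enable_shared_from_this<Channel> {
    std::shared_ptr<Channel> self() { return shared_from_this(); }
};

int main() {
    auto ch = std::make_shared<Channel>();
    std::shared_ptr<Channel> again = ch->self();  // shares ownership with ch
    return again != nullptr ? 0 : 1;
}
```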

demos/text_detection_demo/cpp/README.md

Lines changed: 2 additions & 4 deletions

```diff
@@ -11,8 +11,8 @@ The demo shows an example of using neural networks to detect and recognize print
 * `text-recognition-0014`, which is a recognition network for recognizing text. You should add option `-tr_pt_first` and specify output layer name via `-tr_o_blb_nm` option for this model (see model [description](../../../models/intel/text-recognition-0014/README.md) for details).
 * `text-recognition-0015`, which is a recognition network for recognizing text. You should add options `-tr_pt_first`, `-m_tr_ss "?0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"` (supported symbols set), `-tr_o_blb_nm "logits"` (to specify output name) and `-dt simple` (to specify decoder type). You can also specify `-lower` option to convert predicted text to lower-case. See model [description](../../../models/intel/text-recognition-0015/README.md) for details.
 * `text-recognition-0016`, which is a recognition network for recognizing text. You should add options `-tr_pt_first`, `-m_tr_ss "?0123456789abcdefghijklmnopqrstuvwxyz"` (supported symbols set), `-tr_o_blb_nm "logits"` (to specify output name) and `-dt simple` (to specify decoder type). You can also specify `-lower` option to convert predicted text to lower-case. See model [description](../../../models/intel/text-recognition-0016/README.md) for details.
-* `text-recognition-resnet-fc`, which is a recognition network for recognizing text. You should add option `-tr_pt_first`.
-* `handwritten-score-recognition-0001`, which is a recognition network for recognizing handwritten score marks like `<digit>` or `<digit>.<digit>`.
+* `text-recognition-resnet-fc`, which is a recognition network for recognizing text. You should add option `-tr_pt_first` and `-dt simple` (to specify decoder type).
+* `handwritten-score-recognition-0003`, which is a recognition network for recognizing handwritten score marks like `<digit>` or `<digit>.<digit>`. You should add options `-m_tr_ss "0123456789._"` (supported symbols set) and `-dt ctc` (to specify decoder type).
 * `vitstr-small-patch16-224`, which is a recognition network for recognizing text. You should add options `-tr_pt_first`, `-m_tr_ss <path to vocab file>/.vocab.txt` (supported symbols set), `-dt simple` (to specify decoder type), `-start_index 1` (to process output from provided index) and `-pad " "` (to use specific pad symbol).
 
 ## How It Works
@@ -96,8 +96,6 @@ Options:
     -max_rect_num "<value>"   Optional. Maximum number of rectangles to recognize. If it is negative, number of rectangles to recognize is not limited.
     -d_td "<device>"          Optional. Specify the target device for the Text Detection model to infer on (the list of available devices is shown below). The demo will look for a suitable plugin for a specified device. By default, it is CPU.
     -d_tr "<device>"          Optional. Specify the target device for the Text Recognition model to infer on (the list of available devices is shown below). The demo will look for a suitable plugin for a specified device. By default, it is CPU.
-    -l "<absolute_path>"      Optional. Absolute path to a shared library with the CPU kernels implementation for custom layers.
-    -c "<absolute_path>"      Optional. Absolute path to the GPU kernels implementation for custom layers.
     -no_show                  Optional. If it is true, then detected text will not be shown on image frame. By default, it is false.
     -r                        Optional. Output Inference results as raw values.
     -u                        Optional. List of monitors to show initially.
```
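Put together, the per-model options documented above form one command line. A hypothetical invocation for `text-recognition-0015`, assuming the demo's usual `-i` input and `-m_td`/`-m_tr` model flags; the input and model paths are placeholders:

```sh
./text_detection_demo \
    -i input_video.mp4 \
    -m_td <path_to_text_detection_model>.xml \
    -m_tr <path_to>/text-recognition-0015.xml \
    -tr_pt_first \
    -m_tr_ss "?0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" \
    -tr_o_blb_nm "logits" \
    -dt simple \
    -lower
```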
demos/text_detection_demo/cpp/include/cnn.hpp

Lines changed: 30 additions & 52 deletions

```diff
@@ -1,65 +1,43 @@
-// Copyright (C) 2019 Intel Corporation
+// Copyright (C) 2019-2022 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 
 #pragma once
 
+#include <map>
 #include <string>
 #include <vector>
 
-#include <inference_engine.hpp>
 #include <opencv2/opencv.hpp>
-#include <utils/ocv_common.hpp>
 
-class Cnn {
-public:
-    Cnn(const std::string &model_path, const std::string& model_type, InferenceEngine::Core & ie, const std::string & deviceName,
-        const cv::Size &new_input_resolution = cv::Size());
-
-    virtual InferenceEngine::BlobMap Infer(const cv::Mat &frame);
-
-    size_t ncalls() const {return ncalls_;}
-    double time_elapsed() const {return time_elapsed_;}
-    const cv::Size& input_size() const {return input_size_;}
-    const std::string model_type;
-protected:
-    cv::Size input_size_;
-    int channels_;
-    std::string input_name_;
-    InferenceEngine::InferRequest infer_request_;
-    std::vector<std::string> output_names_;
+#include "openvino/openvino.hpp"
 
-    double time_elapsed_;
-    size_t ncalls_;
-};
 
-class EncoderDecoderCNN : public Cnn {
-public:
-    EncoderDecoderCNN(std::string model_path, std::string model_type,
-                      InferenceEngine::Core &ie, const std::string &deviceName,
-                      const std::string &out_enc_hidden_name,
-                      const std::string &out_dec_hidden_name,
-                      const std::string &in_dec_hidden_name,
-                      const std::string &features_name,
-                      const std::string &in_dec_symbol_name,
-                      const std::string &out_dec_symbol_name,
-                      const std::string &logits_name,
-                      size_t end_token
-                      );
-    InferenceEngine::BlobMap Infer(const cv::Mat &frame) override;
-private:
-    InferenceEngine::InferRequest infer_request_decoder_;
-    std::string features_name_;
-    std::string out_enc_hidden_name_;
-    std::string out_dec_hidden_name_;
-    std::string in_dec_hidden_name_;
-    std::string in_dec_symbol_name_;
-    std::string out_dec_symbol_name_;
-    std::string logits_name_;
-    size_t end_token_;
-    void check_net_names(const InferenceEngine::OutputsDataMap &output_info_decoder,
-                         const InferenceEngine::InputsDataMap &input_info_decoder
-                         ) const;
+class Cnn {
+public:
+    Cnn(const std::string& modelPath, const std::string& modelType, const std::string& deviceName,
+        ov::Core& core, const cv::Size& new_input_resolution = cv::Size());
+
+    virtual std::map<std::string, ov::Tensor> Infer(const cv::Mat& frame) = 0;
+
+    size_t ncalls() const { return m_ncalls; }
+    double time_elapsed() const { return m_time_elapsed; }
+    const cv::Size& input_size() const { return m_input_size; }
+
+protected:
+    int m_channels;
+    cv::Size m_input_size;
+    cv::Size m_new_input_resolution;
+    const std::string m_modelPath;
+    const std::string m_modelType;
+    const std::string m_deviceName;
+    std::string m_input_name;
+    std::vector<std::string> m_output_names;
+    ov::Layout m_modelLayout;
+    ov::Core& m_core;
+    ov::InferRequest m_infer_request;
+    std::shared_ptr<ov::Model> m_model;
+
+    double m_time_elapsed;
+    size_t m_ncalls;
 };
-
-class DecoderNotFound {};
```
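This header carries the core of the port: `InferenceEngine::Core`, `BlobMap` and `InferRequest` give way to `ov::Core`, `std::map<std::string, ov::Tensor>` and `ov::InferRequest`, and `Infer` becomes pure virtual. The constructor body lives in a .cpp file this commit page does not show; below is a minimal sketch of the OpenVINO 2.0 load-and-compile sequence such a constructor typically runs. All names in the sketch are illustrative, not taken from the demo sources:

```cpp
// Minimal OpenVINO 2.0 read/compile sketch; function and variable names are
// hypothetical, not the demo's.
#include <memory>
#include <string>

#include "openvino/openvino.hpp"

ov::InferRequest makeInferRequest(ov::Core& core,
                                  const std::string& modelPath,
                                  const std::string& deviceName,
                                  std::shared_ptr<ov::Model>& modelOut) {
    // Read the model from disk (IR .xml/.bin, ONNX, ...).
    modelOut = core.read_model(modelPath);
    // Compile it for the target device, then create a request to run inference with.
    ov::CompiledModel compiled = core.compile_model(modelOut, deviceName);
    return compiled.create_infer_request();
}
```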
demos/text_detection_demo/cpp/include/text_detection.hpp

Lines changed: 24 additions & 5 deletions

```diff
@@ -1,14 +1,33 @@
-// Copyright (C) 2019 Intel Corporation
+// Copyright (C) 2019-2022 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 
 #pragma once
 
+#include <map>
 #include <vector>
+#include <string>
 
-#include <inference_engine.hpp>
 #include <opencv2/opencv.hpp>
 
-std::vector<cv::RotatedRect> postProcess(const InferenceEngine::BlobMap &blobs, const cv::Size& image_size,
-                                         const cv::Size& image_shape, float cls_conf_threshold,
-                                         float link_conf_threshold);
+#include "openvino/openvino.hpp"
+
+#include "cnn.hpp"
+
+class TextDetector : public Cnn {
+public:
+    TextDetector(const std::string& model_path, const std::string& model_type, const std::string& deviceName,
+        ov::Core& core, const cv::Size& new_input_resolution = cv::Size()) :
+        Cnn(model_path, model_type, deviceName, core) {};
+
+    std::map<std::string, ov::Tensor> Infer(const cv::Mat& frame) override;
+
+    std::vector<cv::RotatedRect> postProcess(
+        const std::map<std::string, ov::runtime::Tensor>& output_tensors, const cv::Size& image_size,
+        const cv::Size& image_shape, float cls_conf_threshold, float link_conf_threshold);
+private:
+    cv::Mat decodeImageByJoin(
+        const std::vector<float>& cls_data, const ov::Shape& cls_data_shape,
+        const std::vector<float>& link_data, const ov::Shape& link_data_shape,
+        float cls_conf_threshold, float link_conf_threshold);
+};
```
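The free `postProcess` function has moved into `TextDetector` and now takes the named output tensors that `Infer` returns. A hypothetical helper (not from the demo sources) showing how one such `ov::Tensor` can be unpacked into the `std::vector<float>` plus `ov::Shape` form that `decodeImageByJoin` expects:

```cpp
// Hypothetical tensor-unpacking helper; the demo's actual code may differ.
#include <map>
#include <string>
#include <vector>

#include "openvino/openvino.hpp"

std::vector<float> tensorToVector(const std::map<std::string, ov::Tensor>& outputs,
                                  const std::string& name, ov::Shape& shape) {
    const ov::Tensor& tensor = outputs.at(name);
    shape = tensor.get_shape();                 // e.g. {N, C, H, W} for a conv output
    const float* data = tensor.data<float>();   // typed view of the tensor buffer
    return std::vector<float>(data, data + tensor.get_size());
}
```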
demos/text_detection_demo/cpp/include/text_recognition.hpp

Lines changed: 42 additions & 4 deletions

```diff
@@ -1,4 +1,4 @@
-// Copyright (C) 2019-2021 Intel Corporation
+// Copyright (C) 2019-2022 Intel Corporation
 // SPDX-License-Identifier: Apache-2.0
 //
 
@@ -7,6 +7,44 @@
 #include <string>
 #include <vector>
 
-std::string CTCGreedyDecoder(const std::vector<float> &data, const std::string& alphabet, char pad_symbol, double *conf);
-std::string CTCBeamSearchDecoder(const std::vector<float> &data, const std::string& alphabet, char pad_symbol, double *conf, int bandwidth);
-std::string SimpleDecoder(const std::vector<float> &data, const std::string& alphabet, char pad_symbol, double *conf, int start_index);
+#include "openvino/openvino.hpp"
+
+#include "cnn.hpp"
+
+std::string CTCGreedyDecoder(const std::vector<float>& data, const std::string& alphabet, char pad_symbol, double* conf);
+std::string CTCBeamSearchDecoder(const std::vector<float>& data, const std::string& alphabet, char pad_symbol, double* conf, int bandwidth);
+std::string SimpleDecoder(const std::vector<float>& data, const std::string& alphabet, char pad_symbol, double* conf, int start_index);
+
+class TextRecognizer : public Cnn {
+public:
+    TextRecognizer(
+        const std::string& model_path, const std::string& model_type, const std::string& deviceName,
+        ov::Core& core,
+        const std::string& out_enc_hidden_name,
+        const std::string& out_dec_hidden_name,
+        const std::string& in_dec_hidden_name,
+        const std::string& features_name,
+        const std::string& in_dec_symbol_name,
+        const std::string& out_dec_symbol_name,
+        const std::string& logits_name,
+        size_t end_token);
+
+    std::map<std::string, ov::runtime::Tensor> Infer(const cv::Mat& frame) override;
+
+    const cv::Size& input_size() const { return m_input_size; }
+
+private:
+    void check_model_names(
+        const ov::OutputVector& input_info_decoder, const ov::OutputVector& output_info_decoder) const;
+
+    bool m_isCompositeModel;
+    ov::InferRequest m_infer_request_decoder;
+    std::string m_features_name;
+    std::string m_out_enc_hidden_name;
+    std::string m_out_dec_hidden_name;
+    std::string m_in_dec_hidden_name;
+    std::string m_in_dec_symbol_name;
+    std::string m_out_dec_symbol_name;
+    std::string m_logits_name;
+    size_t m_end_token;
+};
```
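The decoder free functions keep their signatures across the port. As one plausible reading of `SimpleDecoder` (a per-step argmax over the alphabet; the demo's real implementation, including its exact interpretation of `start_index`, may differ):

```cpp
// Hypothetical argmax decoder matching the SimpleDecoder signature above.
// Assumes `data` holds per-step class probabilities laid out step-major, and
// treats a non-negative start_index as a number of leading steps to skip.
#include <algorithm>
#include <string>
#include <vector>

std::string SimpleDecoderSketch(const std::vector<float>& data, const std::string& alphabet,
                                char pad_symbol, double* conf, int start_index) {
    std::string result;
    *conf = 1.0;
    const size_t num_classes = alphabet.length();
    if (num_classes == 0)
        return result;
    for (size_t t = static_cast<size_t>(start_index) * num_classes;
         t + num_classes <= data.size(); t += num_classes) {
        // Most probable class at this time step.
        auto begin = data.begin() + t;
        auto max_it = std::max_element(begin, begin + num_classes);
        *conf *= *max_it;                 // accumulate sequence confidence
        const char symbol = alphabet[max_it - begin];
        if (symbol == pad_symbol)         // pad terminates the sequence
            break;
        result += symbol;
    }
    return result;
}
```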
