Overview

A website uses captchas on a form to keep web bots away. However, the captchas it generates are quite similar each time:

the number of characters remains the same each time
the font and spacing are the same each time
the background and foreground colors and texture remain largely the same
there is no skew in the structure of the characters
the captcha generator creates strictly 5-character captchas, and each character is either an upper-case letter (A-Z) or a numeral (0-9)

Task

A set of twenty-five captchas is provided such that each of the characters A-Z and 0-9 occurs at least once in the text of one of the captchas. Design and create a simple AI model or algorithm to identify unseen captchas.

Solution

Approach 1: Pixel-based

(i) Each captcha image is similar except for the embedded characters, and each character is 10 pixels high by 8 pixels wide

(ii) The first character starts at pixel position (5,11), taking the top-left corner of the image as position (0,0). Adjacent characters are separated by a 1-pixel column

(iii) We first read in the captcha image as a 2D numpy array and apply thresholding to remove the background. Each pixel of the thresholded image is represented as either 0 or 255. Using our knowledge of how the characters are positioned and lined up, we can extract each character one by one

(iv) Next, we generate a random mask (2D numpy array of shape (10,8)) with each element of this mask being a value between 0 and 1

(v) Once all five characters have been extracted from an image (each as a 2D numpy array of shape (10,8)), we multiply each character array element-wise with the random mask and sum the result. This gives one numeric sum per character, which we use to represent that character

(vi) We perform steps (i) to (v) on every provided image. Using a dictionary, we add a new entry whenever a character's numeric sum is not already present, keyed by the sum with the character as its value. This builds the vocabulary of the 36 characters 'A' to 'Z' and '0' to '9', which we use as a lookup table

(vii) For any new image, the algorithm performs steps (i) to (iii) to extract the characters, then multiplies each one with the pre-defined random mask from step (iv) and computes the numeric sum as in step (v). Each character is then identified by matching its numeric sum to a key in the lookup table
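Steps (i) to (vii) can be sketched in a few lines of numpy. This is a minimal illustration rather than the code in ocr_model.py: the function names are hypothetical, (5,11) is assumed to mean column 5 and row 11, the characters are assumed to be darker than the background, and the cutoff of 50 is only an example value.

```python
import numpy as np

CHAR_H, CHAR_W = 10, 8   # character size from step (i)
LEFT, TOP = 5, 11        # assumption: (5,11) is (column, row) of the first character
GAP = 1                  # 1-pixel column between characters, step (ii)
N_CHARS = 5

def threshold(gray, cutoff=50):
    """Step (iii): pixels darker than `cutoff` become 255, everything else 0.
    Assumes dark characters on a lighter background."""
    return np.where(gray < cutoff, 255, 0).astype(np.uint8)

def extract_chars(binary):
    """Step (iii): slice the five 10x8 character cells out of the image."""
    cells = []
    for i in range(N_CHARS):
        x0 = LEFT + i * (CHAR_W + GAP)
        cells.append(binary[TOP:TOP + CHAR_H, x0:x0 + CHAR_W])
    return cells

def numeric_sum(cell, mask):
    """Step (v): element-wise product with the mask, summed to one number."""
    return float((cell * mask).sum())

def build_lookup(images, labels, mask, cutoff=50):
    """Step (vi): map each numeric sum to its known character."""
    table = {}
    for gray, text in zip(images, labels):
        cells = extract_chars(threshold(gray, cutoff))
        for cell, ch in zip(cells, text):
            table.setdefault(numeric_sum(cell, mask), ch)
    return table

def predict(gray, table, mask, cutoff=50):
    """Step (vii): extract, apply the same mask, look each sum up."""
    cells = extract_chars(threshold(gray, cutoff))
    return "".join(table.get(numeric_sum(c, mask), "?") for c in cells)

# Step (iv): the mask is generated once and reused for every image.
rng = np.random.default_rng(0)
mask = rng.random((CHAR_H, CHAR_W))
```

The exact-match lookup relies on a given character producing identical pixels after thresholding in every image. A character whose sum is not in the table is returned as "?" in this sketch.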

Approach 2: PaddleOCR

PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) is a versatile OCR framework for fast, accurate, multilingual text recognition and document parsing

PaddleOCR is an open-source OCR toolkit developed by PaddlePaddle that uses deep learning models for text recognition
Supports various input formats including JPEG, PNG, BMP, and PDF
To install PaddleOCR: https://www.paddlepaddle.org.cn/en/install/quick?docurl=undefined

⚡ Quick Start

Download the repository
Install the dependencies from requirements.txt:

pip install -r requirements.txt

Run the ocr_model.py script as follows:

Option A: 2 inputs - input and output filepaths

python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt'

Option B: 3 inputs - input and output filepaths, option to switch between Approach 1 and Approach 2

# Approach 1
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 1
# Approach 2
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 2

Option C: 4 inputs - input and output filepaths, the threshold used to remove the image background in Approach 1, and the option to switch between Approach 1 and Approach 2

# Approach 1
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 50 1
# Approach 2 (third argument has no effect)
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 50 2
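Options A to C differ only in the number of positional arguments. As a rough illustration, a hypothetical helper `parse_cli` (not taken from ocr_model.py) could map the three forms to one tuple. The script's defaults are not documented here, so the sketch returns None for any value that was not supplied instead of guessing one.

```python
def parse_cli(argv):
    """Return (input_path, output_path, threshold, approach).

    `argv` is the list of arguments after the script name.
    threshold and approach are None when not given on the command line.
    """
    if len(argv) == 2:                      # Option A
        inp, out = argv
        threshold, approach = None, None
    elif len(argv) == 3:                    # Option B
        inp, out, approach = argv
        threshold = None
    elif len(argv) == 4:                    # Option C
        inp, out, threshold, approach = argv
    else:
        raise ValueError("expected 2, 3 or 4 arguments")

    if threshold is not None:
        threshold = int(threshold)
    if approach is not None:
        approach = int(approach)
        if approach not in (1, 2):
            raise ValueError("approach must be 1 or 2")
    return inp, out, threshold, approach
```

For example, `parse_cli(["in.jpg", "out.txt", "50", "1"])` returns `("in.jpg", "out.txt", 50, 1)`.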
