neuron-synapse/ocr_model

Overview

A website protects a form with captchas to keep web bots away. However, the captchas it generates are very similar each time:

  • the number of characters is always the same
  • the font and spacing are always the same
  • the background and foreground colors and texture remain largely the same
  • the characters are not skewed
  • the generator creates strictly 5-character captchas, and each character is either an upper-case letter (A-Z) or a digit (0-9)
[Images: four sample captchas]

Task

A set of twenty-five captchas is provided such that each of the characters A-Z and 0-9 occurs at least once in the captcha texts. Design and create a simple AI model or algorithm to identify unseen captchas.
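The coverage condition above can be checked before any model is built. The following is a minimal sketch, assuming the ground-truth texts of the provided captchas are available as a list of strings; the function name `missing_characters` is hypothetical and not part of the repository.

```python
import string

# The 36 symbols the generator can produce: A-Z and 0-9.
ALPHABET = set(string.ascii_uppercase + string.digits)

def missing_characters(labels):
    """Return the sorted symbols that never occur in the given captcha texts.

    `labels` is assumed to be an iterable of ground-truth strings,
    one per provided captcha (hypothetical input format).
    """
    seen = set("".join(labels))
    return sorted(ALPHABET - seen)
```

An empty result means every symbol is represented at least once, so a lookup-based method can in principle learn all 36 characters from the provided set.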

Solution

Approach 1: Pixel-based

(i) The captcha images share the same layout and differ only in the embedded characters. Each character occupies a box 10 pixels high by 8 pixels wide.

(ii) The first character starts at pixel position (5,11), taking the top-left corner of the image as position (0,0). Adjacent characters are separated by a 1-pixel column.

(iii) We first read the captcha image into a 2D numpy array and apply thresholding to remove the background, so that each pixel of the thresholded image is either 0 or 255. Because the position and size of every character are known, we can then extract the characters one by one.

(iv) Next, we generate a random mask: a 2D numpy array of shape (10,8) whose elements are values between 0 and 1.

(v) Once all five characters have been extracted from an image (each as a 2D numpy array of shape (10,8)), we multiply each character array element-wise by the random mask and sum the products. The resulting numeric sum represents that character.

(vi) We perform steps (i) to (v) on each of the provided images. Whenever a numeric sum is not yet in a dictionary, it is added as a new key with the corresponding character as its value. This builds the vocabulary of the 36 characters 'A' to 'Z' and '0' to '9', and the dictionary serves as a lookup table.

(vii) For any new image, the algorithm extracts the characters as in steps (i) to (iii) and multiplies each one by the same pre-defined random mask from step (iv) to compute its numeric sum. Each character is then identified by matching its numeric sum to a key in the lookup table.
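Steps (i) to (vii) can be sketched as follows. This is an illustration under stated assumptions, not the code in ocr_model.py: position (5,11) is read as (x=5, y=11), the text is assumed to be darker than the background, the mask is seeded so that it can be reproduced, sums are rounded before being used as keys, and all function names are hypothetical.

```python
import numpy as np

CHAR_H, CHAR_W = 10, 8   # character box size from step (i)
LEFT, TOP = 5, 11        # ASSUMPTION: (5,11) is interpreted as (x, y)
GAP = 1                  # 1-pixel column between characters, step (ii)
N_CHARS = 5

def binarize(gray, threshold=50):
    """Step (iii): map a 2D grayscale array to 0/255.
    ASSUMPTION: text pixels are darker than the threshold."""
    return np.where(gray < threshold, 255, 0).astype(np.uint8)

def split_characters(binary):
    """Step (iii): cut out the five fixed-position character boxes."""
    boxes = []
    for k in range(N_CHARS):
        x0 = LEFT + k * (CHAR_W + GAP)
        boxes.append(binary[TOP:TOP + CHAR_H, x0:x0 + CHAR_W])
    return boxes

def make_mask(seed=0):
    """Step (iv): random mask with values in [0, 1).
    ASSUMPTION: a fixed seed keeps the mask identical across runs."""
    return np.random.default_rng(seed).random((CHAR_H, CHAR_W))

def signature(char_box, mask):
    """Step (v): element-wise product with the mask, summed to one number.
    Rounding is an added safeguard for using floats as dictionary keys."""
    return round(float((char_box.astype(np.float64) * mask).sum()), 6)

def build_lookup(images, labels, mask, threshold=50):
    """Step (vi): map each unseen signature to its known character."""
    table = {}
    for gray, text in zip(images, labels):
        boxes = split_characters(binarize(gray, threshold))
        for box, ch in zip(boxes, text):
            table.setdefault(signature(box, mask), ch)
    return table

def read_captcha(gray, table, mask, threshold=50):
    """Step (vii): look up each character; '?' marks an unknown signature."""
    boxes = split_characters(binarize(gray, threshold))
    return "".join(table.get(signature(b, mask), "?") for b in boxes)
```

The lookup only works if the same mask is used for building the table and for reading new images, which is why the sketch passes `mask` explicitly to both functions. Exact matching also relies on the thresholded characters being pixel-identical across images, as the constraints in the Overview suggest.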

Approach 2: PaddleOCR

PaddleOCR is a versatile OCR framework for fast, accurate, multilingual text recognition and document parsing (https://github.com/PaddlePaddle/PaddleOCR). In this approach, the captcha image is passed to PaddleOCR instead of the pixel-based lookup.
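A general-purpose recognizer is not aware of the captcha format, so its raw output may contain spaces, lower-case letters, or the wrong number of characters. The sketch below shows one way such output could be checked against the known format (five characters from A-Z and 0-9). It is an assumption for illustration: the helper `normalize_ocr_text` is hypothetical, does not call PaddleOCR, and is not taken from ocr_model.py.

```python
import re

# Known captcha format from the Overview: exactly five of A-Z or 0-9.
CAPTCHA_RE = re.compile(r"[A-Z0-9]{5}")

def normalize_ocr_text(raw):
    """Strip non-alphanumeric characters, upper-case the rest, and
    return the result only if it matches the captcha format.

    Returns None when the cleaned text is not a valid captcha string.
    ASSUMPTION: upper-casing is acceptable because the generator
    never emits lower-case letters.
    """
    cleaned = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    return cleaned if CAPTCHA_RE.fullmatch(cleaned) else None
```

A `None` result signals that the recognized text should not be trusted as-is, for example because a character was dropped or split.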

⚡ Quick Start

  • download the repository
  • install the dependencies listed in requirements.txt:
pip install -r requirements.txt
  • run the ocr_model.py script in one of the following ways:

Option A: 2 inputs - input and output filepaths

python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt'

Option B: 3 inputs - input and output filepaths, option to switch between Approach 1 and Approach 2

# Approach 1
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 1
# Approach 2
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 2

Option C: 4 inputs - input and output filepaths, threshold used to remove the image background in Approach 1, option to switch between Approach 1 and Approach 2

# Approach 1
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 50 1
# Approach 2 (third argument has no effect)
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 50 2
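The three calling conventions above differ only in how many positional arguments follow the two file paths. A minimal sketch of how such arguments could be interpreted is shown below. It is not the parsing code from ocr_model.py: the function name `parse_args` is hypothetical, and the default threshold (50) and default approach (1) are assumptions, since the README does not state which values Option A uses.

```python
import sys

DEFAULT_THRESHOLD = 50  # ASSUMPTION: not stated in the README
DEFAULT_APPROACH = 1    # ASSUMPTION: not stated in the README

def parse_args(argv):
    """Interpret 2, 3, or 4 positional arguments (Options A, B, C).

    Returns (input_path, output_path, threshold, approach).
    Raises ValueError for an unsupported argument count or approach.
    """
    if not 2 <= len(argv) <= 4:
        raise ValueError("expected 2 to 4 arguments")
    input_path, output_path = argv[0], argv[1]
    threshold, approach = DEFAULT_THRESHOLD, DEFAULT_APPROACH
    if len(argv) == 3:
        # Option B: third argument selects the approach.
        approach = int(argv[2])
    elif len(argv) == 4:
        # Option C: threshold first, then the approach.
        threshold, approach = int(argv[2]), int(argv[3])
    if approach not in (1, 2):
        raise ValueError("approach must be 1 or 2")
    return input_path, output_path, threshold, approach

if __name__ == "__main__":
    try:
        print(parse_args(sys.argv[1:]))
    except ValueError as err:
        sys.exit(str(err))
```

Reading the argument count first keeps the meaning of the third argument unambiguous: it is the approach in Option B and the threshold in Option C.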
