A website uses captchas on a form to keep web bots away. However, the captchas it generates are quite similar each time:
- the number of characters remains the same each time
- the font and spacing are the same each time
- the background and foreground colors and texture remain largely the same
- there is no skew in the structure of the characters
- the captcha generator creates strictly 5-character captchas, and each character is either an upper-case letter (A-Z) or a numeral (0-9)




A set of twenty-five captchas is provided, such that each of the characters A-Z and 0-9 occurs at least once in the text of one of the captchas. Design and create a simple AI model or algorithm to identify unseen captchas.
(i) Each captcha image is similar except for the embedded characters. Each character is 10 pixels high by 8 pixels wide
(ii) The first character starts at pixel position (5,11), taking the top-left corner of the image as position (0,0). Each character is separated from the next by a 1-pixel column
(iii) We first read in the captcha image as a 2D numpy array and apply thresholding to remove the background. Each pixel of the thresholded image is represented as either 0 or 255. Using our knowledge of how the characters are positioned and lined up, we can extract each character one by one
(iv) Next, we generate a random mask (2D numpy array of shape (10,8)) with each element of this mask being a value between 0 and 1
(v) After all five characters are extracted from an image (each character is a 2D numpy array of shape (10,8)), we multiply each character array element-wise by the random mask and sum the result. This sum serves as a numeric signature for the character
(vi) We repeat steps (i) to (v) on every provided image. Each numeric sum is stored in a dictionary as a key, with the corresponding character as its value; a new entry is added whenever the sum is not already in the dictionary. This builds the vocabulary of the 36 characters 'A' to 'Z' and '0' to '9'. We use this dictionary as a lookup table
(vii) For any new image, the algorithm performs steps (i) to (iii) to extract the characters, then multiplies each character by the same pre-defined random mark from step (iv), that is, the same random mask, and sums the result. The character is identified by matching this numeric sum against the keys of the lookup table
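The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the contents of ocr_model.py: the function names are hypothetical, (5,11) is read as (x=5, y=11), the text is assumed to be darker than the background, and the default threshold of 50 is borrowed from the example command-line value used later in this README.

```python
import numpy as np

# Layout from steps (i) and (ii). Reading (5, 11) as (x=5, y=11) is an
# assumption; swap X0 and Y0 if the source means (row, column).
CHAR_H, CHAR_W = 10, 8
X0, Y0 = 5, 11
GAP = 1        # 1-pixel column between neighbouring characters
N_CHARS = 5


def binarize(gray, threshold=50):
    """Step (iii): map a 2D grayscale array to 0/255.

    Assumes dark text on a lighter background, so dark pixels become 255.
    """
    return np.where(gray < threshold, 255, 0).astype(np.uint8)


def extract_chars(binary):
    """Step (iii): slice the five fixed-position character cells."""
    cells = []
    for i in range(N_CHARS):
        x = X0 + i * (CHAR_W + GAP)
        cells.append(binary[Y0:Y0 + CHAR_H, x:x + CHAR_W])
    return cells


def signature(cell, mask):
    """Step (v): element-wise product with the mask, then sum."""
    return float((cell.astype(np.float64) * mask).sum())


def build_lookup(labelled_images, mask, threshold=50):
    """Step (vi): labelled_images yields (gray_array, five_char_text) pairs."""
    lookup = {}
    for gray, text in labelled_images:
        cells = extract_chars(binarize(gray, threshold))
        for cell, ch in zip(cells, text):
            # Add the sum as a new key only if it has not been seen before.
            lookup.setdefault(signature(cell, mask), ch)
    return lookup


def predict(gray, lookup, mask, threshold=50):
    """Step (vii): reuse the same mask; unknown sums are reported as '?'."""
    cells = extract_chars(binarize(gray, threshold))
    return "".join(lookup.get(signature(c, mask), "?") for c in cells)


# Step (iv): one random mask with values in [0, 1), generated once.
# The fixed seed (an assumption) keeps the mask identical across runs.
rng = np.random.default_rng(0)
mask = rng.random((CHAR_H, CHAR_W))
```

Exact matching of the sums works in this sketch only because identical pixels multiplied by the same mask give identical sums. If a single pixel of a character flips after thresholding, the sum changes and the lookup returns '?'.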
PaddleOCR is an OCR framework for multilingual text recognition and document parsing (https://github.com/PaddlePaddle/PaddleOCR)
- PaddleOCR is an open-source OCR toolkit developed by PaddlePaddle that performs text recognition with deep learning models
- Supports various input formats including JPEG, PNG, BMP, and PDF
- To install PaddleOCR: https://www.paddlepaddle.org.cn/en/install/quick?docurl=undefined
- Download the repository
- Install the dependencies listed in requirements.txt:
pip install -r requirements.txt
- Run the ocr_model.py script as follows:
Option A: 2 inputs - input and output filepaths
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt'
Option B: 3 inputs - input and output filepaths, plus a flag (1 or 2) to switch between Approach 1 and Approach 2
# Approach 1
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 1
# Approach 2
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 2
Option C: 4 inputs - input and output filepaths, the threshold used to remove the image background in Approach 1, and the flag (1 or 2) to switch between Approach 1 and Approach 2
# Approach 1
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 50 1
# Approach 2 (the third argument, the threshold, has no effect)
python ocr_model.py 'filepath_of_input_captcha_image.jpg' 'filepath_of_output_file_containing_extracted_text_string.txt' 50 2
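The three call shapes can be told apart by argument count alone. ocr_model.py itself is not shown here, so the following is only a sketch of one way to do that: `CliArgs` and `parse_cli` are hypothetical names, treating the threshold as an integer is an assumption, and values that are not supplied are returned as `None` rather than guessing the script's defaults.

```python
import sys
from typing import NamedTuple, Optional


class CliArgs(NamedTuple):
    input_path: str
    output_path: str
    threshold: Optional[int]  # None: not supplied, the script's default applies
    approach: Optional[int]   # None: not supplied, the script's default applies


USAGE = "usage: python ocr_model.py INPUT OUTPUT [[THRESHOLD] APPROACH]"


def parse_cli(argv):
    """Interpret argv[1:] according to Options A, B, and C."""
    args = argv[1:]
    if len(args) == 2:      # Option A: input and output only
        threshold, approach = None, None
    elif len(args) == 3:    # Option B: third value selects the approach
        threshold, approach = None, int(args[2])
    elif len(args) == 4:    # Option C: threshold first, then approach
        threshold, approach = int(args[2]), int(args[3])
    else:
        raise SystemExit(USAGE)
    if approach is not None and approach not in (1, 2):
        raise SystemExit(USAGE)
    return CliArgs(args[0], args[1], threshold, approach)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(parse_cli(sys.argv))
```

Because the threshold is only meaningful together with an approach flag, the sketch reads a lone third argument as the approach (Option B) and never as a threshold.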