The image-text-reader library extracts Hungarian text from images using Optical Character Recognition (OCR). It uses the pytesseract library to run Tesseract and Pillow for image processing.
Note: the current code only works if Tesseract is installed with the Hungarian language pack (language code hun).
Requirements:
- Python 3.x
- Tesseract-OCR
Installation:
- Install the library with pip:
    pip install image-text-reader
- Install Tesseract-OCR, including the Hungarian language data:
  - Windows: Download and install from here.
  - macOS: Use Homebrew to install Tesseract and its additional language data:
      brew install tesseract tesseract-lang
  - Linux: Use your package manager, for example on Debian/Ubuntu:
      sudo apt-get install tesseract-ocr tesseract-ocr-hun
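After installing, you can confirm that Tesseract actually sees the Hungarian data before running the library. The sketch below assumes a recent pytesseract that provides get_languages() (which runs tesseract --list-langs); the function name hungarian_data_installed is hypothetical and not part of image-text-reader. The optional list_languages argument exists only so the check can be tried without Tesseract installed.

```python
from typing import Callable, Iterable, Optional


def hungarian_data_installed(
    list_languages: Optional[Callable[..., Iterable[str]]] = None,
) -> bool:
    """Return True if Tesseract reports the Hungarian ('hun') language data.

    Hypothetical helper, not part of image-text-reader.
    """
    if list_languages is None:
        # Assumes pytesseract provides get_languages(config=''),
        # which lists the languages the local Tesseract can load.
        import pytesseract
        list_languages = pytesseract.get_languages
    return "hun" in list_languages(config="")
```

Calling hungarian_data_installed() with no arguments queries your local Tesseract; if it returns False, add the Hungarian language data (hun) to your Tesseract installation before using the library.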
Usage:
- Create a Python script (e.g., test_script.py) and import the ocr_image function from the image_text_reader library:
    from image_text_reader import ocr_image
- Set the path to your image and to the Tesseract-OCR executable, then run OCR and print the result (if tesseract is already on your PATH, you can leave out tesseract_cmd):
    # Update these paths for your system
    image_path = 'C:/path_to_your_image.jpg'  # Replace with the path to your test image
    tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'  # Path to the Tesseract executable
    extracted_text = ocr_image(image_path, tesseract_cmd=tesseract_cmd)
    print("Extracted Text:")
    print(extracted_text)
- Run your script:
    python test_script.py
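Hard-coding the Windows path works for a quick test, but the same script can run on several machines if the executable is located at run time. A minimal sketch, assuming the standard library's shutil.which; the helper resolve_tesseract_cmd and the TESSERACT_CMD environment variable are hypothetical names chosen for illustration, not something image-text-reader or pytesseract reads on its own.

```python
import os
import shutil
from typing import Optional


def resolve_tesseract_cmd(explicit: Optional[str] = None,
                          env_var: str = "TESSERACT_CMD") -> Optional[str]:
    """Pick a Tesseract executable to pass to ocr_image (hypothetical helper).

    Order: an explicit path, then the (hypothetical) environment variable,
    then the 'tesseract' executable found on PATH. Returns None if none apply.
    """
    if explicit:
        return explicit
    from_env = os.environ.get(env_var)
    if from_env:
        return from_env
    # shutil.which returns the full path of the executable, or None
    return shutil.which("tesseract")
```

You would then call ocr_image(image_path, tesseract_cmd=resolve_tesseract_cmd()). If the helper returns None, ocr_image leaves pytesseract's default command in place, so Tesseract is still looked up on the PATH.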
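Preprocessing Function:
The preprocess_image function prepares the image for OCR by converting it to grayscale, sharpening it, and enhancing its contrast:
    from PIL import Image, ImageEnhance, ImageFilter

    def preprocess_image(image_path):
        # Open the image and convert it to 8-bit grayscale ('L' mode)
        image = Image.open(image_path).convert('L')
        # Apply Pillow's built-in sharpening filter to crisp up character edges
        image = image.filter(ImageFilter.SHARPEN)
        # Factor 2 doubles each pixel's distance from the mean gray level (before clipping)
        enhancer = ImageEnhance.Contrast(image)
        image = enhancer.enhance(2)
        return image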
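To see what enhance(2) does numerically, here is a small pure-Python model of Pillow's contrast enhancement for 8-bit grayscale pixels. It is an approximation written for illustration (the function name enhance_contrast is hypothetical, and the code does not call Pillow): Pillow blends the image with a flat gray image at the image's rounded mean brightness, so each output pixel is mean + factor * (pixel - mean), clipped to the 0 to 255 range.

```python
from typing import List, Sequence


def enhance_contrast(pixels: Sequence[int], factor: float = 2.0) -> List[int]:
    """Approximate ImageEnhance.Contrast(image).enhance(factor) on a flat
    list of 8-bit grayscale pixel values (illustrative model, not Pillow)."""
    if not pixels:
        return []
    # Mean brightness, rounded to the nearest integer gray level
    mean = int(sum(pixels) / len(pixels) + 0.5)
    out = []
    for p in pixels:
        # Push each pixel away from (factor > 1) or toward (factor < 1) the mean
        value = mean + factor * (p - mean)
        # Clip to the valid 8-bit range, then truncate to an integer
        out.append(int(min(255.0, max(0.0, value))))
    return out
```

With factor 2, pixels of 100 and 150 (mean 125) move to 75 and 175: dark strokes get darker and the background gets lighter, which usually helps Tesseract separate text from the page.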
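OCR Function:
The ocr_image function preprocesses the image and then extracts the text with pytesseract:
    import pytesseract

    def ocr_image(image_path, tesseract_cmd=None):
        # Use a specific Tesseract executable if one is given;
        # otherwise pytesseract runs 'tesseract' from the PATH
        if tesseract_cmd:
            pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
        image = preprocess_image(image_path)
        # 'hun' selects Tesseract's Hungarian language data, which must be installed
        text = pytesseract.image_to_string(image, lang='hun')
        return text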
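As written, ocr_image passes a fixed lang value to pytesseract.image_to_string, so switching languages means editing the function. Below is a minimal sketch of a hypothetical variant that takes the language as a parameter; ocr_with_language is not part of image-text-reader. It relies on pytesseract.image_to_string accepting a lang keyword and on Tesseract's convention of joining several language codes with '+' (for example 'hun+eng' for pages that mix Hungarian and English). The image_to_string argument is only there so the function can be exercised without Tesseract.

```python
from typing import Any, Callable, Optional


def ocr_with_language(image: Any, lang: str = "hun",
                      tesseract_cmd: Optional[str] = None,
                      image_to_string: Optional[Callable[..., str]] = None) -> str:
    """Hypothetical variant of ocr_image with a configurable Tesseract language.

    `image` should already be preprocessed (e.g. the result of preprocess_image).
    `lang` uses Tesseract language codes; combine several with '+'.
    """
    if image_to_string is None:
        import pytesseract
        # Same executable override as ocr_image uses
        if tesseract_cmd:
            pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
        image_to_string = pytesseract.image_to_string
    return image_to_string(image, lang=lang)
```

In normal use you would pass the output of preprocess_image, e.g. ocr_with_language(preprocess_image(image_path), lang='hun+eng'); every listed language must be installed in Tesseract.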
Contributions are welcome! Please open an issue or submit a pull request for any changes.
This project is licensed under the MIT License. See the LICENSE file for more details.
For more information, visit the image-text-reader library page on PyPI.