EasyOCR Unstructured

EasyOCR Unstructured is a library for Optical Character Recognition (OCR) that extracts text from PDFs, then groups the text based on proximity.

It is intended for PDF files whose text doesn't follow the standard left-to-right, top-to-bottom reading order.

Getting Started

pip install easyocr-unstructured

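# Assumes the EasyocrUnstructured class is exposed at the package's top level
from easyocr_unstructured import EasyocrUnstructured

# Initialize the EasyOCR Unstructured object
easyocr = EasyocrUnstructured()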

# Invoke the OCR process on your PDF file
result = easyocr.invoke('/path/to/your_pdf_file.pdf')

# result is a list of lists of strings
from pprint import pprint as pp
pp(result)

Example Output

The output will look something like this:

[
    ["This is the piece of text. Nothing near it"],
    ["This is the second piece of text.", "This is the third piece of text that was close to the second"],
    ["This is the fourth piece of text. Nothing near it"],
    ...
]
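
To make the idea of grouping by proximity concrete, here is a minimal sketch. It is not the library's actual algorithm. It assumes each detected piece of text comes with an axis-aligned bounding box, and it merges any two pieces whose boxes are within a chosen distance. The names `Box`, `box_gap`, `group_by_proximity`, and `max_gap` are hypothetical and exist only for this illustration.

```python
from itertools import combinations

# Each box is (x_min, y_min, x_max, y_max, text). This layout is an assumption
# made for illustration, not the library's internal format.
Box = tuple[float, float, float, float, str]


def box_gap(a: Box, b: Box) -> float:
    """Return the distance between two axis-aligned boxes (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def group_by_proximity(boxes: list[Box], max_gap: float) -> list[list[str]]:
    """Group texts whose boxes are connected by gaps of at most max_gap."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of boxes that sit close enough together.
    for i, j in combinations(range(len(boxes)), 2):
        if box_gap(boxes[i], boxes[j]) <= max_gap:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller index as root so groups stay in input order.
                parent[max(ri, rj)] = min(ri, rj)

    groups: dict[int, list[str]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box[4])
    return list(groups.values())


sample = [
    (0, 0, 10, 10, "first"),
    (100, 0, 110, 10, "second"),
    (112, 0, 120, 10, "third"),
    (0, 200, 10, 210, "fourth"),
]
print(group_by_proximity(sample, max_gap=5))
# [['first'], ['second', 'third'], ['fourth']]
```

The pairwise comparison is quadratic in the number of boxes, which is fine for a page of text but would need a spatial index for very dense documents.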

Prerequisites

Python 3.12 or later

Installing

pip install easyocr-unstructured

Usage

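# Assumes the EasyocrUnstructured class is exposed at the package's top level
from easyocr_unstructured import EasyocrUnstructured

easyocr = EasyocrUnstructured()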
result = easyocr.invoke('/path/to/your_pdf_file.pdf')
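
Since the result is a list of lists of strings, a common next step is to collapse each group into a single paragraph. The helper below is a small sketch of that. `groups_to_paragraphs` is a hypothetical name, not part of the library, and the sample data simply mirrors the shape shown under Example Output.

```python
def groups_to_paragraphs(groups: list[list[str]], separator: str = " ") -> list[str]:
    """Join the strings in each group into one paragraph, skipping empty groups."""
    paragraphs = []
    for group in groups:
        # Trim stray whitespace and ignore blank pieces before joining.
        text = separator.join(piece.strip() for piece in group if piece.strip())
        if text:
            paragraphs.append(text)
    return paragraphs


# Sample data in the same shape as the documented output.
result = [
    ["This is the piece of text. Nothing near it"],
    ["This is the second piece of text.", "This is the third piece of text that was close to the second"],
]
for paragraph in groups_to_paragraphs(result):
    print(paragraph)
```

Passing `separator="\n"` instead keeps each piece on its own line within a group.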

Running the tests

No tests yet

Built With

Wing Pro
Python 3.12
numpy
easyocr
pdf2image
hashlib

Contributing

Please do! Any sensible and safe change will be merged.

Authors

Kevin Fink

License

MIT
