You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This PR adds three possible values for `source` field:
* `pdfminer` as source for elements directly obtained from PDFs.
* `OCR-tesseract` and `OCR-paddle` for elements obtained with the
respective OCR engines.
All those new values are stored in a new class `Source` in unstructured_inference>constants.py
This would help users filter certain elements depending on how were
obtained.
0 commit comments