add clean_extracted_text method #1651
Open
Description
This method should be called after Doctor has extracted text from the binary content. It cleans up extraction artifacts, formatting issues, and unwanted text in the extracted output. This is typically needed when the extraction process introduces stray characters, preserves headers or footers from the original document, or includes metadata that should be removed from the final plain text.
This will help us resolve issue #6443 in CL.
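To make the proposal concrete, here is a minimal sketch of what such a method might look like. The issue does not specify a signature or the cleanup rules, so everything beyond the name `clean_extracted_text` is an assumption: the `min_repeats` parameter, the repeated-line heuristic for headers and footers, and the page-number pattern are all hypothetical and would need tuning against real documents.

```python
import re
from collections import Counter

# Control characters that extraction can leave behind (form feeds, NULs, etc.).
# Tab (\x09) and newline (\x0a) are deliberately kept.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")

# Hypothetical pattern for bare page markers such as "3" or "Page 3 of 10".
_PAGE_MARKER = re.compile(r"(?:page\s+)?\d+(?:\s+of\s+\d+)?", re.IGNORECASE)


def clean_extracted_text(text: str, min_repeats: int = 3) -> str:
    """Clean common artifacts from already-extracted plain text.

    Assumed heuristics (not taken from the issue):
    - a non-empty line that occurs at least `min_repeats` times is treated
      as a repeated header or footer and dropped;
    - lines that consist only of a page marker are dropped.
    """
    # Normalize line endings, then remove stray control characters.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = _CONTROL_CHARS.sub("", text)

    # Collapse runs of spaces and tabs, and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]

    # Find lines repeated often enough to look like headers or footers.
    counts = Counter(line for line in lines if line)
    repeated = {line for line, n in counts.items() if n >= min_repeats}

    cleaned = [
        line
        for line in lines
        if line not in repeated and not _PAGE_MARKER.fullmatch(line)
    ]

    # Collapse three or more consecutive newlines into one blank line.
    result = re.sub(r"\n{3,}", "\n\n", "\n".join(cleaned))
    return result.strip()
```

Note that the page-marker rule would also drop a legitimate line containing only a number, so a real implementation might restrict it to lines near page boundaries.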
Metadata
Assignees
Labels
No labels
Type
Projects
Status
PR'd Issues 🤞