What is the best way to extract some paragraphs from pdf extracted text #10708
-
I need to extract and structure text from PDFs. What is the best way to categorize the text blocks? Rules are not an option, because each text comes from a different writer.
Replies: 1 comment 3 replies
-
Hi @info2000, if I may ask, what type of documents are you extracting text from, and what kind of categories are you looking for? This might be a case for text categorization (textcat) rather than span categorization (spancat). If you need something more refined, a two-stage solution may help: categorize the texts first, then run a more fine-grained spancat afterwards. Alternatively, use a rules-based approach followed by text categorization.
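To make the "rules first, then text categorization" idea concrete, here is a minimal sketch using only the Python standard library. It assumes that paragraphs in the extracted text are separated by blank lines, which may not hold for every PDF extractor. The labels `SUMMARY` and `OTHER`, the function names, and `toy_classifier` are all hypothetical placeholders; the real categories depend on the documents, and the toy function only stands in for a trained classifier.

```python
import re
from typing import Callable, Dict, List, Tuple


def split_into_blocks(text: str) -> List[str]:
    """Rule-based stage: split extracted text into paragraph blocks.

    Assumes blank lines separate paragraphs (an assumption about the extractor).
    """
    blocks = []
    for raw in re.split(r"\n\s*\n", text):
        # Undo hard line wraps and collapse repeated whitespace.
        block = " ".join(raw.split())
        if block:
            blocks.append(block)
    return blocks


def toy_classifier(block: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained text classifier (label -> score)."""
    lowered = block.lower()
    summary = 1.0 if "in summary" in lowered else 0.0
    return {"SUMMARY": summary, "OTHER": 1.0 - summary}


def categorize_blocks(
    text: str, classify: Callable[[str], Dict[str, float]]
) -> List[Tuple[str, str]]:
    """Second stage: assign each block its highest-scoring label."""
    labeled = []
    for block in split_into_blocks(text):
        scores = classify(block)
        labeled.append((block, max(scores, key=scores.get)))
    return labeled


sample = "This report covers the\nfirst quarter.\n\nIn summary, revenue\ngrew.\n"
labeled = categorize_blocks(sample, toy_classifier)
for block, label in labeled:
    print(label, "|", block)
```

Because the classifier is passed in as a callable, the toy function could later be swapped for a trained spaCy pipeline with a textcat component, for example `lambda block: nlp(block).cats`, assuming `nlp` is such a pipeline that you have trained yourself. A finer-grained spancat step could then run only on the blocks whose label is of interest.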