
Have you tried just taking a rule-based approach to this? Based on your example, the list items are separated by a blank line, so there's no ambiguity about where one item ends. You could also check the length of a line plus the first word of the next line: if that combination would have exceeded the line width, the line was probably wrapped rather than deliberately broken.
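A minimal sketch of both heuristics in plain Python, assuming the text was hard-wrapped at a known, fixed width (`max_width` is a hypothetical parameter you would set from your own data) and that a blank line always ends an item. The function names are illustrative only and are not part of spaCy.

```python
import re


def split_items_on_blank_lines(text: str) -> list[str]:
    """Treat each blank-line-separated block as one item and join its lines."""
    # A "blank" line may contain stray spaces, so allow whitespace between newlines
    blocks = re.split(r"\n\s*\n", text.strip())
    return [
        " ".join(line.strip() for line in block.splitlines() if line.strip())
        for block in blocks
        if block.strip()
    ]


def rejoin_wrapped_lines(lines: list[str], max_width: int) -> list[str]:
    """Join a line onto the previous item when the break looks like a soft wrap.

    Heuristic: if the previous physical line plus a space plus the next line's
    first word would exceed max_width, the wrapper was forced to break there,
    so both lines belong to the same item. Otherwise the word would have fit,
    so the break was probably intentional and starts a new item.
    """
    result: list[str] = []
    prev_line = ""  # last physical line seen, not the joined item
    for raw in lines:
        line = raw.strip()
        if not line:
            prev_line = ""  # a blank line always ends the current item
            continue
        first_word = line.split()[0]
        if prev_line and len(prev_line) + 1 + len(first_word) > max_width:
            result[-1] += " " + line  # looks wrapped: continue the current item
        else:
            result.append(line)  # looks like a deliberate break: start a new item
        prev_line = line
    return result
```

For example, with `max_width=20` the lines `"Buy milk and fresh"`, `"bread"`, `"Call Bob"` come out as two items, because `"bread"` would not have fit after an 18-character line, while `"Call"` easily would have fit after `"bread"`. The width check only works if the wrapping width is consistent across the document; if it varies, the blank-line rule is the more reliable of the two.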

You could use the `textcat` component in spaCy for this, but it's designed for classification based on content, like sorting newspaper articles or product descriptions into categories. Your problem is more about whether joining two lines produces a grammatical unit, or how the layout should be interpreted. You might have more luck training a sentence recognizer (spaCy's `senter` component).

If you want to train a text classifier in spaCy anyway, you would just need to label y…

Answer selected by svlandeg
Labels
feat / textcat Feature: Text Classifier
2 participants