You can use documents of that length, but in general it's easier to work with documents if you cut them down to paragraph length. In your case there don't seem to be real paragraphs but it looks like you could split the data into lines without losing information.

There are other issues with your data. A lot of lines are irrelevant and could be pre-filtered, which would make the rest of your task much easier. You can filter lines by removing set phrases or overly short lines.
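As a rough sketch of the two suggestions above (splitting into lines, then pre-filtering), something like the following could work. The function names, the sample text, the set phrases, and the `min_chars` threshold are all made up for illustration, and the sketch assumes that "removing set phrases" means dropping any line that contains one; you'd need to adjust all of this to your actual data.

```python
def split_into_lines(document):
    """Split a long document into stripped, non-empty lines."""
    return [line.strip() for line in document.splitlines() if line.strip()]


def filter_lines(lines, set_phrases=(), min_chars=4):
    """Drop lines that are overly short or contain a known set phrase.

    Both `set_phrases` and `min_chars` are hypothetical knobs; matching
    is a case-insensitive substring check.
    """
    phrases = [phrase.lower() for phrase in set_phrases]
    kept = []
    for line in lines:
        if len(line) < min_chars:
            continue  # overly short line, unlikely to carry information
        lowered = line.lower()
        if any(phrase in lowered for phrase in phrases):
            continue  # boilerplate line containing a set phrase
        kept.append(line)
    return kept


# Invented sample text, only to show the shape of the input.
sample = """The Late Show
Buy tickets
Doors open at 7:30 PM
--
Share this event
Starts 8:00 PM at the Main Hall
"""

lines = split_into_lines(sample)
kept = filter_lines(lines, set_phrases=("buy tickets", "share this event"))
for line in kept:
    print(line)
```

Each kept line could then be treated as its own short document for annotation and training, instead of one very long document.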

The show title is the first line, but there isn't any useful context around it for the model to learn from, or really any useful keywords. So if most of your data looks like that, the model won't have much to go on for that label.

You're labelling the show start time a…

Answer selected by svlandeg
Labels
feat / ner Feature: Named Entity Recognizer