The golden rule of training data is that the more it is like your real data, the better your results will be.

When reading a sentence, the context around the entities can be as important as the entities themselves. For example, if you say, "I went to XXX for vacation", we can guess that XXX is a location. If your training data doesn't have any of that context, it will be hard to learn from. Building sentences out of "random words" will not help.
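
For illustration, here's a minimal sketch of what a context-rich annotation looks like in spaCy v3 (the sentence, the character offsets, and the "LOC" label are placeholder assumptions, not anything from your data):

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")

# The entity appears in a natural sentence, so the surrounding words
# ("went to ... for vacation") carry the signal the model learns from.
text = "I went to Kyoto for vacation."
doc = nlp.make_doc(text)
example = Example.from_dict(doc, {"entities": [(10, 15, "LOC")]})

# By contrast, dropping the same entity into a string of random filler
# words gives the model no usable context, even if the span is correct.
```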

If you only have a list of keywords, you may be able to use a rule-based matcher or do weak supervision.
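
If you go the rule-based route, spaCy's EntityRuler is one way to turn a keyword list into entity annotations. Below is a minimal sketch; the "LOC" label and the keywords are placeholder assumptions:

```python
import spacy

nlp = spacy.blank("en")

# Turn a plain keyword list into matcher patterns.
keywords = ["Kyoto", "Lake Tahoe"]  # placeholder keyword list
patterns = [{"label": "LOC", "pattern": kw} for kw in keywords]

ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)

doc = nlp("I went to Kyoto for vacation.")
print([(ent.text, ent.label_) for ent in doc.ents])
# Expected: [('Kyoto', 'LOC')]
```

One common way to do weak supervision is to run rules like these over raw text from your own domain and treat the matches as (noisy) training annotations, which keeps the natural sentence context around each entity.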
