Hey Premshay,

If the goal is to train only the edit-tree lemmatizer, the training data needs nothing but lemma annotations; no POS or tag information is required.
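As a rough illustration, a pipeline containing nothing but the edit-tree lemmatizer could be set up as below. This is only a sketch: it assumes spaCy 3.3 or later (where the `trainable_lemmatizer` factory is available), and the language code `"en"` is a placeholder for your own language.

```python
import spacy

# Assumption: "en" is a placeholder; use the language code of your data.
nlp = spacy.blank("en")

# The edit-tree lemmatizer is registered under the factory name
# "trainable_lemmatizer". No tagger or morphologizer is added, since
# only lemma annotations are needed to train it.
nlp.add_pipe("trainable_lemmatizer")

print(nlp.pipe_names)
```

In a real project you would normally describe the same pipeline in a training config and run `spacy train` rather than building it by hand.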

The edit-tree lemmatizer can also learn from partially annotated data, so you can train on your partial gold data if you'd like to try.

You can definitely merge your gold annotations with the silver data produced by your other model. Just create a DocBin as usual and set the token.lemma_ values. Any token whose lemma is the empty string "" is skipped during training.
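Here is a minimal sketch of that merge, under some assumptions: the words and lemmas are made-up sample data, `merge_lemmas` is a hypothetical helper (not part of spaCy), and preferring gold over silver is just one possible policy. Tokens with no lemma from either source keep the empty string, so they are skipped during training.

```python
import spacy
from spacy.tokens import Doc, DocBin

# Assumption: "en" is a placeholder; use the language code of your data.
nlp = spacy.blank("en")


def merge_lemmas(gold, silver):
    """Hypothetical helper: prefer the gold lemma, fall back to the silver
    one, and leave "" when neither source has an annotation."""
    return [g or s or "" for g, s in zip(gold, silver)]


# Made-up sample data: "" marks a token without a lemma annotation.
words = ["The", "cats", "were", "sleeping"]
gold = ["the", "cat", "", ""]        # partial gold annotation
silver = ["the", "cats", "be", ""]   # output of the other model

# Build the Doc with the merged lemmas and collect it in a DocBin.
doc = Doc(nlp.vocab, words=words, lemmas=merge_lemmas(gold, silver))
doc_bin = DocBin(docs=[doc])

# To use it as training data, save it to a path of your choice, e.g.:
# doc_bin.to_disk("./train.spacy")
```

Here "cats" gets the gold lemma "cat", "were" falls back to the silver lemma "be", and "sleeping" stays unannotated.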

Answer selected by Premshay
Labels
feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
feat / training (Feature: Training utils, Example, Corpus and converters)