How to add new Embedding Parameter to NER Model #10437
Hello, I recently came across Hugging Face's LayoutLMv2 model, which is basically a BERT token classifier for documents that adds layout embeddings. The layout embeddings encode the normalized x/y coordinates and the width and height of each word in a document, so OCR is used to identify the words and their bounding boxes. Now I'd like to modify an existing spaCy NER model to do the same, and for that I need to add these layout parameters to the embedding layer. Does anyone know how I can do that?
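As a concrete illustration of the layout features described above, here is a minimal plain-Python sketch that turns an OCR word's pixel bounding box into normalized coordinates plus width and height. The `OcrWord` class and `layout_features` function are hypothetical helpers, and the 0 to 1000 scale is an assumption borrowed from the LayoutLM family, not something spaCy requires.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OcrWord:
    """Hypothetical container for one OCR result."""
    text: str
    # Pixel bounding box (x0, y0, x1, y1) with a top-left origin
    box: Tuple[float, float, float, float]


def layout_features(
    word: OcrWord, page_width: float, page_height: float, scale: int = 1000
) -> Tuple[int, int, int, int, int, int]:
    """Return (x0, y0, x1, y1, width, height) normalized to 0..scale (assumed scale)."""

    def norm(value: float, size: float) -> int:
        # Scale to 0..scale and clamp boxes that spill over the page edge
        return min(max(int(scale * value / size), 0), scale)

    x0, y0, x1, y1 = word.box
    nx0, ny0 = norm(x0, page_width), norm(y0, page_height)
    nx1, ny1 = norm(x1, page_width), norm(y1, page_height)
    # Width and height are derived from the normalized corners
    return (nx0, ny0, nx1, ny1, nx1 - nx0, ny1 - ny0)
```

For a word at pixels (85, 110, 170, 132) on an 850 x 1100 page, this yields `(100, 100, 200, 120, 100, 20)`.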
Replies: 1 comment
We don't have a guide or example for this yet, so it'll be a little involved, but you might want to look at implementing a custom `tok2vec` layer. You can set your extra data as underscore attributes on the `Doc` and pass it into a pipeline / `nlp` object where the `tok2vec` reads those attributes and combines them with the default spaCy embeddings somehow.
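The combination step is left open here ("somehow"). One simple option is plain concatenation: append each token's rescaled layout features to the vector the default `tok2vec` produces for it. The framework-agnostic sketch below shows only that arithmetic; `combine_with_layout` is a hypothetical helper. In an actual spaCy pipeline the layout tuples would sit on a `Doc` extension (registered with `Doc.set_extension`, e.g. a hypothetical `layout` attribute read as `doc._.layout`), and this logic would live inside a custom Thinc layer with a backward pass, none of which is shown. It also assumes one layout tuple per spaCy token, so OCR words must first be aligned to tokens.

```python
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1, width, height), assumed to be on a 0..1000 scale
Layout = Tuple[int, int, int, int, int, int]


def combine_with_layout(
    token_vectors: Sequence[Sequence[float]],
    layouts: Sequence[Layout],
    scale: float = 1000.0,
) -> List[List[float]]:
    """Concatenate each token's embedding with its rescaled layout features."""
    if len(token_vectors) != len(layouts):
        raise ValueError(
            f"expected one layout tuple per token, got {len(layouts)} "
            f"for {len(token_vectors)} tokens"
        )
    combined = []
    for vec, layout in zip(token_vectors, layouts):
        # Rescale 0..1000 integers to 0..1 floats so they are on a
        # range comparable to typical embedding values
        layout_part = [v / scale for v in layout]
        combined.append(list(vec) + layout_part)
    return combined
```

With two 2-dimensional token vectors, each output row has 2 + 6 = 8 values. Concatenation is just one design choice: LayoutLM-style models instead add learned 2-D position embeddings to the text embeddings, which would mean a trainable embedding table inside the custom layer rather than raw coordinates.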