How to add new Embedding Parameter to NER Model #10437

MichaelRinger · 2022-03-05T18:50:40Z


Hello,

I recently came across Hugging Face's LayoutLMv2 model, which is basically a BERT token classifier for documents that includes layout embeddings.

The layout embeddings encode the normalized x/y coordinates and the width and height of each word in a document. OCR is used to identify the words and their bounding boxes.
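To make the layout features concrete, here is a rough sketch of how one OCR bounding box might be turned into the four normalized values. The function name `normalize_box`, the `(x0, y0, x1, y1)` pixel box format, and scaling to the range [0, 1] are assumptions for illustration only; other normalization conventions exist.

```python
from typing import NamedTuple, Tuple


class LayoutFeatures(NamedTuple):
    """Normalized position and size of one word on the page."""
    x: float
    y: float
    width: float
    height: float


def normalize_box(
    box: Tuple[float, float, float, float],
    page_width: float,
    page_height: float,
) -> LayoutFeatures:
    """Scale a pixel box (x0, y0, x1, y1) by the page size (hypothetical helper)."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    x0, y0, x1, y1 = box
    return LayoutFeatures(
        x=x0 / page_width,
        y=y0 / page_height,
        width=(x1 - x0) / page_width,
        height=(y1 - y0) / page_height,
    )


# A word whose OCR box spans (100, 200) to (300, 250) on a 1000 x 2000 pixel page.
features = normalize_box((100, 200, 300, 250), page_width=1000, page_height=2000)
```

Dividing by the page size keeps the features comparable across scans of different resolutions.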

Now I'd like to modify an existing spaCy NER model to do the same, and for that I need to add these layout parameters to the embedding layer.

Does anyone know how I can do that?
Thanks for the help.

Answered by polm


polm · 2022-03-06T06:25:29Z


We don't have a guide or example for this yet, so it'll be a little involved, but you might want to look at implementing a custom tok2vec layer. You can set your extra data as underscore (custom extension) attributes on the Doc and pass the Doc into the pipeline / nlp object, where the tok2vec layer reads those attributes and combines them with the default spaCy embeddings in some way.
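The suggestion above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not an official spaCy recipe: the extension name `layout`, the layer name `layout_features`, and the registry name `custom.LayoutFeatures.v1` are all made up for illustration. It only shows a Thinc layer that reads a Doc-level underscore attribute and emits one row of layout features per token.

```python
from typing import List

import numpy
import spacy
from spacy.tokens import Doc
from spacy.vocab import Vocab
from thinc.api import Model
from thinc.types import Floats2d

N_LAYOUT = 4  # x, y, width, height (assumed feature layout)

# Hypothetical extension: one (x, y, width, height) tuple per token.
Doc.set_extension("layout", default=None, force=True)


def forward(model: Model, docs: List[Doc], is_train: bool):
    outputs = []
    for doc in docs:
        boxes = doc._.layout
        if boxes is None:
            # No layout data was set: fall back to all-zero features.
            boxes = numpy.zeros((len(doc), N_LAYOUT))
        # reshape raises if the number of boxes does not match the tokens.
        rows = numpy.asarray(boxes, dtype="float32").reshape((len(doc), N_LAYOUT))
        outputs.append(model.ops.asarray2f(rows))

    def backprop(d_outputs):
        # The features are fixed inputs, so there is nothing to update.
        return []

    return outputs, backprop


@spacy.registry.architectures("custom.LayoutFeatures.v1")
def layout_features() -> Model[List[Doc], List[Floats2d]]:
    """Build a layer mapping Docs to per-token layout feature arrays."""
    return Model("layout_features", forward)


# Usage: attach layout data to a Doc and read it back through the layer.
doc = Doc(Vocab(), words=["Invoice", "total"])
doc._.layout = [(0.1, 0.1, 0.2, 0.025), (0.35, 0.1, 0.15, 0.025)]
features = layout_features().predict([doc])
```

Combining this output with the regular tok2vec embeddings (one option might be Thinc's `concatenate` combinator) and wiring the result into the NER config is left out here, and that part is likely where most of the work is.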
