Local context in NER training #8370
Replies: 1 comment
-
Context is just how far away from a given token the surrounding text is still capable of affecting the model's predictions. Whether a large context is useful or not depends on your data. Your documents are around two pages and you're annotating entities. If chopping your documents into paragraphs would make your human annotators unable to annotate things, then you want a larger context. On the other hand, if cutting into paragraphs wouldn't affect your ability to annotate entities, passing smaller units of text to the model would make it easier to train. While you can set the context to be very large with spaCy models, in practice it's usually hard to get benefits from context much larger than a paragraph. If you're unsure whether you want a larger context or not, since you already have the annotations, I would recommend experimenting with different settings and seeing how they affect your performance.
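As an illustration of that kind of experiment, here is a minimal sketch (not from spaCy's docs; the offsets, file name, and helper functions are made up for the example) of turning document-level character-offset annotations into paragraph-level training examples. It assumes paragraphs are separated by blank lines and that entities are `(start_char, end_char, label)` tuples:

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")

def split_into_paragraphs(text, entities):
    """Yield (paragraph_text, paragraph_entities) with re-offset character spans."""
    offset = 0
    for para in text.split("\n\n"):
        start, end = offset, offset + len(para)
        para_ents = [
            (s - start, e - start, label)
            for s, e, label in entities
            if start <= s and e <= end
        ]
        yield para, para_ents
        offset = end + 2  # skip the "\n\n" separator

def build_docbin(records):
    """records: iterable of (text, entities) pairs -> DocBin ready for `spacy train`."""
    db = DocBin()
    for text, entities in records:
        doc = nlp.make_doc(text)
        spans = []
        for s, e, label in entities:
            span = doc.char_span(s, e, label=label, alignment_mode="contract")
            if span is not None:  # drop spans that don't align to token boundaries
                spans.append(span)
        doc.ents = spans
        db.add(doc)
    return db

# One document-level record, split into paragraph-level records:
doc_text = "Alice joined Acme Corp.\n\nShe now lives in Berlin."
doc_ents = [(0, 5, "PERSON"), (13, 23, "ORG"), (42, 48, "GPE")]
build_docbin(split_into_paragraphs(doc_text, doc_ents)).to_disk("train_paragraphs.spacy")
```

You could build one `.spacy` file from whole documents and another from the paragraph-level records, train on each, and compare the evaluation scores to see which context size works better for your data.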
-
Hi,
I am wondering if someone could elaborate on how 'local context' works in NER labelling, and how best to tune it for better generalisation of the labelling.
We have been training a multi-label (5+ entity types) NER model from scratch. To train it, we parse a PDF document (around 2 pages) into a single string variable and manually annotate it with the labels (50+ labels per document across the entity types). Each training dataset entry therefore consists of an entire labelled PDF document.
Should we be breaking this document up into labelled sentences instead of using a whole document for each training dataset entry? I'm wondering what effect this has on the context used when deciding which label to choose on test data - does the model compare at a wider level if the training dataset entry is larger?
Thanks!