Noob: High-level usage question about entity recognition vs entity linking #8874

metalaureate · 2021-08-03T14:09:03Z


spaCy is amazing but I wondered if I could get a steer on how to break down my problem into spaCy solution/feature domains so that I can tackle an engineering prototype. I realize I may have bitten off more than I can chew, but such is life.

Users write a daily journal which recognizes entities in their personal world. These are a mix of real-world objects like London and referents that only have meaning in the context of their journal, e.g. John, Bubba, School, Work. These referents also have a specialized labeling scheme (e.g. Nature, Person, Group, Self). Is this a transfer learning custom NER problem?
Users have various eponyms for entities. E.g., 'John' is sometimes 'Jonny', or 'John Smith'. 'Bubba' is sometimes 'Grandmother'. 'Work' is sometimes 'Acme Inc'. Is this an entity linking problem? These eponyms are limited to their private discourse (for now).
Since some of these are real world entities (e.g. Acme Inc, London), is this a problem in using a base knowledge graph such as DBPedia to underpin the individual user's entity linker?
Lastly, the entities in the user's personal journal world have a specific set of social membership relations between them. E.g., John Smith belongs to Acme Inc. Bubba is John Smith's grandmother. The Cybermen are the enemy of John Smith. Is this a custom entity linking problem? I am a bit confused about using spaCy for deducing the relationship between entities beyond disambiguating eponyms.

svlandeg · 2021-08-03T16:07:28Z


Hi!

You'd probably benefit from some more generic tutorials on some of these concepts, to get a better grasp of different NLP tools and how they relate to your use-case. That said, I'll try to add my 2 cents...

(...) mix of real-world objects like London and (...) e.g. John, Bubba, School, Work. These referents also have a specialized labeling scheme (e.g. Nature, Person, Group, Self). Is this a transfer learning custom NER problem?

This is a classic NER problem, in which it doesn't really matter whether an entity ("Obama", "John") is broadly known or not. You'd tag both as "Person". The follow-up task of Entity Linking will worry about normalizing these to known identifiers (if possible at all).

"School" and "Work" wouldn't typically be named entities though, as they are just common words.

You didn't specify what you meant by "transfer learning". Typically, you'll want to train an NER model from scratch with your specific label/annotation scheme. You can have a look at using a transformer-based model to benefit from language model pretraining though: https://spacy.io/usage/embeddings-transformers
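To make the "specific label/annotation scheme" concrete, here is a minimal sketch of what annotated training examples could look like as character-offset spans. It is plain Python rather than spaCy code; the `annotate` helper, the example sentence, and the choice of `Person` and `Group` labels for these mentions are assumptions for illustration only.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def annotate(text: str, mentions: List[Tuple[str, str]]) -> Tuple[str, List[Span]]:
    """Turn (mention, label) pairs into character-offset spans over `text`.

    Mentions must be listed in the order they appear in the text.
    """
    spans: List[Span] = []
    cursor = 0
    for mention, label in mentions:
        start = text.find(mention, cursor)
        if start == -1:
            raise ValueError(f"{mention!r} not found after position {cursor}")
        end = start + len(mention)
        spans.append((start, end, label))
        cursor = end  # keeps spans ordered and non-overlapping
    return text, spans


# Hypothetical journal sentence using the poster's custom labels.
text, spans = annotate(
    "John had lunch with Bubba at Acme Inc.",
    [("John", "Person"), ("Bubba", "Person"), ("Acme Inc", "Group")],
)
for start, end, label in spans:
    print(text[start:end], label)
```

Spans in this shape can then be converted into whatever format the training pipeline expects; whether a mention is famous or private makes no difference to the annotation.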

(...) 'John' is sometimes 'Jonny', or 'John Smith' (...).

Yes, this is an entity linking problem. Either your knowledge base will need to know about the different possible variants/synonyms, or you'll have to implement some kind of heuristic / fuzzy matching to identify likely entities, given a mention that's not in the KB. Ultimately & ideally, an entity linking step will map all occurrences & variants to the same unique ID. cf https://github.com/explosion/projects/tree/v3/tutorials/nel_emerson for more background & example implementation.
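As a rough illustration of the two options above (a KB that knows the variants, plus a fuzzy fallback for mentions it has not seen), here is a minimal standard-library sketch. It is not spaCy's entity linker: the alias table, the entity IDs, and the `link` function are hypothetical, and the 0.8 similarity cutoff is an arbitrary assumption you would need to tune.

```python
from difflib import get_close_matches
from typing import Dict, Optional

# Hypothetical per-user knowledge base: lowercased alias -> unique entity ID.
ALIASES: Dict[str, str] = {
    "john": "PERSON_001",
    "jonny": "PERSON_001",
    "john smith": "PERSON_001",
    "bubba": "PERSON_002",
    "grandmother": "PERSON_002",
    "work": "GROUP_001",
    "acme inc": "GROUP_001",
}


def link(mention: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a mention to an entity ID, or None if nothing is close enough."""
    key = mention.strip().lower()
    # 1. Exact alias lookup: the KB already knows this variant.
    if key in ALIASES:
        return ALIASES[key]
    # 2. Fuzzy fallback: pick the most similar known alias above the cutoff.
    close = get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None


print(link("John Smith"))
print(link("Johnny"))    # not in the table, resolved by fuzzy matching
print(link("Cybermen"))  # no close alias, stays unlinked
```

A string-similarity fallback like this ignores context, so it cannot tell two different Johns apart; a trained entity linker uses the surrounding sentence for that.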

Since some of these are real world entities (e.g. Acme Inc, London), is this a problem in using a base knowledge graph such as DBPedia to underpin the individual user's entity linker?

I'm not sure I understand the question, but yes you can use DBPedia or Wikidata or any other existing knowledge base - either as such or as a basis to expand upon for your specific use-case.

(...) John Smith belongs to Acme Inc. Bubba is John Smith's grandmother. The Cybermen are the enemy of John Smith. Is this a custom entity linking problem?

This is not an "entity linking" problem, but a "relation extraction" problem. When a sentence reads "X is the enemy of Y" or "A works at B", you'd be able to link two entities together with a specific relationship. cf also https://github.com/explosion/projects/tree/v3/tutorials/rel_component
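To show what "linking two entities together with a specific relationship" produces, here is a small rule-based sketch that turns such sentences into (subject, relation, object) triples. This is a hand-written stand-in, not the trained component from the linked tutorial; the patterns, the relation names `enemy_of` and `member_of`, and the `extract` function are all assumptions for illustration.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical surface patterns, each mapped to a relation label.
PATTERNS = [
    (
        re.compile(r"^(?:The )?(?P<subj>.+?) (?:is|are) the enemy of (?P<obj>.+?)\.?$"),
        "enemy_of",
    ),
    (
        re.compile(r"^(?P<subj>.+?) (?:works at|belongs to) (?P<obj>.+?)\.?$"),
        "member_of",
    ),
]


def extract(sentence: str) -> List[Triple]:
    """Return every triple whose pattern matches the whole sentence."""
    triples: List[Triple] = []
    for pattern, relation in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            triples.append((match.group("subj"), relation, match.group("obj")))
    return triples


print(extract("The Cybermen are the enemy of John Smith."))
print(extract("John Smith belongs to Acme Inc."))
print(extract("Bubba is John Smith's grandmother."))  # no pattern covers this
```

Fixed patterns break as soon as the wording varies, which is the main reason to train a relation extraction model over recognized entity pairs instead.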

6 replies

polm Aug 4, 2021

As an extra tip, for a good overview of all these problems I can recommend the Jurafsky and Martin book:

https://web.stanford.edu/~jurafsky/slp3/

metalaureate Aug 6, 2021
Author

Thank you!

metalaureate Aug 6, 2021
Author

@polm @svlandeg esteemed spaCy gurus, if I am not abusing your good will, I was hoping I could ask a follow-on question after digging into this a bit. I have determined that all of my relation extractions are set memberships, e.g. Thor is a member of the Avengers, Coulson is a member of SHIELD, Peter is a member of Peabody High School. The sets are all holonarchical (i.e., nested logical classes: Peter is a member of Peabody High School, which is a member of Littleton Public Schools). Does that change the nature of the relation extraction problem or any of the suggested approaches?

polm Aug 7, 2021

That doesn't change the fundamental problem, though it gives you a few other things you can look at depending on what you want to do, such as hypernym discovery or "set expansion".
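One practical consequence of nested sets is that only the direct memberships need to be extracted; the indirect ones can be derived afterwards. A minimal sketch, assuming each entity belongs to at most one parent set (a simplification) and using a hypothetical `MEMBER_OF` table filled with the examples from the question:

```python
from typing import Dict, List

# Hypothetical direct memberships, e.g. the output of relation extraction.
MEMBER_OF: Dict[str, str] = {
    "Peter": "Peabody High School",
    "Peabody High School": "Littleton Public Schools",
    "Thor": "Avengers",
    "Coulson": "SHIELD",
}


def containing_sets(entity: str) -> List[str]:
    """Walk up the membership chain, from the nearest set to the outermost."""
    chain: List[str] = []
    seen = {entity}
    current = entity
    while current in MEMBER_OF:
        current = MEMBER_OF[current]
        if current in seen:
            # Guard against bad data such as A in B and B in A.
            raise ValueError(f"membership cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
    return chain


print(containing_sets("Peter"))
print(containing_sets("Thor"))
```

If an entity can belong to several sets at once, the table would need to map each entity to a list of parents and the walk would become a graph traversal.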

metalaureate Aug 7, 2021
Author

Great, thank you for boosting my confidence in this direction and giving me more leads to follow up on.
