Best way to gather relationships between entities (NER and REL models) #10808
Replies: 5 comments 29 replies
-
Hi @fernandonjardim, for posterity, I advise you to copy the sample JSON into your post. Be sure to enclose it in triple backticks so that it's more readable. The GDocs file may get deleted, or its access may change, and future users may not be able to see what the post is about.

For the problem itself, you'd definitely want to label at least a hundred or so samples to get a decent model. However, it seems that some of the entities can be obtained with simple business rules, so a combination of rules and a model might actually help. You can use spaCy's Matcher to create the rule sets. You might also want to explore techniques like weak supervision to reduce your labelling costs.
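To make the rule-based suggestion concrete, here is a minimal sketch of a spaCy `Matcher` rule. It assumes spaCy v3 and uses a blank English pipeline, so no trained model is needed. The pattern, the `BOARDING` label, and the `find_boarding` helper are hypothetical, written for a phrase of the kind that comes up later in this thread ("Boarding @ 11 @ Metro Vergueiro"); real rules would have to be adapted to the actual corpus.

```python
import spacy
from spacy.matcher import Matcher

# Tokenizer-only pipeline; token patterns on lexical attributes need no trained model.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical rule: "boarding @ <number> @ <one or more title-cased tokens>"
pattern = [
    {"LOWER": "boarding"},
    {"ORTH": "@"},
    {"LIKE_NUM": True},
    {"ORTH": "@"},
    {"IS_TITLE": True, "OP": "+"},
]
# greedy="LONGEST" keeps only the longest of the overlapping matches produced by "+".
matcher.add("BOARDING", [pattern], greedy="LONGEST")


def find_boarding(text):
    """Return the text of every span matched by the BOARDING rule."""
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


print(find_boarding("Boarding @ 11 @ Metro Vergueiro"))
```

A rule like this only covers the compact form of the phrase. Looser phrasings need either more patterns or a trained model, which is why combining the two can pay off.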
-
Hi @ljvmiranda921! Thanks very much for your fast answer and tips! So instead of having two models, one for NER and another for the relationships between entities (as I am currently doing, following this article), you would recommend going with NER only, doing some auto-labelling and using spaCy's Matcher, is that right? In most of my corpus, the entities/tokens can be really spread out, as follows: In the example above, the label "included" (third in blue) has several children ("optionals", in black), and some are really far away, in some cases in other paragraphs. Would the Matcher still work in this case? What I guess I am missing is how to use the Matcher in these long-distance cases (picture); with the "Hello, World" example it seems easier, haha. Also, some spans can get more than one label. For instance, when I am describing the itinerary on "Day 2", the span "schooner tour" belongs to the hash "day_2" but also to the "not_included" one. Can the Matcher capture those nuances too? Super thanks for your availability!
-
Hi @ljvmiranda921! Thanks very much for your answer! I am still a bit blocked on getting the relations, and on when it is better to use business rules and when it is better to use NER. I've been through all the spaCy docs, and all of them use short sentences as examples. With a sentence such as the following, it is easy to create business rules using a Matcher or to do dependency analysis: "Apple is opening its first big office in San Francisco" What I am not getting is how I can use these tools on long texts, in which the children / dependencies are spread across many sentences, as in the following example (copying the image as text; this is a bit of our corpus):
to extract this info:
For this case, would you suggest NER labelling? Or somehow still using business rules? If business rules, would you mind providing just a very simple example, so I can replicate it for all the other entities / relations I need to extract? Super thanks.
-
On a scale of 0 to 10, could you please roughly tell me how hard this challenge is? I've spent months on it and I just cannot ship it in a simple way that is not NER + relationship extraction. Having things spread out across the text is what really makes this hard. For instance, sometimes I have things in the same chunk: "Boarding @ 11 @ Metro Vergueiro" and sometimes I have them spread out, as in the following: "bla bla bla bla bla bla boarding: we will meet in front of the shopping and we will meet @ 11 @ metro Vergueiro", and this makes it hard to come up with a scheme for it.
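One possible way to handle both forms quoted above with a single business rule is to anchor on the keyword and allow a bounded gap before the "@ time @ place" part. The sketch below uses plain Python regular expressions rather than spaCy, purely for illustration. The `BOARDING_RE` pattern, the 200-character gap limit, and the `extract_boarding` helper are all assumptions, not something proposed in this thread.

```python
import re

# Hypothetical rule: the keyword "boarding", then up to 200 characters of
# arbitrary text, then "@ <time> @ <place>". The gap is lazy, so the first
# "@ <time> @" after the keyword wins.
BOARDING_RE = re.compile(
    r"boarding\b.{0,200}?@\s*(?P<time>\d{1,2}(?::\d{2})?)\s*@\s*(?P<place>[^.\n]+)",
    re.IGNORECASE | re.DOTALL,
)


def extract_boarding(text):
    """Return {'time': ..., 'place': ...} for the first boarding mention, or None."""
    match = BOARDING_RE.search(text)
    if match is None:
        return None
    return {"time": match.group("time"), "place": match.group("place").strip()}


compact = "Boarding @ 11 @ Metro Vergueiro"
spread = (
    "bla bla bla bla bla bla boarding: we will meet in front of the "
    "shopping and we will meet @ 11 @ metro Vergueiro"
)
print(extract_boarding(compact))
print(extract_boarding(spread))
```

The gap limit is the main design choice. A small window keeps the rule precise but misses mentions that sit several sentences away. For those cases a trained relation model may still be needed.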
-
Super thanks for the feedback polm ((:
…On Fri, Aug 26, 2022, 6:53 AM polm ***@***.***> wrote:
Note: I figured out what was up with the web demo - it was highlighting all
entities, not just entities created by matching the rules. So the example
you had was being matched as a QUANTITY or something. This issue with the
web demo has been resolved.
-
I am trying to create a model using spaCy to output this JSON, given this text.
To approach this problem I've tried combining a NER model with a REL model, but the results are not very good and labelling the entities takes a LOT of time.
Could you suggest a better approach for tackling this challenge with as little labelling as possible?