NER vs dep parser vs phrase matching #8691
Hello, I have a corpus which consists of sentences describing apparel. Here are a couple of examples:
From the above two sentences, as far as sleeves are concerned, I want to capture the italicized parts (everything related to sleeves). Should I use NER, dependency parsing or phrase matching? I tried dependency parsing on the first sentence and it didn't work well: it did not capture the word 'free' as a dependent of 'sleeves'. I would like to understand how to decide on a technique for this, please.
Replies: 1 comment 12 replies
Is this related to #8645, or is the corpus you're working with public?
Have you tried using noun chunks? spaCy has a built-in noun chunks feature that can capture phrases of the pattern ADJ* NOUN+, which will capture your italic phrases. It will also capture other phrases ("floor length beauty", "kimono tie wrap front", "ruched waist"), but you can use post-processing to filter them out (for example, to only keep phrases including "sleeves").
Dependency parsing can be useful for adjective-noun constructions, but given that their structure isn't very complicated in English, matching flat tag sequences (as the Matcher can do) is also effective.
Phrase matching (as opposed to the plain Matcher) is restricted to matching things of the same token length, so it's not what you want here.
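To make the "flat tag sequence plus post-filter" idea concrete, here is a rough sketch in plain Python. It is not spaCy's implementation: the function names (`adj_noun_phrases`, `keep_with_keyword`), the sample sentence, and its hand-assigned tags are all made up for illustration. In spaCy the tags would come from `token.pos_`, and a roughly equivalent Matcher pattern would be `[{"POS": "ADJ", "OP": "*"}, {"POS": "NOUN", "OP": "+"}]`.

```python
from typing import List, Tuple

# (token text, coarse part-of-speech tag); tagged by hand here for illustration.
TaggedToken = Tuple[str, str]


def adj_noun_phrases(tagged: List[TaggedToken]) -> List[str]:
    """Collect maximal runs of tokens matching the flat pattern ADJ* NOUN+."""
    phrases: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        # Consume zero or more adjectives.
        while i < n and tagged[i][1] == "ADJ":
            i += 1
        noun_start = i
        # Consume one or more nouns.
        while i < n and tagged[i][1] == "NOUN":
            i += 1
        if i > noun_start:
            # At least one noun followed the (possibly empty) adjective run.
            phrases.append(" ".join(text for text, _ in tagged[start:i]))
        elif i == start:
            # Token is neither ADJ nor NOUN: skip it.
            i += 1
        # Otherwise: adjectives with no noun after them; i already moved past them.
    return phrases


def keep_with_keyword(phrases: List[str], keyword: str) -> List[str]:
    """Post-processing step: keep only phrases that contain the keyword as a token."""
    return [p for p in phrases if keyword in p.lower().split()]


# Hypothetical apparel-style sentence, not taken from the original corpus.
tagged_sentence: List[TaggedToken] = [
    ("dress", "NOUN"), ("with", "ADP"), ("long", "ADJ"), ("bell", "NOUN"),
    ("sleeves", "NOUN"), ("and", "CCONJ"), ("a", "DET"),
    ("ruched", "ADJ"), ("waist", "NOUN"),
]

all_phrases = adj_noun_phrases(tagged_sentence)
sleeve_phrases = keep_with_keyword(all_phrases, "sleeves")
print(all_phrases)     # ['dress', 'long bell sleeves', 'ruched waist']
print(sleeve_phrases)  # ['long bell sleeves']
```

The quality of the result depends entirely on the tagger: if a word like "free" is not tagged ADJ or NOUN in your data, this pattern will miss it, and you would need to widen the pattern or look at the tags your pipeline actually assigns.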