Is this related to #8645, or is the corpus you're working with public?

Have you tried using noun chunks? spaCy has a built-in noun chunks feature (`doc.noun_chunks`) that captures phrases of roughly the pattern ADJ* NOUN+, which will cover your italic phrases. It will also pick up other phrases ("floor length beauty", "kimono tie wrap front", "ruched waist"), but you can filter those out in post-processing, for example by keeping only phrases that include "sleeves".
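
A rough sketch of the noun chunks plus post-processing idea, in case it helps. The function names and the sample phrases beyond the three quoted above are made up for illustration, and `nlp` is assumed to be a loaded pipeline that includes a dependency parser (for example `en_core_web_sm`), since noun chunks depend on the parse.

```python
def noun_chunk_texts(nlp, text):
    """Return the text of every noun chunk found in `text`.

    Assumption: `nlp` is a loaded spaCy pipeline with a parser,
    e.g. nlp = spacy.load("en_core_web_sm").
    """
    doc = nlp(text)
    return [chunk.text for chunk in doc.noun_chunks]


def keep_phrases_with(phrases, keyword):
    """Post-processing filter: keep phrases that contain `keyword` as a word.

    The comparison is case-insensitive and works on whitespace-split words,
    so "sleeves" matches "long flowy sleeves" but not "sleeveless top".
    """
    keyword = keyword.lower()
    return [p for p in phrases if keyword in p.lower().split()]


# Hypothetical usage (requires spaCy and a downloaded model):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   chunks = noun_chunk_texts(nlp, "your product description here")
#   sleeve_phrases = keep_phrases_with(chunks, "sleeves")
```

The filter is deliberately separate from the spaCy call, so you can swap in a stricter rule (a list of garment terms, a regex, and so on) without touching the extraction step.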

Dependency parsing can be useful for adjective-noun constructions, but since their structure isn't very complicated in English, matching flat tag sequences (as the Matcher does) is also effective.
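
To make "matching flat tag sequences" concrete, here is a minimal sketch. `adj_noun_runs` is a plain-Python stand-in that scans (word, POS) pairs for ADJ* NOUN+ runs; it is not spaCy's Matcher, just the same idea spelled out. `adj_noun_spans` shows how the equivalent token pattern might be handed to the Matcher, assuming spaCy v3 and a pipeline with a tagger. Both function names and the sample sentence are hypothetical.

```python
# Token pattern for ADJ* NOUN+ in spaCy's Matcher syntax.
ADJ_NOUN_PATTERN = [{"POS": "ADJ", "OP": "*"}, {"POS": "NOUN", "OP": "+"}]


def adj_noun_runs(tagged):
    """Return maximal ADJ* NOUN+ runs from a list of (word, pos) pairs."""
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        # Consume zero or more adjectives.
        while i < n and tagged[i][1] == "ADJ":
            i += 1
        noun_start = i
        # Consume one or more nouns.
        while i < n and tagged[i][1] == "NOUN":
            i += 1
        if i > noun_start:
            # At least one noun was found: record the whole run.
            phrases.append(" ".join(word for word, _ in tagged[start:i]))
        elif i == start:
            # Token is neither ADJ nor NOUN: skip it.
            i += 1
        # Otherwise: adjectives with no noun after them; already skipped.
    return phrases


def adj_noun_spans(nlp, text):
    """Same pattern via spaCy's Matcher (assumes spaCy v3 and a tagger in `nlp`)."""
    from spacy.matcher import Matcher  # lazy import: only needed for this function

    matcher = Matcher(nlp.vocab)
    # greedy="LONGEST" keeps the longest of any overlapping matches.
    matcher.add("ADJ_NOUN", [ADJ_NOUN_PATTERN], greedy="LONGEST")
    doc = nlp(text)
    spans = sorted(matcher(doc, as_spans=True), key=lambda span: span.start)
    return [span.text for span in spans]
```

Without the `greedy` setting, the Matcher returns every sub-match of a run (for instance both "sleeves" and "flowy sleeves"), which is usually not what you want for phrase extraction.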

Phrase matching (as opposed to the plain Matcher)…

Answer selected by vahuja4
Labels: feat / ner (Feature: Named Entity Recognizer), feat / matcher (Feature: Token, phrase and dependency matcher)
2 participants