There isn't a feature for pre-compilation, and loading a serialized matcher rebuilds the internal state from text. You might be able to make things faster using pickle, but that would have to pull in the Vocab/nlp object, so I'm not sure how cleanly it would work (it might be fine).

The way you are adding things is a little weird and could maybe be improved. This might be faster:

for label, terms in attribute_word_dict.items():
    for term in self.nlp.tokenizer.pipe(terms):
        self.matcher.add(label, [term])

Normally, making fewer calls to self.matcher.add would be faster, but if you have a lot of terms, building the whole list at once could be causing inefficient behavior.
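For anyone who wants to try the loop above outside of a class, here is a minimal, self-contained sketch. It assumes spaCy v3 and a PhraseMatcher (the question's matcher type isn't shown here, so that is a guess), and the contents of attribute_word_dict are made up for illustration:

```python
import spacy
from spacy.matcher import PhraseMatcher

# Hypothetical stand-in for the attribute_word_dict from the question.
attribute_word_dict = {
    "COLOR": ["red", "dark blue"],
    "SIZE": ["extra large", "small"],
}

# A blank English pipeline: tokenizer only, no trained model download needed.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)

for label, terms in attribute_word_dict.items():
    # tokenizer.pipe yields one Doc per term and skips any other components.
    for term_doc in nlp.tokenizer.pipe(terms):
        # One add() call per term instead of one big list per label.
        matcher.add(label, [term_doc])

doc = nlp("She bought a dark blue coat in extra large.")
found = sorted(
    (nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in matcher(doc)
)
print(found)
```

Whether this is actually faster than passing the full list in a single add call depends on how many terms you have, so it is worth timing both variants on your own data.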

What tokenizer are you …

Answer selected by adrianeboyd
Labels
feat / matcher (Feature: Token, phrase and dependency matcher)
perf / speed (Performance: speed)
2 participants