How can I improve a Spacy matcher that uses too much memory? #11462
Unanswered
mahagilo
asked this question in
Help: Other Questions
Replies: 1 comment 2 replies
-
Hi @mahagilo! Please use code formatting; it makes it easier for us to read the code and help you.
-
I need the spaCy Matcher to detect keywords from a database in a text (including variants like singular/plural). I pre-build the matcher and use pickle to save both the matcher and nlp; see the code below:
Simplified version of the matcher build:

```python
import pickle
import tracemalloc

tracemalloc.start()
for term in keylist:
    # Matcher.add takes a list of patterns, each a list of token dicts
    matcher.add(term, [[{"LOWER": term.lower()}]])
with open(save_matcher, "wb") as f:
    pickle.dump(matcher, f)
with open(save_nlp, "wb") as f:
    pickle.dump(nlp, f)

current, peak = tracemalloc.get_traced_memory()
print(f"Memory use {current / 10**6}MB; Peak {peak / 10**6}MB")
# Number of keywords: 8961
# Time spent building matcher: 62.46
# Memory use 22.664226MB; Peak 261.314393MB
```
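The question mentions catching singular/plural variants, which a plain `LOWER` pattern on one surface form will not do. A minimal sketch of one way to cover both forms, using a blank English pipeline (no lemmatizer) and the `IN` extended pattern attribute; the keyword "apple" is just an illustration, and with a full pipeline a single `{"LEMMA": ...}` token pattern could cover both forms in one entry:

```python
import spacy
from spacy.matcher import Matcher

# Blank pipeline has no lemmatizer, so both surface forms are listed
# explicitly via the IN operator; a loaded model could use {"LEMMA": "apple"}.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add("apple", [[{"LOWER": {"IN": ["apple", "apples"]}}]])

doc = nlp("She bought two apples and one apple.")
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)  # ['apples', 'apple']
```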
Simplified version of loading the matcher and nlp:

```python
import pickle
import tracemalloc

tracemalloc.start()
nlp = pickle.load(open(save_nlp, "rb"))
matcher = pickle.load(open(save_matcher, "rb"))

current, peak = tracemalloc.get_traced_memory()
print(f"Memory after loading Spacy is {current / 10**6}MB; Peak was {peak / 10**6}MB")
# Memory after loading Spacy is 719.07097MB; Peak was 934.495292MB
```
It takes a long time to build the spaCy matcher with thousands of keywords, so I need to save it after building. Is pickle the only option for saving? When I load the matcher and nlp from pickle, they use a lot more memory and my cloud bills will bankrupt me ☹ Any thoughts on how to improve saving the spaCy matcher?
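One possible direction, sketched below under assumptions (this is not an official recommendation, and `keylist` plus the JSON path are placeholder names): since pickling the Matcher also serializes objects tied to the shared `Vocab`, persist only the pattern definitions as JSON and rebuild the matcher at load time, which is cheap for simple token patterns; the pipeline itself can be saved with `nlp.to_disk(...)` and reloaded with `spacy.load(...)` instead of pickle.

```python
import json
import os
import tempfile

import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
keylist = ["apple", "banana"]  # stand-in for the database keywords

# Save step: serialize only the pattern data, not the Matcher object.
patterns = {term: [[{"LOWER": term.lower()}]] for term in keylist}
path = os.path.join(tempfile.gettempdir(), "patterns.json")  # placeholder path
with open(path, "w") as f:
    json.dump(patterns, f)

# Load step: rebuild the matcher from JSON, reusing the pipeline's vocab.
matcher = Matcher(nlp.vocab)
with open(path) as f:
    for key, pats in json.load(f).items():
        matcher.add(key, pats)

doc = nlp("I like apple pie.")
hits = [nlp.vocab.strings[match_id] for match_id, start, end in matcher(doc)]
print(hits)  # ['apple']
```

For thousands of literal keyword strings, spaCy's `PhraseMatcher` may also be worth benchmarking against the token-based `Matcher`, since it is designed for large terminology lists.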