spacy caching explanation needed #10757

@saraswat40

I have a very simple program:

import time
import spacy

start = time.time()
nlp = spacy.load('en_core_web_lg', disable=['tagger', 'ner', 'textcat', '...'])
print(time.time() - start)

The first time I run this I get this output:

# python3 test.py 
183.59882807731628

The second time I run this I get this:

# python3 test.py 
1.3441166877746582

Some sort of caching is clearly happening between the two runs. Is this explained anywhere?

My real problem is that I have multiple threads accessing a file like this. On the first invocation, they all get stuck trying to load spaCy. My workaround is a lockfile: whichever thread acquires the lock loads spaCy, and the rest wait and then benefit from the caching mentioned above. Is there a better way to do this?

Thanks
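
For reference, here is a minimal sketch of one alternative to the lockfile, assuming the "threads" really are threads inside a single Python process (if they are separate processes, an in-memory lock like this does not apply). The names `get_shared_model` and `loader` are hypothetical helpers made up for illustration, not part of spaCy. The idea is to load the model once behind a `threading.Lock` and hand the same object to every thread.

```python
import threading

# Hypothetical module-level cache; not a spaCy feature.
_model = None
_model_lock = threading.Lock()


def get_shared_model(loader):
    """Return one shared model per process, calling loader() at most once.

    `loader` is any zero-argument callable, for example
    lambda: spacy.load("en_core_web_lg").
    """
    global _model
    if _model is None:              # fast path once the model is loaded
        with _model_lock:           # only one thread performs the load
            if _model is None:      # re-check after acquiring the lock
                _model = loader()
    return _model
```

Each thread would then call something like `nlp = get_shared_model(lambda: spacy.load("en_core_web_lg"))` (with whatever `disable` list is needed) instead of calling `spacy.load` itself. The first caller pays the load time and the others block on the lock until the model is ready.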
