Memory sharing in Python 3.8 for SpaCy models #5051
-
I'd also like to see if this is possible.
-
This would save so much memory.
-
I think something of this sort may already be happening. When I load sense2vec at a global level and then access the object from within multiprocessing Python instances, each instance appears to use 6.2 GB of memory, but the overall system usage sits at 9.6 GB and swap use is negligible.
-
Most of the memory in spaCy models lives in numpy (or cupy) arrays, so it's possible this is happening for us automagically? If not, the main chunk of memory that you'll want to share is the vectors; the non-vectors memory in the models is really small. So all you need to do is reallocate the vectors data into shared memory.
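
A minimal sketch of that reallocation, assuming the pipeline's static vectors are the float32 numpy array at nlp.vocab.vectors.data and that the attribute can be reassigned (the model name is just an example):

```python
import numpy as np
import spacy
from multiprocessing import shared_memory

nlp = spacy.load("en_core_web_lg")   # any pipeline with static vectors
data = nlp.vocab.vectors.data

# Copy the vector table into a named shared memory block once.
shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
shared[:] = data

# Point the pipeline at the shared buffer instead of its private copy.
nlp.vocab.vectors.data = shared

# Child processes need shm.name, data.shape and data.dtype to attach.
```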
-
If someone's looking for project ideas, I think this could be a good blog post project? You could:
-
Turns out I was wrong about the memory being shared across processes... I am using sense2vec pretty extensively. If I max out my RAM (very likely) then I may tackle this.
-
Looking at the data in the reddit model, the vectors file is 4.2 GB and is presumably causing the majority of RAM usage. Wouldn't sharing that array through Python 3.8's shared memory interface work here?
I've tried this on my machine after building spaCy and it seems to work: the processes each use 5.7 GB of RAM, but the overall system usage is 12.4 GB. I am not modifying the vectors at all, though I believe that could be handled by adding locks to the functions that change the vectors object.
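
A standard-library-only sketch of that kind of per-process measurement (the table size and the worker function are illustrative, not the code from the experiment above). Each worker attaches to the block by name, and because RSS counts shared pages in every process, each worker still reports roughly the full table size even though physical memory holds only one copy:

```python
import resource
import numpy as np
from multiprocessing import Process, shared_memory

def worker(shm_name, shape):
    # Attach to the existing block by name instead of loading a private copy.
    shm = shared_memory.SharedMemory(name=shm_name)
    vectors = np.ndarray(shape, dtype="float32", buffer=shm.buf)
    _ = vectors.sum()  # touch every page so it is actually mapped in
    # ru_maxrss is reported in kilobytes on Linux.
    rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    print(f"worker peak RSS ~{rss_mb:.0f} MiB")
    shm.close()

if __name__ == "__main__":
    shape = (100_000, 300)  # small stand-in for the 4.2 GB reddit vectors
    shm = shared_memory.SharedMemory(create=True, size=int(np.prod(shape)) * 4)
    np.ndarray(shape, dtype="float32", buffer=shm.buf)[:] = 1.0
    workers = [Process(target=worker, args=(shm.name, shape)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    shm.close()
    shm.unlink()
```

Comparing the sum of the per-worker RSS figures with the system-wide usage (e.g. from free) shows the shared pages being counted only once by the OS, which matches the 5.7 GB vs. 12.4 GB observation.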
-
I've forked the library and committed the changes to the vectors.pyx file in a branch called shared-memory. The branch is based on master. I can't get the tests to run because of an error I haven't resolved yet.
I've tried modifying the vectors.data object in 4 processes simultaneously with some arbitrary test code and it seemed to mostly work. Someone who is interested in write performance could check whether it's better or worse than keeping multiple objects in memory.
There were a few intermittent exceptions, and they occurred only for certain keys. I am not able to replicate the error when modifying the vectors in a single process, so it could be related to the same object being shared across processes. Or perhaps it is related to this issue.
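
One way to rule out concurrent writes as the cause is to serialize them with a multiprocessing.Lock, along the lines suggested earlier in the thread. This is a self-contained sketch, not the fork's test code:

```python
import numpy as np
from multiprocessing import Lock, Process, shared_memory

def writer(shm_name, shape, lock, row, value):
    shm = shared_memory.SharedMemory(name=shm_name)
    vectors = np.ndarray(shape, dtype="float32", buffer=shm.buf)
    with lock:                      # only one process mutates the table at a time
        vectors[row, :] = value
    shm.close()

if __name__ == "__main__":
    shape = (1000, 300)
    shm = shared_memory.SharedMemory(create=True, size=int(np.prod(shape)) * 4)
    lock = Lock()
    procs = [Process(target=writer, args=(shm.name, shape, lock, i, float(i)))
             for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    shm.close()
    shm.unlink()
```

If the intermittent exceptions disappear under the lock, that points at unsynchronized access to the shared buffer rather than at the shared memory mechanism itself.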
-
Eventually I gave up on the idea of getting this working with v2.3. It is now merged into a branch based on nightly.spacy.io, and there are examples for how to get the branch working here. Most tests pass, with the exception of those that fail because of a pydantic class-wrapping issue. The process for using shared memory in child processes follows the general pattern sketched below.
This saves me approx. 5 GB of RAM when running 4 processes with the common crawl fastText model (crawl-300d-2M.vec.zip). The amount of RAM saved increases as you add processes. I'd be happy to help make this production-ready if it's something spaCy wants in the codebase.
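
As a rough sketch of that parent/child pattern, not the branch's actual API, and using only public spaCy attributes (init_child and the model name are illustrative):

```python
import numpy as np
import spacy
from multiprocessing import shared_memory

def init_child(shm_name, shape, dtype, model="en_core_web_lg"):
    """Attach this process's pipeline to the parent's shared vector table."""
    shm = shared_memory.SharedMemory(name=shm_name)
    # A real implementation would avoid loading a private copy of the vectors
    # first; loading and then swapping just keeps the sketch short.
    nlp = spacy.load(model)
    nlp.vocab.vectors.data = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, nlp  # keep shm referenced so the mapping stays alive

# Each extra process would otherwise hold its own copy of the table. For
# crawl-300d-2M that is roughly 2,000,000 * 300 * 4 bytes ≈ 2.4 GB per process,
# which is why the saving grows as more workers are added.
```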
-
I've been working on my shared memory fork for some time and I recently realised that I could save more RAM by using vector binarization. To give an idea, the Common Crawl embeddings are 500 MB in memory when binarized (256 length) and 2.4 GB when represented by floats. I added the binarization changes in a pull request for spaCy v2.3 and I've just finished porting them to the shared memory branch of nightly. It turns out that the shared-memory cdef is not compatible with fused ctypedefs: sharing the array in the Vectors class in Cython is straightforward when the data has a single dtype, but if the data can be either of two dtypes you need a fused type, and fused types (templates) cannot be used as attributes of cdef classes. So I've duplicated the vectors.pyx file to create a second class called VectorsBin in vectorsbin.pyx. The class is then selected dynamically in various places based on the config file or the dtype of the data. The updated nightly branch is here.
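
To illustrate the memory arithmetic of binarization (not necessarily the scheme used in the fork), here is a small sketch that packs sign bits with numpy; similarity over such packed codes would then use Hamming distance rather than cosine:

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.standard_normal((10_000, 300)).astype("float32")

bits = vecs > 0                        # sign binarization: one bit per dimension
packed = np.packbits(bits, axis=1)     # 300 bits -> 38 bytes per row

print(vecs.nbytes / 1e6, "MB as float32")   # ~12 MB
print(packed.nbytes / 1e6, "MB packed")     # ~0.38 MB, roughly 32x smaller
```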
-
Hi @forgetso, sorry for missing this before, and thanks for keeping at this idea. I'm thinking about how best to isolate this change and make it production-ready without the changes spreading out over the codebase. Do you think the following would be a good approach?
In general, the multiprocessing support hasn't been worked on very much in spaCy. One reason is that before v3 we also had to support Python 2. Combined with the multiple OSes we support (Linux, OSX and Windows) and supporting both CPU and GPU, there's a large matrix of places where things can behave differently.
-
This is more of a question than a feature request. But could SpaCy use the new shared memory interface in Python 3.8 to share a single copy of its language model across multiple processes? Is it possible? If so, are there plans to support it?
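
For reference, the Python 3.8 interface in question is multiprocessing.shared_memory. A minimal sketch of the idea, with one process creating a named block that backs a numpy array and another attaching to it without copying (the shapes and sizes are arbitrary):

```python
import numpy as np
from multiprocessing import shared_memory

# Producer: create a shared block and back a numpy array with it.
vectors = np.random.rand(1000, 300).astype("float32")
shm = shared_memory.SharedMemory(create=True, size=vectors.nbytes)
shared = np.ndarray(vectors.shape, dtype=vectors.dtype, buffer=shm.buf)
shared[:] = vectors  # one-time copy into shared memory

# Consumer (typically running in another process): attach by name, no copy.
existing = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(vectors.shape, dtype=vectors.dtype, buffer=existing.buf)

# Cleanup once every process is done with the block.
existing.close()
shm.close()
shm.unlink()
```

The open question in this thread is how to hook such a block into spaCy's own data structures, which is what the experiments above explore.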