Predict text using REST API with ability to change model #12279
-
Update: when I think about it, what I do in method 2 above ends up working just like method 1, i.e. it is still slow. Any suggestions? I would rather not do this https://stackoverflow.com/questions/61643370/spacy-english-language-model-take-too-long-to-load (pre-loading everything at startup), because I expect my total number of models to keep growing, not just one or two models.
-
Hey, let me try to understand your problem better. It seems like you have a selection of models and each API call takes a model name and a piece of text. My first question is whether it would be possible to cache the models in memory. The simplest and most memory-intensive approach would be to keep all models in memory at all times. A potentially more viable approach is to use an LRU cache policy: you keep at most a fixed number of models in memory, and when the cache is full you release the model that was used least recently. If the caching approach is viable, maybe it's worth reviewing the various cache replacement policies for inspiration: https://en.wikipedia.org/wiki/Cache_replacement_policies.
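To make the LRU idea concrete, here is a minimal sketch (not from the original thread) using Python's `functools.lru_cache`; the function name, cache size, and the entity-extraction example are illustrative assumptions, not part of the original API:

```python
from functools import lru_cache

import spacy


@lru_cache(maxsize=5)  # keep at most 5 pipelines in memory; evicts least recently used
def get_model(model_name: str):
    """Load a spaCy pipeline once; repeated calls with the same name hit the cache."""
    return spacy.load(model_name)


def predict(model_name: str, text: str):
    nlp = get_model(model_name)  # fast after the first call for this model
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```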
-
This is not really an issue with spaCy itself. The prediction speed can be improved by disabling components that are not required by your application (a sketch of this is below). Loading can be made faster, and memory usage reduced, by relying on smaller models, especially ones with smaller vector tables.
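A hedged illustration of both points, assuming the application only needs named entities; the model name and the excluded component names are examples, not a recommendation from the original reply:

```python
import spacy

# Pick a smaller pipeline and leave out components the application doesn't use.
# NER in this pipeline listens to tok2vec, which stays enabled.
nlp = spacy.load(
    "en_core_web_sm",
    exclude=["tagger", "parser", "attribute_ruler", "lemmatizer"],
)

doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")
print([(ent.text, ent.label_) for ent in doc.ents])
```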
-
The first thing I can think of is to check how long loading actually takes, with a coarse speed benchmark like this:

```python
import spacy
import time


class time_context:
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, type, value, traceback):
        self.elapsed = time.perf_counter() - self.start


def speed_test_load_blank(lang, it):
    with time_context() as elapsed:
        for i in range(it):
            spacy.blank(lang)
    print("blank time", elapsed.elapsed / it)


def speed_test_load_model(model, it):
    with time_context() as elapsed:
        for i in range(it):
            spacy.load(model)
    print(f"{model} time", elapsed.elapsed / it)


speed_test_load_blank("id", 10)
speed_test_load_model("en_core_web_trf", 10)
```

On my machine this gives the output:
-
Unfortunately I cannot really debug the slow loading time without knowing more about the model. I agree that it's quite slow. If you would like to focus on debugging the loading time of the model, you can try excluding various pipeline components to identify which one seems to be the bottleneck: https://spacy.io/usage/processing-pipelines#disabling. You can try adding more to the coarse speed benchmarking script:

```python
import spacy
import time


class time_context:
    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, type, value, traceback):
        self.elapsed = time.perf_counter() - self.start


def speed_test_load_blank(lang, it):
    with time_context() as elapsed:
        for i in range(it):
            spacy.blank(lang)
    print("blank time: ", elapsed.elapsed / it)


def speed_test_load_model(model, it=10):
    with time_context() as elapsed:
        for i in range(it):
            spacy.load(model)
    print(f"{model} time: ", elapsed.elapsed / it)


def speed_test_load_exclude(model, it=10):
    speed_test_load_model(model, it)
    components = spacy.load(model).components
    for name, _ in components:
        with time_context() as elapsed:
            for i in range(it):
                spacy.load(model, exclude=[name])
        print(f"without {name}: ", elapsed.elapsed / it)


speed_test_load_exclude("en_core_web_trf", 3)
```

On my machine this gave:
About the question of your web app, I'm afraid it's hard to help further, since it's a design and infrastructure decision whether you can add more resources to keep all models in memory, or whether you prefer to use a cache or some other option.
-
Let's say I currently have 10 models, and the number of models will keep increasing (11, 12, 13, ... 50 models). The language is Indonesian (ID).
Method 1
I consume the models through a REST API; the parameters look like this:
The code for the API looks like this:
The problem with that code is that the prediction request is slow: every time I send the text and choose a model name, the model is loaded again, so the response time is slow too. (A rough illustration of this pattern is sketched below.)
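The original parameters and API code were not included above; this is only a hypothetical sketch of the pattern described, assuming a Flask app (the real framework and handler may differ):

```python
import spacy
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    # Method 1 problem: the pipeline is reloaded from disk on every request.
    nlp = spacy.load(data["model_name"])
    doc = nlp(data["text"])
    return jsonify({"entities": [(ent.text, ent.label_) for ent in doc.ents]})
```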
Method 2
Because of the problem with method 1, I tried to change the approach: load the model and consume the model in two different API calls.
First I choose the model with one API call, so the model is loaded only once.
Then I predict text with another API call, still sending the model_name of the model I loaded before.
I tried to split the two processes to get a better response time when predicting the text. (A rough sketch of this two-endpoint idea is below.)
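Again a hypothetical sketch, not the original code: it assumes Flask and keeps the loaded pipelines in a module-level dict instead of a database, which only works within a single server process:

```python
import spacy
from flask import Flask, request, jsonify

app = Flask(__name__)
loaded_models = {}  # model_name -> loaded spaCy pipeline, kept in this process's memory


@app.route("/load", methods=["POST"])
def load_model():
    name = request.get_json()["model_name"]
    loaded_models[name] = spacy.load(name)  # load once, up front
    return jsonify({"loaded": name})


@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    nlp = loaded_models[data["model_name"]]  # reuse the already-loaded pipeline
    doc = nlp(data["text"])
    return jsonify({"entities": [(ent.text, ent.label_) for ent in doc.ents]})
```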
But I cannot save the loaded model to a database; I tried saving it with MongoDB and got an error. How can I save the result of spacy.load() to a database? Any suggestion on how to handle a case like this?
Note: