Memory usage keeps going up with consecutive predictions #626

@popfly510

Hello everyone,

I use SetFit to fine-tune the model head of a sentence transformer model, which is then served behind a REST API endpoint. This endpoint mainly does one thing: receive a text and use the model to classify it. The application is deployed in the cloud, where I can also inspect application metrics such as memory usage.

During stress testing of the deployed application, I noticed that memory usage goes up with each incoming request. It never returns to its initial value and only drops if I manually restart the application.

I have narrowed the issue down to the SetFit library: the only time memory usage did not go up was when I disabled the inference code and returned dummy data. For inference I am using the predict_proba("Example text.") method.

What was interesting during my investigation is that memory usage only goes up when the model receives unique texts. For example, if I provide the same text twice in a row, memory usage goes up for the first inference call but not for the second.
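To make the unique-versus-repeated observation reproducible outside the deployed service, a small harness along these lines could be used. It is only a sketch: `traced_growth` and `fake_predict` are hypothetical names, and `fake_predict` is a stand-in that deliberately keeps one allocation per unique text so the script runs without SetFit installed. In a real test the callable would be the model's `predict_proba`. Note that `tracemalloc` only sees allocations made through Python's allocator, so native (for example PyTorch) buffers may not show up, and process-level RSS would have to be checked separately.

```python
import tracemalloc
from typing import Callable, Iterable, List


def traced_growth(predict: Callable[[str], object], texts: Iterable[str]) -> List[int]:
    """Return the change in traced Python memory (bytes) after each call."""
    tracemalloc.start()
    try:
        growth = []
        before, _peak = tracemalloc.get_traced_memory()
        for text in texts:
            predict(text)
            after, _peak = tracemalloc.get_traced_memory()
            growth.append(after - before)
            before = after
        return growth
    finally:
        tracemalloc.stop()


# Hypothetical stand-in for the model: it retains a buffer per unique text.
# This only imitates the reported symptom; it says nothing about SetFit internals.
_retained = {}


def fake_predict(text: str) -> list:
    if text not in _retained:
        _retained[text] = [0.0] * 10_000  # roughly 80 KB of list storage
    return _retained[text]


# Same text three times: only the first call should grow memory noticeably.
repeated = traced_growth(fake_predict, ["same text"] * 3)

# Three different texts: every call should grow memory.
unique = traced_growth(fake_predict, [f"text {i}" for i in range(3)])

print("repeated:", repeated)
print("unique:  ", unique)
```

Swapping `fake_predict` for the real inference call and comparing the two lists would show whether the growth is visible at the Python level at all.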

I tried calling gc.collect(), but with no success. I have also tried to understand the SetFit source code and looked for hints in the docs and in existing issues, again without success. In #567, users also report that memory is not freed, but in their case it was related to training and not to inference. Also (maybe) important to mention: in my case the model is loaded into RAM and inference runs on the CPU, so no GPU is being utilized. That said, the same issue persists when testing on the GPU.

Maybe I am missing an intentional feature here where the model is supposed to memorize known inputs for faster inference when inputs repeat? This would at least explain the behaviour. But I would still like to have a possibility to free the used up memory as otherwise my application could crash at some point.
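If something in the stack does keep a reference per unique input, that would also explain why gc.collect() had no effect: the collector only reclaims objects that are no longer reachable. The sketch below illustrates that general Python behaviour with a hypothetical `cache` dict and `Result` class; it is not taken from SetFit, and whether any such cache exists there is exactly the open question.

```python
import gc
import weakref


class Result:
    """Hypothetical stand-in for a per-input object kept after inference."""

    def __init__(self, text: str) -> None:
        self.text = text


# Hypothetical cache keyed by input text (an assumption, not SetFit code).
cache = {}


def remember(text: str) -> "weakref.ref[Result]":
    """Store a result in the cache and return a weak reference to observe it."""
    result = Result(text)
    cache[text] = result
    return weakref.ref(result)


ref = remember("Example text.")

# The cache still references the object, so collecting changes nothing.
gc.collect()
alive_after_collect = ref() is not None

# Once the owning reference is dropped, the object can be reclaimed.
cache.clear()
gc.collect()
alive_after_clear = ref() is not None

print(alive_after_collect, alive_after_clear)
```

In practice this means the memory can only be released by whatever owns the references, which is why locating the owner matters more than forcing a collection.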

Any input is welcome on this issue and thank you in advance!
