To fully release the model from memory, you need to `del` every reference to the model, then call `torch.cuda.empty_cache()` and possibly `gc.collect()` as well. In general, though, loading the model for each request is not a good idea: reading the model from disk into memory takes a long time, and you would pay that cost just to handle a single request.
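
Both points can be sketched roughly as follows. This is only an illustration, not code from the project: `DummyModel`, `free_cached_memory`, and `get_model` are hypothetical names standing in for your real model class and loader, and the `torch` import is guarded so the sketch also runs on a machine without PyTorch or a GPU. Running `gc.collect()` before `torch.cuda.empty_cache()` is a choice made here so that objects caught in reference cycles are freed before the cache is emptied.

```python
import gc
import weakref
from functools import lru_cache

try:
    import torch  # optional for this sketch; it still runs without PyTorch
except ImportError:
    torch = None


class DummyModel:
    """Hypothetical stand-in for a real model object."""

    def __init__(self, name):
        self.name = name
        self.weights = bytearray(1024)  # placeholder for real parameters


def free_cached_memory():
    """Collect unreachable objects, then ask PyTorch to release cached CUDA memory."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


@lru_cache(maxsize=1)
def get_model(name):
    """Hypothetical loader: builds the model once and reuses it afterwards."""
    return DummyModel(name)


# Part 1: releasing a model. Every reference has to go, not just one of them.
model = DummyModel("base")
probe = weakref.ref(model)  # a weak reference does not keep the model alive
del model
free_cached_memory()
model_released = probe() is None

# Part 2: reusing one loaded model across requests instead of reloading it.
first = get_model("base")
second = get_model("base")
same_instance = first is second
```

If the model is still referenced anywhere else (a global, a closure, a list of loaded models), `del` on one name will not free it, which is why the weak reference is a handy check. For a service that handles many requests, the cached loader in the second part is usually the better fit, since the load cost is paid once per process.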

Answer selected by jongwook