We don't have any feature for this, no. You might want to limit the longest input you'll handle, or the number of simultaneous requests, or use a worker queue.
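As a sketch of the first two suggestions: cap the length of each input before it reaches the pipeline, and stream documents in small batches rather than holding them all in memory at once. This assumes a spaCy-style pipeline; the names `MAX_CHARS`, `BATCH_SIZE`, and `process` are illustrative, not part of any actual API.

```python
MAX_CHARS = 10_000   # longest input we're willing to process (illustrative)
BATCH_SIZE = 32      # limit how many docs are in flight at once (illustrative)

def truncate(text: str, limit: int = MAX_CHARS) -> str:
    """Drop anything past the character limit to bound per-document memory."""
    return text[:limit]

def process(nlp, texts):
    """Yield processed docs one batch at a time instead of all at once."""
    safe_texts = (truncate(t) for t in texts)
    # nlp.pipe streams documents in batches, keeping peak memory bounded
    yield from nlp.pipe(safe_texts, batch_size=BATCH_SIZE)
```

For the worker-queue option, the same `process` generator could be driven by a task queue (e.g. one worker per queue) so that only a fixed number of requests hit the model concurrently.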

It looks like the model wrongly calculates the amount of memory available inside the container and tries to allocate too much for its calculations.

As far as I'm aware, the models don't check the total available memory; like most programs, they just use what's available as needed. So the model isn't miscalculating the available memory, and there's no bug here for us to fix: you're simply using more memory than is available.

Answer selected by VMAtm
Labels: perf / memory (Performance: memory use), feat / transformer (Feature: Transformer)
This discussion was converted from issue #9419 on October 21, 2021 07:05.