feat: Only allow one Chipper call at a time (#296)
Chipper V2 is very memory-hungry. While we work to optimize this, we
need to restrict each server to one Chipper call at a time. While the
model is in use, any other call gets a 503 "Please try again". Our hosted
API should scale up to meet demand, so a retried call should route to an
available server.
This includes a refactor of how partition_kwargs are passed to each of
the three code paths: parallel mode, local partition, and local partition
with the new Chipper protection.
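
For illustration, the one-call-at-a-time protection described above could be sketched with a non-blocking lock. This is a minimal sketch, not the code from this commit: `chipper_lock`, `ModelBusyError`, and `run_exclusive` are hypothetical names, and mapping the exception to an HTTP 503 response is assumed to happen in the web framework layer.

```python
import threading

# Hypothetical module-level lock guarding the memory-hungry model.
chipper_lock = threading.Lock()


class ModelBusyError(Exception):
    """Raised when the model is already in use (assumed to map to HTTP 503)."""

    status_code = 503


def run_exclusive(partition_fn, **partition_kwargs):
    """Run partition_fn only if no other call currently holds the lock."""
    # Non-blocking acquire: a second caller fails fast instead of queueing.
    if not chipper_lock.acquire(blocking=False):
        raise ModelBusyError("Please try again")
    try:
        return partition_fn(**partition_kwargs)
    finally:
        # Always release, even if partitioning raises.
        chipper_lock.release()
```

Failing fast rather than waiting on the lock keeps memory use bounded and leaves it to the caller (or a load balancer) to retry against another server.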
To verify, set `$file` to a local document and call Chipper twice at the same time:
```
curl -X POST 'http://localhost:8000/general/v0/general' --form files="@$file" --form strategy=hi_res --form hi_res_model_name=chipper &
curl -X POST 'http://localhost:8000/general/v0/general' --form files="@$file" --form strategy=hi_res --form hi_res_model_name=chipper
```
The second call will get a 503 response while the first is still running.
Other changes:
* Return a 400 error if Chipper isn't loaded. The model is private, so
make sure we explain this to users who self-host
* Pass the Hugging Face token to `make docker-start-api` for a better dev
experience
* Add a `make docker-start-bash` target while we're in here