Description
Summary
Allow developers to configure, when loading a model, how long it may sit idle before it is purged from memory.
Motivation
Many model use cases are latency-sensitive and potentially long-running. For tasks with gaps of more than 5 minutes between requests, the only way to avoid a cold start today is to run a separate polling script that keeps the model warm, which is inefficient.
Proposed Implementation
Replace the static 5-minute timeout with a configurable period, perhaps ranging from 1 minute (or even a single request) to 24 hours.
For advanced users, add an option to keep the model loaded with no automatic purge, so that it leaves memory only when it is purged manually or replaced by another model.
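To make the proposal concrete, here is a minimal sketch of per-model idle timeouts. Every name in it (`ModelRegistry`, `LoadedModel`, `idle_timeout_s`, `purge_expired`) is hypothetical and does not reflect the project's actual API. It assumes the purge decision is based on time since the last request, and it uses `None` to mean "never purge automatically".

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumption: mirrors the static 5-minute behaviour described in this issue.
DEFAULT_IDLE_TIMEOUT_S = 5 * 60.0


@dataclass
class LoadedModel:
    name: str
    idle_timeout_s: Optional[float]  # None = no automatic purge
    last_used: float                 # timestamp of the most recent request


class ModelRegistry:
    """Hypothetical bookkeeping for loaded models and their idle timeouts."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._models: Dict[str, LoadedModel] = {}

    def load(self, name: str,
             idle_timeout_s: Optional[float] = DEFAULT_IDLE_TIMEOUT_S) -> None:
        # The timeout is chosen per model at load time.
        self._models[name] = LoadedModel(name, idle_timeout_s, self._clock())

    def touch(self, name: str) -> None:
        # Call on every request so the idle timer restarts.
        self._models[name].last_used = self._clock()

    def purge(self, name: str) -> None:
        # Manual purge, the only way out for a model loaded with None.
        self._models.pop(name, None)

    def purge_expired(self) -> List[str]:
        # Remove every model whose idle time has reached its own timeout.
        now = self._clock()
        expired = [
            m.name for m in self._models.values()
            if m.idle_timeout_s is not None
            and now - m.last_used >= m.idle_timeout_s
        ]
        for name in expired:
            del self._models[name]
        return expired

    def loaded(self) -> List[str]:
        return sorted(self._models)
```

A caller might use `registry.load("my-model", idle_timeout_s=3600)` for a one-hour window or `idle_timeout_s=None` to pin the model, with `purge_expired()` run periodically by whatever component owns model memory.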
Technical Considerations
This should probably be coupled with a feedback mechanism, limits, or both, to prevent users from overusing system resources.
The implementation should gracefully handle bad shutdowns.
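The limits and feedback idea could look roughly like the sketch below. The function name `resolve_idle_timeout`, the `allow_pinned` flag, and the 1 minute and 24 hour bounds are illustrative assumptions taken from the range floated in this issue, not settled values or an existing API.

```python
from typing import Optional, Tuple

# Assumed bounds, taken from the range suggested in this issue.
MIN_IDLE_TIMEOUT_S = 60.0
MAX_IDLE_TIMEOUT_S = 24 * 60 * 60.0


def resolve_idle_timeout(
    requested_s: Optional[float],
    allow_pinned: bool = False,
) -> Tuple[Optional[float], str]:
    """Clamp a requested timeout to the allowed range.

    Returns the effective timeout (None = no automatic purge) together
    with a message that can be surfaced to the user as feedback.
    """
    if requested_s is None:
        if allow_pinned:
            return None, "model pinned: no automatic purge"
        return MAX_IDLE_TIMEOUT_S, "pinning not permitted: using maximum timeout"
    if requested_s < MIN_IDLE_TIMEOUT_S:
        return MIN_IDLE_TIMEOUT_S, "requested timeout below minimum: raised to minimum"
    if requested_s > MAX_IDLE_TIMEOUT_S:
        return MAX_IDLE_TIMEOUT_S, "requested timeout above maximum: lowered to maximum"
    return float(requested_s), "ok"
```

Returning a message alongside the effective value lets the caller tell the user when a request was adjusted instead of silently changing it.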
Questions for Maintainers
What is the right time range to allow for maximum flexibility?