Feature Request: Make model timeouts for unloading from memory configurable #84

@chpiatt

Summary

Allow developers to configure, when loading a model, how long it may sit idle before it is purged from memory.

Motivation

Many model use cases are latency-sensitive and potentially long-running. For tasks with a gap of more than 5 minutes between requests, the only way to avoid a cold start is to run a separate polling script that keeps the model warm, which is inefficient.

Proposed Implementation

Change the timeout from a static 5 minutes to a configurable time period, perhaps anywhere from 1 minute (or even 1 request) to 24 hours.
Add an option for advanced users to keep the model loaded with no automatic purge, so that it leaves memory only when it is purged manually or replaced by another model.
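
To make the two proposals concrete, here is a minimal sketch of what a configurable idle timeout could look like. Everything in it is hypothetical: `IdleUnloadPolicy`, `idle_timeout_s`, `touch`, and `should_unload` are illustrative names, not part of any existing API, and the 1 minute and 24 hour bounds are just the range floated above. A value of `None` stands in for the "never purge automatically" option.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed bounds, taken from the range suggested in this issue.
MIN_TIMEOUT_S = 60.0             # 1 minute
MAX_TIMEOUT_S = 24 * 60 * 60.0   # 24 hours


@dataclass
class IdleUnloadPolicy:
    """Hypothetical policy that decides when an idle model may be purged."""

    # None means "never purge automatically" (the advanced-user option).
    # 300 seconds mirrors the current static 5-minute behavior.
    idle_timeout_s: Optional[float] = 300.0
    # Injectable clock so the policy can be tested without sleeping.
    clock: Callable[[], float] = time.monotonic
    last_used: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        if self.idle_timeout_s is not None and not (
            MIN_TIMEOUT_S <= self.idle_timeout_s <= MAX_TIMEOUT_S
        ):
            raise ValueError(
                f"idle_timeout_s must be None or within "
                f"[{MIN_TIMEOUT_S}, {MAX_TIMEOUT_S}] seconds"
            )
        self.last_used = self.clock()

    def touch(self) -> None:
        """Record that the model just served a request."""
        self.last_used = self.clock()

    def should_unload(self) -> bool:
        """Return True once the model has been idle for the full timeout."""
        if self.idle_timeout_s is None:
            return False
        return self.clock() - self.last_used >= self.idle_timeout_s
```

Under these assumptions, a caller would construct the policy with something like `IdleUnloadPolicy(idle_timeout_s=3600.0)` at load time, call `touch()` on every request, and have whatever loop currently enforces the 5-minute purge consult `should_unload()` instead.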

Technical Considerations

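This should probably be coupled with a feedback mechanism and/or limits, so that users who extend or disable the timeout do not overuse system resources.
Models that stay loaded for long periods should still be released gracefully when the host process shuts down badly.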
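
One possible shape for these two considerations, sketched under stated assumptions: a small registry that caps how many models may be pinned (kept with no automatic purge) and that releases everything on interpreter exit. `ModelRegistry`, `max_pinned`, and the `unload` callbacks are hypothetical names for illustration. Note that `atexit` only runs on a normal interpreter exit, so a hard kill or crash would still need a separate recovery path.

```python
import atexit
from typing import Callable, Dict, Set


class ModelRegistry:
    """Hypothetical registry that limits pinned models and cleans up on exit."""

    def __init__(self, max_pinned: int = 1) -> None:
        self.max_pinned = max_pinned
        self._unloaders: Dict[str, Callable[[], None]] = {}
        self._pinned: Set[str] = set()
        # Best-effort cleanup; not triggered by SIGKILL or a hard crash.
        atexit.register(self.unload_all)

    def register(
        self, name: str, unload: Callable[[], None], pinned: bool = False
    ) -> None:
        """Track a loaded model; refuse to pin more than max_pinned models."""
        if (
            pinned
            and name not in self._pinned
            and len(self._pinned) >= self.max_pinned
        ):
            raise RuntimeError(
                f"cannot pin {name!r}: limit of {self.max_pinned} pinned model(s) reached"
            )
        self._unloaders[name] = unload
        if pinned:
            self._pinned.add(name)

    def unload_all(self) -> None:
        """Release every tracked model; safe to call more than once."""
        for name in list(self._unloaders):
            unload = self._unloaders.pop(name)
            self._pinned.discard(name)
            try:
                unload()
            except Exception as exc:
                # Keep going so one failing model does not block the rest.
                print(f"failed to unload {name!r}: {exc}")
```

The `RuntimeError` raised on exceeding the limit is one form the feedback mechanism could take; a warning or a reported memory estimate would be alternatives.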

Questions for Maintainers

What is the right time range to allow for maximum flexibility?
