Status: Open
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)
Description
Feature Description
Implement a pre-loading mechanism in TensorFusion that proactively downloads and caches AI model files and container images on GPU nodes before they are needed for inference workloads. This feature will integrate with the TensorFusion operator to distribute and manage model artifacts across the GPU cluster, improving resource utilization and reducing cold-start times.
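To make the configuration surface concrete, below is a minimal sketch of how a preloading job might be declared in the TensorFusion custom resource, written as kubebuilder-style Go types. All type and field names here (PreloadJob, ModelURI, etc.) are assumptions for illustration, not the project's existing API.

```go
// Hypothetical kubebuilder-style types illustrating how preloading jobs
// might be declared in a TensorFusion custom resource. Names are assumptions,
// not the project's actual API.
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// PreloadJob describes one artifact to warm on selected GPU nodes.
type PreloadJob struct {
	// Name identifies this preloading entry.
	Name string `json:"name"`
	// Image is an optional container image to pre-pull.
	Image string `json:"image,omitempty"`
	// ModelURI is an optional model artifact location (e.g. s3:// or hf://).
	ModelURI string `json:"modelURI,omitempty"`
	// NodeSelector restricts which GPU nodes should cache the artifact.
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
	// TTL controls how long an unused artifact is retained before eviction.
	TTL *metav1.Duration `json:"ttl,omitempty"`
}
```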
Motivation
AI models and inference container images are exceptionally large (often ranging from several GBs to hundreds of GBs), creating significant deployment bottlenecks in GPU clusters. Current challenges include:
- Extended cold-start times: New model deployments can take 10-30+ minutes just for image/model downloading
- Resource waste: GPU nodes remain idle during lengthy download processes, reducing overall cluster utilization
- Scaling delays: Horizontal pod autoscaling is severely impacted by download overhead
- Network congestion: Multiple concurrent downloads can saturate cluster networking
- Inconsistent performance: Unpredictable deployment times affect SLA compliance
- Cost implications: Idle GPU time during downloads represents significant operational costs
Acceptance Criteria
- Pre-loading Controller: Implement a controller that monitors deployment patterns and proactively downloads artifacts based on preloading jobs defined in the TensorFusion custom resource (see the controller skeleton after this list)
- Cache Management: Develop intelligent caching with configurable retention policies and LRU eviction (a minimal eviction sketch follows this list)
- Progress Tracking: Sync download progress and cached-artifact status to the GPUNode status, similar to how the kubelet reports images and node information in the Kubernetes Node status (see the status sketch below)
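For the controller criterion, one possible shape is a standard controller-runtime reconciler that reads the preload jobs from the custom resource and drives downloads on the matching GPU nodes. The skeleton below is only a sketch of the intended flow; PreloadReconciler and requeueInterval are hypothetical names, not existing TensorFusion code.

```go
// Hypothetical skeleton of a pre-loading controller built on controller-runtime.
// The resource and helper names are assumptions; the real TensorFusion operator
// would wire this into its existing manager.
package controller

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const requeueInterval = 30 * time.Second

// PreloadReconciler watches the TensorFusion resource that declares
// preloading jobs and drives downloads on the selected GPU nodes.
type PreloadReconciler struct {
	client.Client
}

func (r *PreloadReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Sketch of the intended flow:
	//   1. Fetch the custom resource named in req and read its preload jobs.
	//   2. Resolve target GPU nodes from each job's node selector.
	//   3. For nodes missing the artifact, dispatch a download (e.g. via a
	//      node-local agent or a pre-pull DaemonSet/Job).
	//   4. Record per-artifact progress in the GPUNode status (see the status
	//      sketch below).
	// Requeue periodically so progress and evictions stay up to date.
	return ctrl.Result{RequeueAfter: requeueInterval}, nil
}
```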
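For the cache-management criterion, a byte-budgeted LRU is one plausible eviction policy. The sketch below is a self-contained illustration using Go's container/list; it is not TensorFusion's actual cache and ignores retention-policy details such as TTLs and pinned artifacts.

```go
// Minimal LRU eviction sketch for a node-local artifact cache. Sizes are in
// bytes; eviction runs whenever total usage exceeds the configured capacity.
package cache

import "container/list"

type entry struct {
	key  string
	size int64
}

type LRUCache struct {
	capacity int64      // maximum bytes to keep on disk
	used     int64      // current bytes accounted for
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRUCache(capacity int64) *LRUCache {
	return &LRUCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Touch marks an artifact as used, inserting it if unknown, and returns the
// keys that should be evicted to stay under capacity.
func (c *LRUCache) Touch(key string, size int64) []string {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
	} else {
		el := c.order.PushFront(&entry{key: key, size: size})
		c.items[key] = el
		c.used += size
	}

	var evicted []string
	for c.used > c.capacity && c.order.Len() > 1 {
		oldest := c.order.Back()
		e := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.items, e.key)
		c.used -= e.size
		evicted = append(evicted, e.key)
	}
	return evicted
}
```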
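For the progress-tracking criterion, the GPUNode status could carry a list of cached artifacts analogous to the Images field of corev1.NodeStatus. The field names below are hypothetical and shown only to indicate the kind of information worth surfacing.

```go
// Hypothetical additions to the GPUNode status, mirroring how the kubelet
// reports cached images in corev1.NodeStatus.Images. Field names are
// assumptions for illustration, not the existing TensorFusion API.
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// CachedArtifact records one image or model that is present (or downloading)
// on the node.
type CachedArtifact struct {
	// Ref is the image reference or model URI.
	Ref string `json:"ref"`
	// SizeBytes is the on-disk size once fully downloaded.
	SizeBytes int64 `json:"sizeBytes,omitempty"`
	// Phase is e.g. Downloading, Ready, or Evicting.
	Phase string `json:"phase"`
	// Progress is the download completion percentage (0-100).
	Progress int32 `json:"progress,omitempty"`
	// LastUsed supports LRU-style retention decisions.
	LastUsed metav1.Time `json:"lastUsed,omitempty"`
}
```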