Description
I'm new to CUDA coding, and I'm running into a critical issue with long compilation times, specifically during the inference stage.
The Issue:
Ideally, the model should compile once (e.g., at the start of training, or on the first run).
On V100 (Volta): This works perfectly. The CUDA_Cache folder is populated, and subsequent runs skip compilation, loading instantly from the cache.
On L40S / Pro 6000 (newer architectures): Even with CUDA_CACHE_PATH explicitly set, the cache is neither written nor read. The model recompiles from scratch on every run, adding significant overhead.
The Context:
I have manually set the environment variable CUDA_CACHE_PATH to the current directory to persist the compilation results.
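For reference, here is a minimal sketch of the equivalent programmatic setup I'd expect to work (the cache directory name is hypothetical; as I understand it, the variable only takes effect if it is set before the CUDA context is created):

```cuda
// Minimal sketch (hypothetical cache directory): CUDA_CACHE_PATH must be set
// before the first CUDA call, since the driver reads it at context creation.
#include <cstdlib>
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Equivalent to: export CUDA_CACHE_PATH=./CUDA_Cache in the shell.
    setenv("CUDA_CACHE_PATH", "./CUDA_Cache", 1);
    cudaFree(0);  // forces context creation, so any JIT output goes to the cache above
    printf("Context initialized with CUDA_CACHE_PATH=%s\n", getenv("CUDA_CACHE_PATH"));
    return 0;
}
```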
My Suspicion:
It seems that on these newer architectures the standard caching mechanism is bypassed. I suspect the codebase does not natively target these newer compute capabilities (e.g., sm_89 for Ada), which could trigger a fallback that ignores the specified cache path or forces a rebuild on every run.
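To help narrow this down, a quick device query (standard CUDA runtime API, nothing project-specific) shows what compute capability the driver reports, which should indicate whether the binary is being JIT-compiled for an architecture it wasn't built for:

```cuda
// Query each GPU's compute capability; V100 should report 7.0 (sm_70),
// L40S should report 8.9 (sm_89).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```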
Feature request:
Is there a specific configuration option, compiler flag, or architecture tag I need to add to enable persistent caching on these newer GPUs?
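If the answer is that the newer architectures simply need to be added to the build, I'd guess it would be something along these lines (these are standard nvcc flags, but I'm not sure where they would go in this project's build):

```
# Hypothetical: add Ada SASS (sm_89) plus embedded PTX for forward compatibility
nvcc ... -gencode arch=compute_89,code=sm_89 \
         -gencode arch=compute_89,code=compute_89
```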
Thanks for your help!