Description
I'm new to CUDA coding, and I'm running into a critical issue with long compilation times, specifically during the inference stage.
The Issue:
Ideally, the model should compile once (e.g., at the start of training, or on the first run).
On V100 (Volta): This works perfectly. The CUDA_Cache folder is populated, and subsequent runs skip compilation, loading instantly from the cache.
On L40S / Pro 6000 (newer architectures): Even with CUDA_CACHE_PATH explicitly set, the cache is neither written nor read. The model recompiles from scratch on every run, adding significant overhead.
The Context:
I have manually set the environment variable CUDA_CACHE_PATH to the current directory to persist the compilation results.
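For reference, here is a minimal sketch of the equivalent programmatic setup I'd expect to work (the cache directory name is hypothetical; as I understand it, the variable only takes effect if it is set before the CUDA context is created):

```cuda
// Minimal sketch (hypothetical cache directory): CUDA_CACHE_PATH must be set
// before the first CUDA call, since the driver reads it at context creation.
#include <cstdlib>
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Equivalent to: export CUDA_CACHE_PATH=./CUDA_Cache in the shell.
    setenv("CUDA_CACHE_PATH", "./CUDA_Cache", 1);
    cudaFree(0);  // forces context creation, so any JIT output goes to the cache above
    printf("Context initialized with CUDA_CACHE_PATH=%s\n", getenv("CUDA_CACHE_PATH"));
    return 0;
}
```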
My Suspicion:
It seems that on these newer architectures the standard caching mechanism is bypassed. I suspect the codebase does not natively target these newer compute capabilities (e.g., sm_89 for Ada), which could trigger a fallback that ignores the specified cache path or forces a rebuild on every run.
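To help narrow this down, a quick device query (standard CUDA runtime API, nothing project-specific) shows what compute capability the driver reports, which should indicate whether the binary is being JIT-compiled for an architecture it wasn't built for:

```cuda
// Query each GPU's compute capability; V100 should report 7.0 (sm_70),
// L40S should report 8.9 (sm_89).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```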
Feature request:
Is there a specific configuration option, compiler flag, or architecture tag I need to add to enable persistent caching on these newer GPUs?
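If the answer is that the newer architectures simply need to be added to the build, I'd guess it would be something along these lines (these are standard nvcc flags, but I'm not sure where they would go in this project's build):

```
# Hypothetical: add Ada SASS (sm_89) plus embedded PTX for forward compatibility
nvcc ... -gencode arch=compute_89,code=sm_89 \
         -gencode arch=compute_89,code=compute_89
```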
Thanks for your help!