Compilation cache fails to persist on L40s/RTX 6000 #175

@Franklalalala

Description

I'm new to CUDA programming and I'm running into a critical issue with long compilation times, specifically during the inference stage.

The Issue:
Ideally, the model is supposed to compile once (e.g., at the start of training or the first run).
On V100 (Volta): This works perfectly. The CUDA_Cache folder is populated, and subsequent runs skip compilation, loading instantly from the cache.
On L40s / Pro 6000 (Newer Architectures): Even with CUDA_CACHE_PATH explicitly set, the cache is not saved or utilized. The model recompiles from scratch every time, resulting in significant overhead.

The Context:
I have manually set the environment variable CUDA_CACHE_PATH to the current directory to persist the compilation results.
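For reference, this is roughly how I set it up. A minimal sketch of setting the driver JIT cache variables before the first CUDA call; the cache path ./cuda_jit_cache, the 1 GiB budget, and the rest of the scaffolding are just illustrative choices, not values from my actual setup:

```cpp
// jit_cache_setup.cu
// Sketch: make sure the JIT cache variables are in place before the CUDA
// context is created in this process. Paths and sizes are illustrative only.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    // The driver consults these when it JIT-compiles PTX, so set them before
    // any CUDA API call creates a context.
    setenv("CUDA_CACHE_PATH", "./cuda_jit_cache", /*overwrite=*/1);
    setenv("CUDA_CACHE_MAXSIZE", "1073741824", 1);  // 1 GiB cache budget (assumed)
    setenv("CUDA_CACHE_DISABLE", "0", 1);           // make sure caching stays on

    // Force context creation so any PTX->SASS JIT happens (and is cached) now.
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA context initialized; JIT cache dir: %s\n",
           getenv("CUDA_CACHE_PATH"));
    return 0;
}
```

After the first run I would expect that directory to fill up if the driver JIT cache is actually being exercised; on the L40s it stays empty, which makes me think the recompilation is happening through a path that bypasses the driver cache entirely.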

My Suspicion:
It seems that on these newer architectures the standard caching mechanism is somehow bypassed. I suspect the codebase does not ship native code for these newer compute capabilities (e.g., sm_89), so a fallback path either ignores the specified cache path or forces a rebuild on every run.
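One way to sanity-check that suspicion is to compare the compute capability of the running GPU against the architectures the binary was actually built for. A minimal sketch; the -gencode flags in the comment are an assumption about what the build might need, not confirmed flags from this codebase:

```cpp
// check_arch.cu
// Print the compute capability of each visible GPU so it can be compared
// with the architectures baked into the binary. If the build only embeds
// SASS for older parts (e.g. sm_70 for V100) plus PTX, then every run on an
// sm_89 card such as the L40s has to JIT-compile that PTX.
//
// Hypothetical nvcc flags that would embed native code for sm_89:
//   -gencode arch=compute_89,code=sm_89 -gencode arch=compute_89,code=compute_89
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices visible\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: %s, compute capability sm_%d%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```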

Feature request:

Is there a specific configuration, flag, or architectural tag I need to add to enable persistent caching for these newer GPUs?

Thanks for your help!
