Skip to content

Conversation

@jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Nov 7, 2024

Summary:
According to the white papers, the cache line size on NVPTX
architectures is 128 bytes. This should be what's returned by these
preprocessor macros.

Summary:
According to the white papers, the cache line size on NVPTX
architectures is 128 bytes. This should be what's returned by these
preprocessor macros.
@jhuber6 jhuber6 requested review from Artem-B and jlebar November 7, 2024 01:50
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Nov 7, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 7, 2024

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
According to the white papers, the cache line size on NVPTX
architectures is 128 bytes. This should be what's returned by these
preprocessor macros.


Full diff: https://github.com/llvm/llvm-project/pull/115248.diff

1 Files Affected:

  • (modified) clang/lib/Basic/Targets/NVPTX.h (+4)
diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index 165b28a60fb2a9..b1a07cdc6ed6bc 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -186,6 +186,10 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
   bool hasBFloat16Type() const override { return true; }
 
   OffloadArch getGPU() const { return GPU; }
+
+  std::pair<unsigned, unsigned> hardwareInterferenceSizes() const override {
+    return std::make_pair(128, 128);
+  }
 };
 } // namespace targets
 } // namespace clang

@jhuber6 jhuber6 requested a review from AlexMaclean November 8, 2024 18:56
@jhuber6
Copy link
Contributor Author

jhuber6 commented Nov 13, 2024

ping

@cor3ntin
Copy link
Contributor

We probably want some test for this change

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 13, 2025

There's also the question of when done from CUDA, perhaps we should make this match the aux triple if present.

@Artem-B
Copy link
Member

Artem-B commented Jan 14, 2025

I'm not sure the host-side concepts of false sharing even applies to the GPU.
For starters, L1 and L2 caches are not necessarily coherent. Also, L1 cache provides very little temporal locality and is mostly used for coalescing smaller transactions from multiple threads in a warp, so the concept of false sharing is moot -- threads rarely get to touch the same cache line, again.

As such, I do not think the changes makes any difference in terms of performance of the generated code.

However, there's an argument to be made that the host does care about __GCC_DESTRUCTIVE_SIZE, and if NVPTX back-end overrides it for the GPU compilations, it may result in observable differences between host and device compilations. That's something we want to minimize.

So, unless there's a specific reason we need to set the values differently from the host, I'd prefer to keep them matching.

Right now it appears that neither NVPTX nor X86 override this parameter, so we happen to be in sync.
Considering that for a GPU it's likely a "don't care" parameter, I think we should derive it from the host, similarly to other properties.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants