Skip to content

Conversation

@JonChesterfield
Copy link
Collaborator

Noticed while extracting the smartstack as a test case

@llvmbot
Copy link
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-offload

Author: Jon Chesterfield (JonChesterfield)

Changes

Noticed while extracting the smartstack as a test case


Full diff: https://github.com/llvm/llvm-project/pull/131918.diff

3 Files Affected:

  • (modified) offload/DeviceRTL/src/State.cpp (+4-4)
  • (modified) offload/include/Shared/Utils.h (+1-10)
  • (modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (+1-1)
diff --git a/offload/DeviceRTL/src/State.cpp b/offload/DeviceRTL/src/State.cpp
index cbe9735145340..62b03e7bba720 100644
--- a/offload/DeviceRTL/src/State.cpp
+++ b/offload/DeviceRTL/src/State.cpp
@@ -89,8 +89,8 @@ struct SharedMemorySmartStackTy {
   /// Compute the size of the storage space reserved for a thread.
   uint32_t computeThreadStorageTotal() {
     uint32_t NumLanesInBlock = mapping::getNumberOfThreadsInBlock();
-    return utils::alignDown((state::SharedScratchpadSize / NumLanesInBlock),
-                            allocator::ALIGNMENT);
+    return __builtin_align_down(state::SharedScratchpadSize / NumLanesInBlock,
+                                allocator::ALIGNMENT);
   }
 
   /// Return the top address of the warp data stack, that is the first address
@@ -121,7 +121,7 @@ void *SharedMemorySmartStackTy::push(uint64_t Bytes) {
   // First align the number of requested bytes.
   /// FIXME: The stack shouldn't require worst-case padding. Alignment needs to
   /// be passed in as an argument and the stack rewritten to support it.
-  uint64_t AlignedBytes = utils::alignPtr(Bytes, allocator::ALIGNMENT);
+  uint64_t AlignedBytes = __builtin_align_up(Bytes, allocator::ALIGNMENT);
 
   uint32_t StorageTotal = computeThreadStorageTotal();
 
@@ -149,7 +149,7 @@ void *SharedMemorySmartStackTy::push(uint64_t Bytes) {
 }
 
 void SharedMemorySmartStackTy::pop(void *Ptr, uint64_t Bytes) {
-  uint64_t AlignedBytes = utils::alignPtr(Bytes, allocator::ALIGNMENT);
+  uint64_t AlignedBytes = __builtin_align_up(Bytes, allocator::ALIGNMENT);
   if (utils::isSharedMemPtr(Ptr)) {
     int TId = mapping::getThreadIdInBlock();
     Usage[TId] -= AlignedBytes;
diff --git a/offload/include/Shared/Utils.h b/offload/include/Shared/Utils.h
index 523e6bc505b81..e3d7b002e1b44 100644
--- a/offload/include/Shared/Utils.h
+++ b/offload/include/Shared/Utils.h
@@ -30,18 +30,9 @@ template <typename Ty1, typename Ty2> Ty1 *advancePtr(Ty1 *Ptr, Ty2 Offset) {
   return (Ty1 *)(const_cast<char *>((const char *)(Ptr)) + Offset);
 }
 
-/// Return \p V aligned "upwards" according to \p Align.
-template <typename Ty1, typename Ty2> inline Ty1 alignPtr(Ty1 V, Ty2 Align) {
-  return reinterpret_cast<Ty1>(((uintptr_t(V) + Align - 1) / Align) * Align);
-}
-/// Return \p V aligned "downwards" according to \p Align.
-template <typename Ty1, typename Ty2> inline Ty1 alignDown(Ty1 V, Ty2 Align) {
-  return V - V % Align;
-}
-
 /// Round up \p V to a \p Boundary.
 template <typename Ty> inline Ty roundUp(Ty V, Ty Boundary) {
-  return alignPtr(V, Boundary);
+  return __builtin_align_up(V, Boundary);
 }
 
 /// Return the first bit set in \p V.
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index 4d2ebcbc7be8e..67c8c900ac66f 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -78,7 +78,7 @@ struct RecordReplayTy {
         Device->allocate(1024, /*HstPtr=*/nullptr, TARGET_ALLOC_DEFAULT);
     Device->free(Addr);
     // Align Address to MaxMemoryAllocation
-    Addr = (void *)utils::alignPtr((Addr), MaxMemoryAllocation);
+    Addr = __builtin_align_up(Addr, MaxMemoryAllocation);
     return Addr;
   }
 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is compiled with as low as GCC 7.3 I believe, though we are floating around making the projects build deprecated. Does GCC support this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Google says it is 10. Too high.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's sad. Oh well, can still drop the utils from State.cpp. Reduced the scope of the patch accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can revisit once we make the projects build illegal.

@JonChesterfield JonChesterfield force-pushed the jc_openmp_align_builtin branch from 8683428 to 0cea76e Compare March 18, 2025 21:29
@JonChesterfield JonChesterfield changed the title [openmp][nfc] Use builtin align instead of handcoding utils [openmp][nfc] Use builtin align in the devicertl Mar 18, 2025
@JonChesterfield JonChesterfield merged commit deb0f3c into llvm:main Mar 18, 2025
7 of 9 checks passed
@JonChesterfield JonChesterfield deleted the jc_openmp_align_builtin branch March 18, 2025 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants