Remove user defined constructor from AllocatorHandleImpl #301

Merged
chillenzer merged 1 commit into alpaka-group:dev from ikbuibui:patch-3
Mar 20, 2026

@ikbuibui ikbuibui commented Mar 19, 2026

So this was a fun one. I really ended up putting a printf before and after every function call, totally puzzled about why execution silently stopped in the middle.

I had this piece of code:

    static constexpr int allocationMaxRetries = 13;

    // Raw allocation and retry logic using mallocMC
    [[nodiscard]] constexpr void* allocateRawMemory(
        auto const& worker,
        auto deviceHeapHandle,
        size_t size,
        std::align_val_t alignment)
    {
        for(int i = 0; i < allocationMaxRetries; ++i)
        {
            void* rawPtr = nullptr;
#if (BOOST_LANG_CUDA || BOOST_COMP_HIP)
            rawPtr = deviceHeapHandle.malloc(worker.getAcc(), size);
#else
            // Use nothrow to ensure nullptr is returned on failure,
            // preventing exceptions from breaking the retry loop.
            rawPtr = operator new(size, alignment, std::nothrow);
#endif
            if(rawPtr != nullptr)
            {
                return rawPtr;
            }
        }
        return nullptr;
    }

    // Allocates uninitialized memory
    template<typename T>
    [[nodiscard]] constexpr T* allocateMemory(auto const& worker, auto deviceHeapHandle)
    {
        void* mem = allocateRawMemory(worker, deviceHeapHandle, sizeof(T), std::align_val_t{alignof(T)});
        if(mem)
        {
            return static_cast<T*>(mem);
        }
        return nullptr;
    }

It was called by one thread in a block, as below:

    printf("a\n");
    PtrType tmp = memory::allocateMemory<T>(worker, m_deviceHeapHandle);
    printf("b\n");

This pointer was then stored in shared memory, and later all threads used it to access the allocated memory. What I saw was segfaults.
Eventually, with the printfs, I noticed that the main thread doing the allocation prints a but not b.
So my main thread's execution was failing silently.
Switching allocateMemory and allocateRawMemory to accept the heap handle by reference solved the issue.
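
To illustrate why the by-reference signature sidesteps the problem, here is a minimal host-only sketch. `CountingHandle` and the `outer*`/`inner*` functions are hypothetical stand-ins (not part of mallocMC) for the heap handle and for the `allocateMemory` -> `allocateRawMemory` call chain. It only shows that passing by value runs the copy constructor at each level, while passing by reference never does.

```cpp
#include <cassert>

// Hypothetical stand-in for the heap handle; it only counts its copies.
struct CountingHandle
{
    static inline int copies = 0;

    CountingHandle() = default;
    CountingHandle(CountingHandle const&)
    {
        ++copies;
    }
};

// Mirrors the original call chain: each level takes the handle by value.
inline void innerByValue(CountingHandle /*handle*/)
{
}

inline void outerByValue(CountingHandle handle)
{
    innerByValue(handle);
}

// Mirrors the workaround: each level takes the handle by reference.
inline void innerByReference(CountingHandle const& /*handle*/)
{
}

inline void outerByReference(CountingHandle const& handle)
{
    innerByReference(handle);
}

// Number of copy-constructor calls triggered by one by-value call chain.
inline int copiesForByValueCall()
{
    CountingHandle handle;
    int const before = CountingHandle::copies;
    outerByValue(handle);
    return CountingHandle::copies - before;
}

// Number of copy-constructor calls triggered by one by-reference call chain.
inline int copiesForByReferenceCall()
{
    CountingHandle handle;
    int const before = CountingHandle::copies;
    outerByReference(handle);
    return CountingHandle::copies - before;
}
```

With two by-value levels the handle is copied twice per allocation; with references the copy constructor is not involved at all, so whatever is wrong with it on the device cannot be hit.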

My guess is that the (implicitly defined) copy constructor of the heap handle is not marked as device, and that this was silently crashing my main thread.
Removing the host-only constructor fixes my issue and lets me pass by value.
Marking the constructor as constexpr has the same effect, I guess because the compiler-generated copy constructor is then also constexpr.
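
For reference, the three shapes of the handle discussed here can be sketched in plain C++ as follows. All names (`Heap`, `HandleWithCtor`, `HandleConstexprCtor`, `HandleAggregate`, `heapIdByValue`) are hypothetical, and the real `AllocatorHandleImpl` has different members. This is a host-only sketch: on the host all three variants copy identically, so it does not reproduce the device-side failure and only pins down what the variants look like.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the device-side heap that the handle points to.
struct Heap
{
    int id;
};

// Variant A: user-defined, non-constexpr constructor (the shape this PR removes).
struct HandleWithCtor
{
    Heap* heap;

    HandleWithCtor(Heap* p) : heap(p)
    {
    }
};

// Variant B: the same constructor marked constexpr.
struct HandleConstexprCtor
{
    Heap* heap;

    constexpr HandleConstexprCtor(Heap* p) : heap(p)
    {
    }
};

// Variant C: no user-defined constructor, so the type is an aggregate.
struct HandleAggregate
{
    Heap* heap;
};

// Only the variant without a user-defined constructor is an aggregate.
static_assert(!std::is_aggregate_v<HandleWithCtor>);
static_assert(!std::is_aggregate_v<HandleConstexprCtor>);
static_assert(std::is_aggregate_v<HandleAggregate>);

// In standard C++ all three variants are trivially copyable.
static_assert(std::is_trivially_copyable_v<HandleWithCtor>);
static_assert(std::is_trivially_copyable_v<HandleConstexprCtor>);
static_assert(std::is_trivially_copyable_v<HandleAggregate>);

// Takes the handle by value, analogous to allocateMemory in the snippet above.
template<typename Handle>
int heapIdByValue(Handle handle)
{
    return handle.heap->id;
}
```

Since standard C++ treats the copy of all three variants the same way, any difference in behavior would have to come from how the CUDA/HIP compiler handles the class in device code, which is consistent with the explanation above being a guess rather than a confirmed diagnosis.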

I decided that just removing the user-defined constructor is cleanest, but if you prefer, you can mark it as constexpr/host device instead.

@ikbuibui ikbuibui changed the title from "Remove explicit constructor from AllocatorHandleImpl" to "Remove user defined constructor from AllocatorHandleImpl" Mar 19, 2026

@chillenzer chillenzer left a comment

Why would that even be in there? Thanks for the fix but it looks like you had a lot of fun!

@chillenzer chillenzer merged commit a28c0d5 into alpaka-group:dev Mar 20, 2026
2 checks passed