Hidden version requirements to liblikwid #56

@jonas-schulze

I just saw your talk from JuliaCon 2023 that also mentioned LIKWID.jl and I was eager to try it out. 🚀

Unfortunately, trying the very first tutorial, I got a segfault. After some digging, I found the culprit: the definitions of CpuTopology in LIKWID.jl and the underlying liblikwid differ. My likwid.h version 5.1 (installed via apt) defines the CpuTopology struct without the numDies field, while LIKWID.jl version 0.4.4 defines its CpuTopology struct with numDies in position 4. Consequently, the fields are shifted when the data is transferred from C to Julia: Julia's numDies contains C's numCoresPerSocket, ..., and numCacheLevels contains the value of threadPool, which is a pointer. Wrapping cacheLevels (C's topologyTree) into an array of length numCacheLevels (C's threadPool) then leads to invalid memory accesses. 💥

likwid.h: https://github.com/RRZE-HPC/likwid/blob/v5.1/src/includes/likwid.h#L370-L380

typedef struct {
    uint32_t numHWThreads; /*!< \brief Amount of active HW threads in the system (e.g. in cpuset) */
    uint32_t activeHWThreads; /*!< \brief Amount of HW threads in the system and length of \a threadPool */
    uint32_t numSockets; /*!< \brief Amount of CPU sockets/packages in the system */
    uint32_t numCoresPerSocket; /*!< \brief Amount of physical cores in one CPU socket/package */
    uint32_t numThreadsPerCore; /*!< \brief Amount of HW threads in one physical CPU core */
    uint32_t numCacheLevels; /*!< \brief Amount of caches for each HW thread and length of \a cacheLevels */
    HWThread* threadPool; /*!< \brief List of all HW thread descriptions */
    CacheLevel*  cacheLevels; /*!< \brief List of all caches in the hierarchy */
    struct treeNode* topologyTree; /*!< \brief Anchor for a tree structure describing the system topology */
} CpuTopology;

Liblikwid.jl: https://github.com/JuliaPerf/LIKWID.jl/blob/v0.4.4/src/LibLikwid.jl#L640-L651

struct CpuTopology
    numHWThreads::UInt32
    activeHWThreads::UInt32
    numSockets::UInt32
    numDies::UInt32
    numCoresPerSocket::UInt32
    numThreadsPerCore::UInt32
    numCacheLevels::UInt32
    threadPool::Ptr{HWThread}
    cacheLevels::Ptr{CacheLevel}
    topologyTree::Ptr{treeNode}
end
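
The field shift can be reproduced without LIKWID at all. The following is a minimal C sketch, not code from either project: it copies the two field lists above into stand-in structs and compares field offsets with offsetof. The names CpuTopologyV51, CpuTopologyJl and print_layouts are made up for illustration, and the sketch assumes that Julia lays out the struct the way a C compiler would.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>
#include <stdio.h>

/* Opaque stand-ins: only the pointer size matters for the layout. */
typedef struct HWThread HWThread;
typedef struct CacheLevel CacheLevel;
struct treeNode;

/* Field order as in likwid.h v5.1 (no numDies). */
typedef struct {
    uint32_t numHWThreads;
    uint32_t activeHWThreads;
    uint32_t numSockets;
    uint32_t numCoresPerSocket;
    uint32_t numThreadsPerCore;
    uint32_t numCacheLevels;
    HWThread *threadPool;
    CacheLevel *cacheLevels;
    struct treeNode *topologyTree;
} CpuTopologyV51;

/* Field order that LIKWID.jl v0.4.4 expects (numDies in position 4). */
typedef struct {
    uint32_t numHWThreads;
    uint32_t activeHWThreads;
    uint32_t numSockets;
    uint32_t numDies;
    uint32_t numCoresPerSocket;
    uint32_t numThreadsPerCore;
    uint32_t numCacheLevels;
    HWThread *threadPool;
    CacheLevel *cacheLevels;
    struct treeNode *topologyTree;
} CpuTopologyJl;

/* The reader's numCacheLevels sits where the writer stored threadPool ... */
static_assert(offsetof(CpuTopologyJl, numCacheLevels) ==
              offsetof(CpuTopologyV51, threadPool),
              "numCacheLevels should overlap threadPool");
/* ... and the reader's cacheLevels sits where the writer stored topologyTree. */
static_assert(offsetof(CpuTopologyJl, cacheLevels) ==
              offsetof(CpuTopologyV51, topologyTree),
              "cacheLevels should overlap topologyTree");

/* Hypothetical helper: print the offsets of the affected fields side by side. */
void print_layouts(void) {
    printf("%-18s %6s %10s\n", "field", "v5.1", "LIKWID.jl");
    printf("%-18s %6zu %10zu\n", "numCoresPerSocket",
           offsetof(CpuTopologyV51, numCoresPerSocket),
           offsetof(CpuTopologyJl, numCoresPerSocket));
    printf("%-18s %6zu %10zu\n", "numCacheLevels",
           offsetof(CpuTopologyV51, numCacheLevels),
           offsetof(CpuTopologyJl, numCacheLevels));
    printf("%-18s %6zu %10zu\n", "threadPool",
           offsetof(CpuTopologyV51, threadPool),
           offsetof(CpuTopologyJl, threadPool));
    printf("%-18s %6zu %10zu\n", "cacheLevels",
           offsetof(CpuTopologyV51, cacheLevels),
           offsetof(CpuTopologyJl, cacheLevels));
}
```

Calling print_layouts() from a main function shows the offsets on the current platform. The static_assert lines fail the build if the overlap described above does not hold there.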

The numDies field was added in RRZE-HPC/likwid@a0ac14d, which according to GitHub shipped in likwid version 5.2 and onwards. Moving the numDies field to the end of the C struct is not an option, I guess, and neither is removing numDies from LIKWID.jl, even though it's not used anyway. Therefore, I would recommend:

  • Document the version requirements for liblikwid
  • Let LIKWID.jl fail gracefully if it detects an unsupported version of liblikwid
  • Let LIKWID.jl bundle its own liblikwid
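
For the second recommendation, a version gate could look roughly like this. It is sketched in C to match the likwid.h listing; in LIKWID.jl the same comparison would be written in Julia. The minimum of 5.2 is taken from the commit information above, the helper names are hypothetical, and how the major and minor numbers are obtained from liblikwid is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption from the issue text: numDies first shipped in likwid 5.2. */
enum { MIN_MAJOR = 5, MIN_MINOR = 2 };

/* Hypothetical helper: true if major.minor is at least MIN_MAJOR.MIN_MINOR.
 * No upper bound is checked, so later layout changes are not detected. */
bool layout_version_ok(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    if (major != MIN_MAJOR) {
        return major > MIN_MAJOR;
    }
    return minor >= MIN_MINOR;
}

/* Hypothetical helper: returns 0 if the version is usable, otherwise
 * prints a readable error and returns -1 instead of risking a segfault. */
int check_liblikwid_version(int major, int minor) {
    if (layout_version_ok(major, minor)) {
        return 0;
    }
    fprintf(stderr,
            "liblikwid %d.%d found, but this CpuTopology layout needs >= %d.%d\n",
            major, minor, MIN_MAJOR, MIN_MINOR);
    return -1;
}
```

With the versions from this report, check_liblikwid_version(5, 1) would report the mismatch up front, before any struct is read through the wrong layout.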
