Description
I am trying to use llama.cpp with the SYCL backend. With default settings, I get a "Bus error" (SIGBUS) when loading models:
$ ./bin/llama-bench -m models/phi-4-Q3_K_M.gguf
WARNING: Small BAR detected for device 0000:03:00.0
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
Bus error (core dumped) ./bin/llama-bench -m models/phi-4-Q3_K_M.gguf
That run uses the level_zero device, which is the default. When I select the OpenCL device with ONEAPI_DEVICE_SELECTOR instead, the code works fine:
$ ONEAPI_DEVICE_SELECTOR=opencl:gpu ./bin/llama-bench -m models/phi-4-Q3_K_M.gguf
WARNING: Small BAR detected for device 0000:03:00.0
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| llama 13B Q3_K - Medium | 6.69 GiB | 14.66 B | SYCL | 99 | pp512 | 333.50 ± 20.98 |
...
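For anyone who wants to script the same workaround, here is a minimal sketch that sets ONEAPI_DEVICE_SELECTOR before launching the benchmark. The helper names `opencl_env` and `run_bench` are made up for illustration. The binary and model paths are the ones used in this report.

```python
import os
import subprocess


def opencl_env(base_env=None):
    """Return a copy of the environment that selects the OpenCL GPU device."""
    env = dict(os.environ if base_env is None else base_env)
    # Same value as in the shell command above.
    env["ONEAPI_DEVICE_SELECTOR"] = "opencl:gpu"
    return env


def run_bench(model_path, bench="./bin/llama-bench"):
    """Run llama-bench with the OpenCL device selected (hypothetical helper)."""
    return subprocess.run([bench, "-m", model_path], env=opencl_env(), check=False)
```

Calling `run_bench("models/phi-4-Q3_K_M.gguf")` should behave like the shell command above. It only avoids the crash and does nothing about the Level Zero problem itself.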
I'm aware of the "Small BAR" warning, as I'm running on older hardware (Asus Z170-A + Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz) with an Arc A750:
$ sycl-ls
WARNING: Small BAR detected for device 0000:03:00.0
WARNING: Small BAR detected for device 0000:03:00.0
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) A750 Graphics 12.55.8 [1.6.34666]
[opencl:fpga][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.18.12.0.05_160000]
[opencl:cpu][opencl:1] Intel(R) OpenCL, Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
[opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A750 Graphics OpenCL 3.0 NEO [25.31.34666]
I'm using Arch Linux and the latest version of intel-compute-runtime I could find:
$ uname -a
Linux hostname 6.16.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 15 Aug 2025 16:04:43 +0000 x86_64 GNU/Linux
$ pacman -Q intel-compute-runtime
intel-compute-runtime 25.31.34666.3-1
Some more crash details with GDB:
$ gdb --args ./bin/llama-bench -m models/phi-4-Q3_K_M.gguf
...
Thread 1 "llama-bench" received signal SIGBUS, Bus error.
(gdb) bt full
#0 0x00007ffff636e087 in ?? () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x00007fffdcab128c in memcpy_s (dst=0x7ffcfaa60000, destSize=<optimized out>, src=0x5d556b0, count=<optimized out>)
at /src/arch/intel-compute-runtime/src/compute-runtime-25.31.34666.3/shared/source/helpers/string.h:71
No locals.
#2 L0::CommandListCoreFamilyImmediate<(GFXCORE_FAMILY)3079>::performCpuMemcpy (this=this@entry=0x5d5a6c0, cpuMemCopyInfo=...,
hSignalEvent=hSignalEvent@entry=0x2cd9e18, numWaitEvents=numWaitEvents@entry=0, phWaitEvents=phWaitEvents@entry=0x0)
at /src/arch/intel-compute-runtime/src/compute-runtime-25.31.34666.3/level_zero/core/source/cmdlist/cmdlist_hw_immediate.inl:1444
lockingFailed = false
srcLockPointer = <optimized out>
dstLockPointer = <optimized out>
signalEvent = 0x2cd9e10
cpuMemcpySrcPtr = 0x5d556b0
cpuMemcpyDstPtr = 0x7ffcfaa60000
#3 0x00007fffdcabf240 in L0::CommandListCoreFamilyImmediate<(GFXCORE_FAMILY)3079>::appendMemoryCopy (this=0x5d5a6c0, dstptr=0xffffd556aaa00000,
srcptr=0x5d556b0, size=20480, hSignalEvent=0x2cd9e18, numWaitEvents=0, phWaitEvents=0x0, memoryCopyParams=...)
at /src/arch/intel-compute-runtime/src/compute-runtime-25.31.34666.3/level_zero/core/source/cmdlist/cmdlist_hw_immediate.inl:683
estimatedSize = <optimized out>
hasStallindCmds = false
ret = <optimized out>
cpuMemCopyInfo = {dstPtr = 0xffffd556aaa00000, srcPtr = 0x5d556b0, size = 20480, dstAllocData = 0x5c78bc0, srcAllocData = 0x0,
dstIsImportedHostPtr = false, srcIsImportedHostPtr = false}
direction = 32767
isSplitNeeded = <optimized out>
#4 0x00007fffdc8e35f2 in L0::zeCommandListAppendMemoryCopy (hCommandList=<optimized out>, dstptr=<optimized out>, srcptr=<optimized out>, size=20480,
hSignalEvent=0x2cd9e18, numWaitEvents=<optimized out>, phWaitEvents=0x0)
at /src/arch/intel-compute-runtime/src/compute-runtime-25.31.34666.3/level_zero/api/core/ze_copy_api_entrypoints.h:32
cmdList = 0x5d5a6c0
ret = ZE_RESULT_ERROR_NOT_AVAILABLE
memoryCopyParams = {relaxedOrderingDispatch = false, forceDisableCopyOnlyInOrderSignaling = false, copyOffloadAllowed = false}
#5 0x00007fffee78237b in enqueueMemCopyHelper(ur_command_t, ur_queue_handle_legacy_t_*, void*, unsigned char, unsigned long, void const*, unsigned int, ur_event_handle_t_* const*, ur_event_handle_t_**, bool) () from /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_level_zero.so.0
No symbol table info available.
#6 0x00007fffee78bf87 in ur_queue_handle_legacy_t_::enqueueUSMMemcpy(bool, void*, void const*, unsigned long, unsigned int, ur_event_handle_t_* const*, ur_event_handle_t_**) () from /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_level_zero.so.0
No symbol table info available.
#7 0x00007fffe0cf4db7 in ur_loader::urEnqueueUSMMemcpy(ur_queue_handle_t_*, bool, void*, void const*, unsigned long, unsigned int, ur_event_handle_t_* const*, ur_event_handle_t_**) () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
No symbol table info available.
#8 0x00007fffe0d07eff in urEnqueueUSMMemcpy () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
No symbol table info available.
#9 0x00007fffe1c45aa9 in sycl::_V1::detail::MemoryManager::copy_usm(void const*, std::shared_ptr<sycl::_V1::detail::queue_impl>, unsigned long, void*, std::vector<ur_event_handle_t_*, std::allocator<ur_event_handle_t_*> >, ur_event_handle_t_**, std::shared_ptr<sycl::_V1::detail::event_impl> const&) ()
from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
No symbol table info available.
#10 0x00007fffe1c8b83d in sycl::_V1::detail::queue_impl::memcpy(std::shared_ptr<sycl::_V1::detail::queue_impl> const&, void*, void const*, unsigned long, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&, bool, sycl::_V1::detail::code_location const&) ()
from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
No symbol table info available.
#11 0x00007fffe1d36421 in sycl::_V1::queue::memcpy(void*, void const*, unsigned long, sycl::_V1::detail::code_location const&) ()
from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
No symbol table info available.
#12 0x00007ffff6a37f9d in ggml_backend_sycl_buffer_set_tensor(ggml_backend_buffer*, ggml_tensor*, void const*, unsigned long, unsigned long)::{lambda()#2}::operator()() const (this=<optimized out>) at /src/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:399
e = <optimized out>
(gdb) list
...
1444 memcpy_s(cpuMemcpyDstPtr, cpuMemCopyInfo.size, cpuMemcpySrcPtr, cpuMemCopyInfo.size);
...
(gdb) p cpuMemCopyInfo
$7 = (const L0::CpuMemCopyInfo &) @0x7fffffffaea0: {dstPtr = 0xffffd556aaa00000, srcPtr = 0x5d556b0, size = 20480, dstAllocData = 0x5c78bc0,
srcAllocData = 0x0, dstIsImportedHostPtr = false, srcIsImportedHostPtr = false}
(gdb) p cpuMemcpyDstPtr
$8 = (void *) 0x7ffcfaa60000
(gdb) info proc mappings
Mapped address spaces:
Start Addr End Addr Size Offset Perms File
...
0x00000000004d5000 0x0000000005d6a000 0x5895000 0x0 rw-p [heap]
0x00007ffcfaa60000 0x00007ffdf9000000 0xfe5a0000 0x1e929a000 rw-s anon_inode:i915.gem
0x00007ffdf9000000 0x00007fffa59b7000 0x1ac9b7000 0x0 r--s /data/llama-models/phi-4-Q3_K_M.gguf
0x00007fffa5a00000 0x00007fffa5a3f000 0x3f000 0x0 r--p /usr/lib/libopencl-clang.so.15
...
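To make the mapping lookup explicit, here is a small sketch that parses the rows shown above and reports which region each memcpy_s pointer falls into. The function names are illustrative, and the parser assumes the six-column layout printed by this GDB session.

```python
def parse_mappings(text):
    """Parse rows of `info proc mappings` into (start, end, perms, name) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a hex address; skip headers and "..." lines.
        if len(fields) < 5 or not fields[0].startswith("0x"):
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        name = fields[5] if len(fields) > 5 else ""
        rows.append((start, end, fields[4], name))
    return rows


def find_mapping(rows, addr):
    """Return the row whose [start, end) range contains addr, or None."""
    for row in rows:
        if row[0] <= addr < row[1]:
            return row
    return None


MAPPINGS = """\
0x00000000004d5000 0x0000000005d6a000 0x5895000 0x0 rw-p [heap]
0x00007ffcfaa60000 0x00007ffdf9000000 0xfe5a0000 0x1e929a000 rw-s anon_inode:i915.gem
0x00007ffdf9000000 0x00007fffa59b7000 0x1ac9b7000 0x0 r--s /data/llama-models/phi-4-Q3_K_M.gguf
"""

rows = parse_mappings(MAPPINGS)
dst = find_mapping(rows, 0x7ffcfaa60000)  # cpuMemcpyDstPtr
src = find_mapping(rows, 0x5d556b0)       # cpuMemcpySrcPtr
print(dst[3], src[3])  # prints "anon_inode:i915.gem [heap]"
```

So the faulting copy writes from the heap into the very start of the i915 GEM mapping. That would be consistent with the mapping not being CPU-accessible through a small BAR, though this is my assumption and I have not confirmed it.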
While I understand the issue might be caused by the "Small BAR" condition, I would appreciate a helpful error message rather than a SIGBUS that requires rebuilding intel-compute-runtime with debug symbols to understand where the issue is coming from. Even better: make level_zero work with small BAR, even if with reduced performance.
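As a sketch of the kind of up-front check I have in mind, the sizes of a device's PCI regions can be read from the sysfs `resource` file, where each line holds start, end, and flags in hex. The function names and the sample values below are hypothetical. Only the device address 0000:03:00.0 comes from the warning above.

```python
def bar_sizes(resource_text):
    """Return the size in bytes of each region listed in a sysfs `resource` file."""
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(f, 16) for f in fields)
        # Unused regions are reported as all zeros.
        sizes.append(0 if start == 0 and end == 0 else end - start + 1)
    return sizes


def read_bar_sizes(bdf="0000:03:00.0"):
    """Read the region sizes for one PCI device (Linux only)."""
    with open(f"/sys/bus/pci/devices/{bdf}/resource") as f:
        return bar_sizes(f.read())


# Hypothetical sample, not taken from my machine: a 16 MiB region,
# an unused region, and a 256 MiB region.
SAMPLE = """\
0x00000000f0000000 0x00000000f0ffffff 0x0000000000140204
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000e0000000 0x00000000efffffff 0x000000000014220c
"""

mib = [size // 2**20 for size in bar_sizes(SAMPLE)]
print(mib)  # prints [16, 0, 256]
```

If the largest region is much smaller than the card's video memory, only part of that memory is directly CPU-visible. A runtime could use a check like this to return a clear error instead of crashing, though how intel-compute-runtime actually detects small BAR is not something I have verified.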