Description
Describe the bug
Specifying that a model (in this instance, Qwen3-1.7B) be loaded onto the second or third GPU (denoted GPU.1 and GPU.2) results in a segmentation fault and an immediate crash. This was tested with the latest version of OVMS (built from source) and with the 2025.3 release. I believe these to be OpenCL bugs. OVMS does detect the other GPUs, as shown here:
[2025-11-25 19:08:13.411][6495][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: GPU.1; plugin configuration
[2025-11-25 19:08:13.411][6495][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: GPU.1; plugin configuration: { ACTIVATIONS_SCALE_FACTOR: -1, AVAILABLE_DEVICES: 0 1 2, CACHE_DIR: , CACHE_ENCRYPTION_CALLBACKS: , CACHE_MODE: optimize_speed, COMPILATION_NUM_THREADS: 12, CONFIG_FILE: , DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.55.8, DEVICE_GOPS: {f16:0,f32:19660.8,i8:0,u8:0}, DEVICE_ID: 1, DEVICE_LUID: b0ff1aaefd7f0000, DEVICE_PCI_INFO: {domain: 0 bus: 10 device: 0x0 function: 0}, DEVICE_TYPE: discrete, DEVICE_UUID: 8680a056080000000a00000000000000, DYNAMIC_QUANTIZATION_GROUP_SIZE: 0, ENABLE_CPU_PINNING: NO, ENABLE_CPU_RESERVATION: NO, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) Arc(TM) A770 Graphics (dGPU), GPU_DEVICE_ID: 0x56a0, GPU_DEVICE_TOTAL_MEM_SIZE: 16225243136, GPU_DISABLE_WINOGRAD_CONVOLUTION: NO, GPU_ENABLE_LOOP_UNROLLING: YES, GPU_ENABLE_LORA_OPERATION: YES, GPU_ENABLE_SDPA_OPTIMIZATION: YES, GPU_EXECUTION_UNITS_COUNT: 512, GPU_HOST_TASK_PRIORITY: MEDIUM, GPU_MEMORY_STATISTICS: {cl_mem:0,unknown:0,usm_device:0,usm_host:0,usm_shared:0}, GPU_QUEUE_PRIORITY: MEDIUM, GPU_QUEUE_THROTTLE: MEDIUM, GPU_UARCH_VERSION: 12.55.8, INFERENCE_PRECISION_HINT: f16, KV_CACHE_PRECISION: dynamic, MAX_BATCH_SIZE: 1, MODEL_PRIORITY: MEDIUM, MODEL_PTR: 0, NUM_STREAMS: 1, OPTIMAL_BATCH_SIZE: 1, OPTIMIZATION_CAPABILITIES: FP32 BIN FP16 INT8 GPU_HW_MATMUL GPU_USM_MEMORY EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 2 1, RANGE_FOR_STREAMS: 1 2, WEIGHTS_PATH: }
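The plugin-configuration line above is long, so the properties that matter for detection are easy to miss. The sketch below pulls out `AVAILABLE_DEVICES`, `DEVICE_ID`, `FULL_DEVICE_NAME` and `GPU_DEVICE_TOTAL_MEM_SIZE` from one such debug line. The helper names `parse_plugin_config` and `gpu_device_names` are hypothetical. The sketch assumes the line format shown above and uses a simple split that may cut short any value that itself contains ", KEY:".

```python
import re

# Matches OVMS debug lines such as
# "... OpenVINO Core plugin: GPU.1; plugin configuration: { KEY: value, ... }".
LINE_RE = re.compile(
    r"OpenVINO Core plugin: (?P<device>\S+); plugin configuration: \{(?P<body>.*)\}\s*$"
)

DEFAULT_KEYS = ("AVAILABLE_DEVICES", "DEVICE_ID", "FULL_DEVICE_NAME",
                "GPU_DEVICE_TOTAL_MEM_SIZE")


def parse_plugin_config(line, keys=DEFAULT_KEYS):
    """Return {"device": ..., KEY: value, ...} for one plugin-config log line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    body = m.group("body")
    result = {"device": m.group("device")}
    for key in keys:
        # A value runs until the next ", UPPER_CASE_KEY:" or the end of the braces.
        # Values that themselves contain ", SOMETHING:" would be cut short here.
        km = re.search(rf"\b{re.escape(key)}: (.*?)(?=, [A-Z0-9_]+:|\s*$)", body)
        if km:
            result[key] = km.group(1).strip()
    return result


def gpu_device_names(config):
    """Expand AVAILABLE_DEVICES (e.g. "0 1 2") into OpenVINO names (GPU.0, GPU.1, ...)."""
    return [f"GPU.{i}" for i in config.get("AVAILABLE_DEVICES", "").split()]
```

For the GPU.1 line above this would yield device `GPU.1`, `DEVICE_ID` `1` and the A770 device name, which confirms that detection works even though loading fails. With the OpenVINO Python package installed, `openvino.Core().available_devices` should list the same devices directly.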
To Reproduce
Steps to reproduce the behavior:
- Build OVMS from source, or use the 2025.3 release.
- Convert any model using the export_model.py script, with the target device set to GPU.1 or GPU.2.
- Launch OVMS as described in the documentation, setting the model paths and pointing OVMS at the config.json file.
- See the error.
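To switch an already exported model between GPU.0, GPU.1 and GPU.2 without re-exporting, the device value in the generated graph.pbtxt can be rewritten. This is a hypothetical sketch: it assumes the LLM graph written by export_model.py contains an entry of the form `device: "GPU"` in the LLM calculator options. The function names are illustrative and not part of OVMS. Recent export_model.py versions also accept a `--target_device` option at export time.

```python
import re
from pathlib import Path

# Assumption: graph.pbtxt holds a line like `device: "GPU",` in the LLM node options.
# The leading \b keeps fields such as `target_device:` from matching.
DEVICE_RE = re.compile(r'(\bdevice:\s*")([^"]*)(")')


def set_graph_device(pbtxt_text: str, device: str) -> str:
    """Return pbtxt_text with every `device: "..."` value replaced by `device`."""
    new_text, count = DEVICE_RE.subn(lambda m: m.group(1) + device + m.group(3), pbtxt_text)
    if count == 0:
        raise ValueError('no `device: "..."` entry found in graph.pbtxt')
    return new_text


def retarget_graph_file(path: Path, device: str) -> None:
    """Rewrite the device entry of a graph.pbtxt file in place."""
    path.write_text(set_graph_device(path.read_text(), device))
```

For example, `retarget_graph_file(Path("/media/models/Qwen/Qwen3-14B/graph.pbtxt"), "GPU.1")` would point the servable from the logs below at the second card. Rewriting only the device string keeps every other exported option unchanged, so a run on GPU.0 and a run on GPU.1 differ in exactly one setting.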
Expected behavior
The model should load on the selected GPU.
Logs
[2025-11-25 19:08:13.462][6495][serving][debug][schema.cpp:566] Loading configuration from /media/models/config.json for: 1 time
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:824] Configuration file doesn't have monitoring property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:1059] Reading metric config only once per server start.
[2025-11-25 19:08:13.463][6495][serving][debug][mediapipegraphconfig.cpp:109] graph_path not defined in config so it will be set to default based on base_path and graph name: /media/models/Qwen/Qwen3-14B/graph.pbtxt
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:446] Graph: Qwen/Qwen3-14B path: /media/models/Qwen/Qwen3-14B/graph.pbtxt exists
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:862] Adding mediapipe graph config for Qwen/Qwen3-14B, /media/models/Qwen/Qwen3-14B/graph.pbtxt
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:940] Subconfig path: /media/models/Qwen/Qwen3-14B/subconfig.json provided for graph: Qwen/Qwen3-14B does not exist. Loading subconfig models will be skipped.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:971] Subconfiguration file doesn't have models property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:659] Configuration file doesn't have custom node libraries property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:704] Configuration file doesn't have pipelines property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:490] Mediapipe graph:Qwen/Qwen3-14B was not loaded so far. Triggering load
[2025-11-25 19:08:13.463][6495][modelmanager][debug][mediapipegraphdefinition.cpp:129] Started validation of mediapipe: Qwen/Qwen3-14B
[2025-11-25 19:08:13.464][6495][modelmanager][debug][mediapipe_utils.cpp:84] setting input stream: input packet type: UNKNOWN from: HTTP_REQUEST_PAYLOAD:input
[2025-11-25 19:08:13.464][6495][modelmanager][debug][mediapipe_utils.cpp:84] setting output stream: output packet type: UNKNOWN from: HTTP_RESPONSE_PAYLOAD:output
[2025-11-25 19:08:13.464][6495][serving][info][mediapipegraphdefinition.cpp:421] MediapipeGraphDefinition initializing graph nodes
[2025-11-25 19:08:13.465][6495][modelmanager][info][servable_initializer.cpp:463] Initializing Language Model Continuous Batching servable
[2025-11-25 19:08:19.853][6495][serving][error][servable_initializer.cpp:145] Error during llm node initialization for models_path: /media/models/Qwen/Qwen3-14B/./ exception: Exception from src/inference/src/cpp/core.cpp:109:
Exception from src/inference/src/dev/plugin.cpp:53:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS
[2025-11-25 19:08:19.853][6495][modelmanager][error][servable_initializer.cpp:468] Error during LLM node resources initialization: The LLM Node resource initialization failed
[2025-11-25 19:08:19.853][6495][serving][error][mediapipegraphdefinition.cpp:472] Failed to process LLM node graph Qwen/Qwen3-14B
[2025-11-25 19:08:19.853][6495][modelmanager][debug][pipelinedefinitionstatus.hpp:50] Mediapipe: Qwen/Qwen3-14B state: BEGIN handling: ValidationFailedEvent:
[2025-11-25 19:08:19.853][6495][modelmanager][info][pipelinedefinitionstatus.hpp:59] Mediapipe: Qwen/Qwen3-14B state changed to: LOADING_PRECONDITION_FAILED after handling: ValidationFailedEvent:
[2025-11-25 19:08:19.853][6495][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2025-11-25 19:08:19.853][6598][modelmanager][info][modelmanager.cpp:1201] Started model manager thread
[2025-11-25 19:08:19.853][6599][modelmanager][info][modelmanager.cpp:1220] Started cleaner thread
Note that these logs are from the 2025.3 release; the build from source only reports a segmentation fault (which is likely the same error).
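The two builds fail differently: 2025.3 logs an OpenCL error code, while the source build dies with a segmentation fault. A small triage helper can summarise both kinds of failure in the same form. This is a hypothetical sketch: the name `triage` and the subset of error codes are illustrative, and it assumes the OVMS output has been captured as text. The code values are the standard ones from the OpenCL headers.

```python
import re
import signal

# A few OpenCL error codes, as defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -48: "CL_INVALID_KERNEL",
    -51: "CL_INVALID_ARG_SIZE",
    -52: "CL_INVALID_KERNEL_ARGS",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}

# Matches lines like "[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS".
CL_ERROR_RE = re.compile(r"\[GPU\] (\w+), error code: (-?\d+)")


def triage(returncode, log_text):
    """Summarise a failed OVMS start from its exit status and captured log."""
    report = {}
    if returncode is not None and returncode < 0:
        # subprocess reports a process killed by a signal as -signum (POSIX only).
        report["signal"] = signal.Signals(-returncode).name
    m = CL_ERROR_RE.search(log_text)
    if m:
        code = int(m.group(2))
        report["cl_call"] = m.group(1)
        report["cl_error"] = OPENCL_ERRORS.get(code, f"unknown ({code})")
    return report
```

For the 2025.3 log above this would report `clEnqueueNDRangeKernel` with `CL_INVALID_KERNEL_ARGS`. For the source build, pass `proc.returncode` and the captured output from something like `subprocess.run(["ovms", "--config_path", "/media/models/config.json", "--log_level", "DEBUG"], capture_output=True, text=True)`. A segmentation fault would then show up as `SIGSEGV`.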
Configuration
- OVMS version: built from source (latest) or the 2025.3 release
- OVMS config.json file:
{
    "model_config_list": [
        {
            "config": {
                "name": "Qwen/Qwen3-14B",
                "base_path": "Qwen/Qwen3-14B"
            }
        }
    ]
}
- CPU, accelerator versions if applicable: Intel Core i5-11600KF, 3 x Intel Arc A770 on the latest drivers.
- Model repository directory: /media/models/Qwen/Qwen3-14B/~
- Model: Qwen/Qwen3-14B
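The config.json above registers the graph only by name and base_path. The log shows that, when graph_path is missing, OVMS falls back to `<base_path>/graph.pbtxt` (here /media/models/Qwen/Qwen3-14B/graph.pbtxt). The sketch below builds the same config and resolves where each graph would be looked up, so the path can be checked before launching. The helper names are hypothetical. Resolving a relative base_path against the directory of config.json is an assumption inferred from that log line.

```python
import json
from pathlib import Path


def make_config(name: str, base_path: str) -> dict:
    """Build a config.json structure with a single entry, mirroring the file above."""
    return {"model_config_list": [{"config": {"name": name, "base_path": base_path}}]}


def expected_graph_paths(config: dict, config_path: Path) -> list:
    """Resolve the default graph.pbtxt location for each configured entry.

    Assumption (from the OVMS debug log): without graph_path, the graph is read from
    <base_path>/graph.pbtxt, and a relative base_path is taken relative to the
    directory that contains config.json.
    """
    paths = []
    for entry in config["model_config_list"]:
        base = Path(entry["config"]["base_path"])
        if not base.is_absolute():
            base = config_path.parent / base
        paths.append(base / "graph.pbtxt")
    return paths
```

Usage: `json.dumps(make_config("Qwen/Qwen3-14B", "Qwen/Qwen3-14B"), indent=4)` reproduces the file above. Calling `.exists()` on each result of `expected_graph_paths(cfg, Path("/media/models/config.json"))` confirms the graph is where OVMS expects it.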
Additional context
This problem is exactly why HETERO configurations are failing, which I have reported before. It needs to be resolved, as being unable to use GPU.1 and GPU.2 is a massive oversight.
EDIT:
I have just run OpenCL benchmarks on all cards to check whether this is an OpenCL issue and whether the cards are genuinely detected; all of them benchmarked correctly and reported no errors. This is an OpenVINO issue.