
OVMS refuses to load models into second or third GPUs. #3816

@HumerousGorgon

Description


Describe the bug
When specifying to load a model (in this instance, Qwen3-1.7B) into the second or third GPUs (denoted GPU.1 and GPU.2) results in a segmentation fault and an instant crash. This was tested with the latest version of OVMS (built from source) AND with 2025.3 release. I believe these bugs to be OpenCL bugs. OVMS does in fact DETECT the other GPUs as shown here:
[2025-11-25 19:08:13.411][6495][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: GPU.1; plugin configuration
[2025-11-25 19:08:13.411][6495][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: GPU.1; plugin configuration: { ACTIVATIONS_SCALE_FACTOR: -1, AVAILABLE_DEVICES: 0 1 2, CACHE_DIR: , CACHE_ENCRYPTION_CALLBACKS: , CACHE_MODE: optimize_speed, COMPILATION_NUM_THREADS: 12, CONFIG_FILE: , DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.55.8, DEVICE_GOPS: {f16:0,f32:19660.8,i8:0,u8:0}, DEVICE_ID: 1, DEVICE_LUID: b0ff1aaefd7f0000, DEVICE_PCI_INFO: {domain: 0 bus: 10 device: 0x0 function: 0}, DEVICE_TYPE: discrete, DEVICE_UUID: 8680a056080000000a00000000000000, DYNAMIC_QUANTIZATION_GROUP_SIZE: 0, ENABLE_CPU_PINNING: NO, ENABLE_CPU_RESERVATION: NO, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) Arc(TM) A770 Graphics (dGPU), GPU_DEVICE_ID: 0x56a0, GPU_DEVICE_TOTAL_MEM_SIZE: 16225243136, GPU_DISABLE_WINOGRAD_CONVOLUTION: NO, GPU_ENABLE_LOOP_UNROLLING: YES, GPU_ENABLE_LORA_OPERATION: YES, GPU_ENABLE_SDPA_OPTIMIZATION: YES, GPU_EXECUTION_UNITS_COUNT: 512, GPU_HOST_TASK_PRIORITY: MEDIUM, GPU_MEMORY_STATISTICS: {cl_mem:0,unknown:0,usm_device:0,usm_host:0,usm_shared:0}, GPU_QUEUE_PRIORITY: MEDIUM, GPU_QUEUE_THROTTLE: MEDIUM, GPU_UARCH_VERSION: 12.55.8, INFERENCE_PRECISION_HINT: f16, KV_CACHE_PRECISION: dynamic, MAX_BATCH_SIZE: 1, MODEL_PRIORITY: MEDIUM, MODEL_PTR: 0, NUM_STREAMS: 1, OPTIMAL_BATCH_SIZE: 1, OPTIMIZATION_CAPABILITIES: FP32 BIN FP16 INT8 GPU_HW_MATMUL GPU_USM_MEMORY EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 2 1, RANGE_FOR_STREAMS: 1 2, WEIGHTS_PATH: }
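The plugin-configuration line above is dense; as a sanity check, the interesting fields can be pulled out with a few lines of stdlib Python (log excerpt shortened to the relevant keys):

```python
import re

# Shortened excerpt of the ov_utils.hpp debug line quoted above.
log_line = ("OpenVINO Core plugin: GPU.1; plugin configuration: { "
            "AVAILABLE_DEVICES: 0 1 2, DEVICE_ID: 1, DEVICE_TYPE: discrete, "
            "FULL_DEVICE_NAME: Intel(R) Arc(TM) A770 Graphics (dGPU), "
            "GPU_DEVICE_TOTAL_MEM_SIZE: 16225243136 }")

def plugin_fields(line):
    """Turn 'KEY: value' pairs from the plugin-configuration dump into a dict."""
    return {k: v.strip() for k, v in re.findall(r"([A-Z_]+): ([^,}]+)", line)}

fields = plugin_fields(log_line)
# GPU.1 is reported as a discrete A770 with 16 GB of device memory,
# i.e. the device is enumerated correctly before the crash.
print(fields["DEVICE_ID"], fields["DEVICE_TYPE"], fields["FULL_DEVICE_NAME"])
```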

To Reproduce
Steps to reproduce the behavior:

  1. Build from source or use 2025.3 release of OVMS
  2. Convert any model using the export_model.py script.
  3. Launch OVMS as per documentation, by setting paths and then directing OVMS to a config.json file.
  4. See error

Expected behavior
The model SHOULD load.
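For context on step 3: for LLM servables the target device is not chosen in config.json but in the graph.pbtxt that export_model.py generates next to the model. An abbreviated sketch of the relevant node with the device switched to GPU.1 (stream names match the logs below; treat the exact option layout as an assumption and check the file export_model.py actually produced):

```
node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  node_options: {
    [type.googleapis.com / mediapipe.LLMCalculatorOptions]: {
      models_path: "./"
      device: "GPU.1"  # second Arc A770; plain "GPU" targets GPU.0 and works
    }
  }
}
```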

Logs
[2025-11-25 19:08:13.462][6495][serving][debug][schema.cpp:566] Loading configuration from /media/models/config.json for: 1 time
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:824] Configuration file doesn't have monitoring property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:1059] Reading metric config only once per server start.
[2025-11-25 19:08:13.463][6495][serving][debug][mediapipegraphconfig.cpp:109] graph_path not defined in config so it will be set to default based on base_path and graph name: /media/models/Qwen/Qwen3-14B/graph.pbtxt
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:446] Graph: Qwen/Qwen3-14B path: /media/models/Qwen/Qwen3-14B/graph.pbtxt exists
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:862] Adding mediapipe graph config for Qwen/Qwen3-14B, /media/models/Qwen/Qwen3-14B/graph.pbtxt
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:940] Subconfig path: /media/models/Qwen/Qwen3-14B/subconfig.json provided for graph: Qwen/Qwen3-14B does not exist. Loading subconfig models will be skipped.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:971] Subconfiguration file doesn't have models property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:659] Configuration file doesn't have custom node libraries property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:704] Configuration file doesn't have pipelines property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:490] Mediapipe graph:Qwen/Qwen3-14B was not loaded so far. Triggering load
[2025-11-25 19:08:13.463][6495][modelmanager][debug][mediapipegraphdefinition.cpp:129] Started validation of mediapipe: Qwen/Qwen3-14B
[2025-11-25 19:08:13.464][6495][modelmanager][debug][mediapipe_utils.cpp:84] setting input stream: input packet type: UNKNOWN from: HTTP_REQUEST_PAYLOAD:input
[2025-11-25 19:08:13.464][6495][modelmanager][debug][mediapipe_utils.cpp:84] setting output stream: output packet type: UNKNOWN from: HTTP_RESPONSE_PAYLOAD:output
[2025-11-25 19:08:13.464][6495][serving][info][mediapipegraphdefinition.cpp:421] MediapipeGraphDefinition initializing graph nodes
[2025-11-25 19:08:13.465][6495][modelmanager][info][servable_initializer.cpp:463] Initializing Language Model Continuous Batching servable
[2025-11-25 19:08:19.853][6495][serving][error][servable_initializer.cpp:145] Error during llm node initialization for models_path: /media/models/Qwen/Qwen3-14B/./ exception: Exception from src/inference/src/cpp/core.cpp:109:
Exception from src/inference/src/dev/plugin.cpp:53:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS

[2025-11-25 19:08:19.853][6495][modelmanager][error][servable_initializer.cpp:468] Error during LLM node resources initialization: The LLM Node resource initialization failed
[2025-11-25 19:08:19.853][6495][serving][error][mediapipegraphdefinition.cpp:472] Failed to process LLM node graph Qwen/Qwen3-14B
[2025-11-25 19:08:19.853][6495][modelmanager][debug][pipelinedefinitionstatus.hpp:50] Mediapipe: Qwen/Qwen3-14B state: BEGIN handling: ValidationFailedEvent:
[2025-11-25 19:08:19.853][6495][modelmanager][info][pipelinedefinitionstatus.hpp:59] Mediapipe: Qwen/Qwen3-14B state changed to: LOADING_PRECONDITION_FAILED after handling: ValidationFailedEvent:
[2025-11-25 19:08:19.853][6495][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2025-11-25 19:08:19.853][6598][modelmanager][info][modelmanager.cpp:1201] Started model manager thread
[2025-11-25 19:08:19.853][6599][modelmanager][info][modelmanager.cpp:1220] Started cleaner thread

Bear in mind, these are the logs from 2025.3; the build from source only reports a segmentation fault (likely the same underlying error).

Configuration

  1. OVMS version: built from source OR 2025.3 release
  2. OVMS config.json file:

     {
       "model_config_list": [
         {
           "config": {
             "name": "Qwen/Qwen3-14B",
             "base_path": "Qwen/Qwen3-14B"
           }
         }
       ]
     }

  3. CPU, accelerator's versions if applicable: Intel Core i5-11600KF, 3 x Arc A770 on the latest drivers.
  4. /media/models/Qwen/Qwen3-14B/~
  5. Qwen/Qwen3-14B
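Note that this config selects no device at all; for LLM servables device placement lives in the graph, not in config.json. For completeness, a stdlib-only sketch that emits this exact config, which can be handy when scripting multi-model setups:

```python
import json

# Reproduce the config.json from this report. The servable name doubles as the
# base_path, resolved relative to the model repository directory given to OVMS.
config = {
    "model_config_list": [
        {"config": {"name": "Qwen/Qwen3-14B", "base_path": "Qwen/Qwen3-14B"}}
    ]
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)

print(json.dumps(config, indent=4))
```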

Additional context
This problem is exactly why HETERO configurations are failing, which I have filed error reports about before. This needs to be resolved; being unable to access GPU.1 and GPU.2 is a massive oversight.

EDIT:
I have just run OpenCL benchmarks on all cards to check whether this is an OpenCL issue; all of them benchmarked perfectly and showed no errors, so the cards are genuinely usable and this is an OpenVINO issue.
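To take OVMS out of the loop entirely, the same compile step can be attempted through the bare OpenVINO Python API. A sketch, assuming the openvino package is installed and that the exported IR is named openvino_model.xml (both assumptions; adjust to your export layout):

```python
# Hypothetical stand-alone reproducer: if core.compile_model(model, "GPU.1")
# raises the same ProgramBuilder / CL_INVALID_KERNEL_ARGS error here, the bug
# sits in the OpenVINO GPU plugin rather than in OVMS itself.

def target_devices(n_gpus):
    """OpenVINO device names for a multi-GPU host; GPU.0 can also be plain 'GPU'."""
    return [f"GPU.{i}" for i in range(n_gpus)]

def try_compile(model_xml, device):
    import openvino as ov  # deferred import; only needed on the GPU host
    core = ov.Core()
    model = core.read_model(model_xml)
    core.compile_model(model, device)  # expected to raise on GPU.1 / GPU.2

if __name__ == "__main__":
    for dev in target_devices(3):
        print("would try", dev)  # e.g. try_compile("openvino_model.xml", dev)
```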
