
OVMS refuses to load models into second or third GPUs. #3816

@HumerousGorgon

Description


Describe the bug
When specifying to load a model (in this instance, Qwen3-1.7B) into the second or third GPUs (denoted GPU.1 and GPU.2) results in a segmentation fault and an instant crash. This was tested with the latest version of OVMS (built from source) AND with 2025.3 release. I believe these bugs to be OpenCL bugs. OVMS does in fact DETECT the other GPUs as shown here:
[2025-11-25 19:08:13.411][6495][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: GPU.1; plugin configuration
[2025-11-25 19:08:13.411][6495][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: GPU.1; plugin configuration: { ACTIVATIONS_SCALE_FACTOR: -1, AVAILABLE_DEVICES: 0 1 2, CACHE_DIR: , CACHE_ENCRYPTION_CALLBACKS: , CACHE_MODE: optimize_speed, COMPILATION_NUM_THREADS: 12, CONFIG_FILE: , DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.55.8, DEVICE_GOPS: {f16:0,f32:19660.8,i8:0,u8:0}, DEVICE_ID: 1, DEVICE_LUID: b0ff1aaefd7f0000, DEVICE_PCI_INFO: {domain: 0 bus: 10 device: 0x0 function: 0}, DEVICE_TYPE: discrete, DEVICE_UUID: 8680a056080000000a00000000000000, DYNAMIC_QUANTIZATION_GROUP_SIZE: 0, ENABLE_CPU_PINNING: NO, ENABLE_CPU_RESERVATION: NO, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) Arc(TM) A770 Graphics (dGPU), GPU_DEVICE_ID: 0x56a0, GPU_DEVICE_TOTAL_MEM_SIZE: 16225243136, GPU_DISABLE_WINOGRAD_CONVOLUTION: NO, GPU_ENABLE_LOOP_UNROLLING: YES, GPU_ENABLE_LORA_OPERATION: YES, GPU_ENABLE_SDPA_OPTIMIZATION: YES, GPU_EXECUTION_UNITS_COUNT: 512, GPU_HOST_TASK_PRIORITY: MEDIUM, GPU_MEMORY_STATISTICS: {cl_mem:0,unknown:0,usm_device:0,usm_host:0,usm_shared:0}, GPU_QUEUE_PRIORITY: MEDIUM, GPU_QUEUE_THROTTLE: MEDIUM, GPU_UARCH_VERSION: 12.55.8, INFERENCE_PRECISION_HINT: f16, KV_CACHE_PRECISION: dynamic, MAX_BATCH_SIZE: 1, MODEL_PRIORITY: MEDIUM, MODEL_PTR: 0, NUM_STREAMS: 1, OPTIMAL_BATCH_SIZE: 1, OPTIMIZATION_CAPABILITIES: FP32 BIN FP16 INT8 GPU_HW_MATMUL GPU_USM_MEMORY EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 2 1, RANGE_FOR_STREAMS: 1 2, WEIGHTS_PATH: }
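The plugin-configuration line above is dense; as a sanity check, the interesting fields can be pulled out with a few lines of stdlib Python (log excerpt shortened to the relevant keys):

```python
import re

# Shortened excerpt of the ov_utils.hpp debug line quoted above.
log_line = ("OpenVINO Core plugin: GPU.1; plugin configuration: { "
            "AVAILABLE_DEVICES: 0 1 2, DEVICE_ID: 1, DEVICE_TYPE: discrete, "
            "FULL_DEVICE_NAME: Intel(R) Arc(TM) A770 Graphics (dGPU), "
            "GPU_DEVICE_TOTAL_MEM_SIZE: 16225243136 }")

def plugin_fields(line):
    """Turn 'KEY: value' pairs from the plugin-configuration dump into a dict."""
    return {k: v.strip() for k, v in re.findall(r"([A-Z_]+): ([^,}]+)", line)}

fields = plugin_fields(log_line)
# GPU.1 is reported as a discrete A770 with 16 GB of device memory,
# i.e. the device is enumerated correctly before the crash.
print(fields["DEVICE_ID"], fields["DEVICE_TYPE"], fields["FULL_DEVICE_NAME"])
```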

To Reproduce
Steps to reproduce the behavior:

  1. Build from source or use 2025.3 release of OVMS
  2. Convert any model using the export_model.py script.
  3. Launch OVMS as per documentation, by setting paths and then directing OVMS to a config.json file.
  4. See error

Expected behavior
The model SHOULD load.
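For context on step 3: for LLM servables the target device is not chosen in config.json but in the graph.pbtxt that export_model.py generates next to the model. An abbreviated sketch of the relevant node with the device switched to GPU.1 (stream names match the logs below; treat the exact option layout as an assumption and check the file export_model.py actually produced):

```
node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  node_options: {
    [type.googleapis.com / mediapipe.LLMCalculatorOptions]: {
      models_path: "./"
      device: "GPU.1"  # second Arc A770; plain "GPU" targets GPU.0 and works
    }
  }
}
```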

Logs
[2025-11-25 19:08:13.462][6495][serving][debug][schema.cpp:566] Loading configuration from /media/models/config.json for: 1 time
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:824] Configuration file doesn't have monitoring property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:1059] Reading metric config only once per server start.
[2025-11-25 19:08:13.463][6495][serving][debug][mediapipegraphconfig.cpp:109] graph_path not defined in config so it will be set to default based on base_path and graph name: /media/models/Qwen/Qwen3-14B/graph.pbtxt
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:446] Graph: Qwen/Qwen3-14B path: /media/models/Qwen/Qwen3-14B/graph.pbtxt exists
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:862] Adding mediapipe graph config for Qwen/Qwen3-14B, /media/models/Qwen/Qwen3-14B/graph.pbtxt
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:940] Subconfig path: /media/models/Qwen/Qwen3-14B/subconfig.json provided for graph: Qwen/Qwen3-14B does not exist. Loading subconfig models will be skipped.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:971] Subconfiguration file doesn't have models property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:659] Configuration file doesn't have custom node libraries property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:704] Configuration file doesn't have pipelines property.
[2025-11-25 19:08:13.463][6495][modelmanager][debug][modelmanager.cpp:490] Mediapipe graph:Qwen/Qwen3-14B was not loaded so far. Triggering load
[2025-11-25 19:08:13.463][6495][modelmanager][debug][mediapipegraphdefinition.cpp:129] Started validation of mediapipe: Qwen/Qwen3-14B
[2025-11-25 19:08:13.464][6495][modelmanager][debug][mediapipe_utils.cpp:84] setting input stream: input packet type: UNKNOWN from: HTTP_REQUEST_PAYLOAD:input
[2025-11-25 19:08:13.464][6495][modelmanager][debug][mediapipe_utils.cpp:84] setting output stream: output packet type: UNKNOWN from: HTTP_RESPONSE_PAYLOAD:output
[2025-11-25 19:08:13.464][6495][serving][info][mediapipegraphdefinition.cpp:421] MediapipeGraphDefinition initializing graph nodes
[2025-11-25 19:08:13.465][6495][modelmanager][info][servable_initializer.cpp:463] Initializing Language Model Continuous Batching servable
[2025-11-25 19:08:19.853][6495][serving][error][servable_initializer.cpp:145] Error during llm node initialization for models_path: /media/models/Qwen/Qwen3-14B/./ exception: Exception from src/inference/src/cpp/core.cpp:109:
Exception from src/inference/src/dev/plugin.cpp:53:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS

[2025-11-25 19:08:19.853][6495][modelmanager][error][servable_initializer.cpp:468] Error during LLM node resources initialization: The LLM Node resource initialization failed
[2025-11-25 19:08:19.853][6495][serving][error][mediapipegraphdefinition.cpp:472] Failed to process LLM node graph Qwen/Qwen3-14B
[2025-11-25 19:08:19.853][6495][modelmanager][debug][pipelinedefinitionstatus.hpp:50] Mediapipe: Qwen/Qwen3-14B state: BEGIN handling: ValidationFailedEvent:
[2025-11-25 19:08:19.853][6495][modelmanager][info][pipelinedefinitionstatus.hpp:59] Mediapipe: Qwen/Qwen3-14B state changed to: LOADING_PRECONDITION_FAILED after handling: ValidationFailedEvent:
[2025-11-25 19:08:19.853][6495][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2025-11-25 19:08:19.853][6598][modelmanager][info][modelmanager.cpp:1201] Started model manager thread
[2025-11-25 19:08:19.853][6599][modelmanager][info][modelmanager.cpp:1220] Started cleaner thread

Bear in mind, these are the logs from 2025.3; the build from source only reports a segmentation fault (likely the same underlying error).

Configuration

  1. OVMS version: built from source OR 2025.3 release
  2. OVMS config.json file:

     {
       "model_config_list": [
         {
           "config": {
             "name": "Qwen/Qwen3-14B",
             "base_path": "Qwen/Qwen3-14B"
           }
         }
       ]
     }

  3. CPU, accelerator's versions if applicable: Intel Core i5-11600KF, 3 x Arc A770 on the latest drivers.
  4. /media/models/Qwen/Qwen3-14B/~
  5. Qwen/Qwen3-14B
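Note that this config selects no device at all; for LLM servables device placement lives in the graph, not in config.json. For completeness, a stdlib-only sketch that emits this exact config, which can be handy when scripting multi-model setups:

```python
import json

# Reproduce the config.json from this report. The servable name doubles as the
# base_path, resolved relative to the model repository directory given to OVMS.
config = {
    "model_config_list": [
        {"config": {"name": "Qwen/Qwen3-14B", "base_path": "Qwen/Qwen3-14B"}}
    ]
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)

print(json.dumps(config, indent=4))
```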

Additional context
This problem is exactly why HETERO configurations are failing, which I have filed error reports about before. This needs to be resolved; being unable to access GPU.1 and GPU.2 is a massive oversight.

EDIT:
I have just run OpenCL benchmarks on all cards to check whether this is an OpenCL issue; all of them benchmarked perfectly and showed no errors, so the cards are genuinely usable and this is an OpenVINO issue.
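To take OVMS out of the loop entirely, the same compile step can be attempted through the bare OpenVINO Python API. A sketch, assuming the openvino package is installed and that the exported IR is named openvino_model.xml (both assumptions; adjust to your export layout):

```python
# Hypothetical stand-alone reproducer: if core.compile_model(model, "GPU.1")
# raises the same ProgramBuilder / CL_INVALID_KERNEL_ARGS error here, the bug
# sits in the OpenVINO GPU plugin rather than in OVMS itself.

def target_devices(n_gpus):
    """OpenVINO device names for a multi-GPU host; GPU.0 can also be plain 'GPU'."""
    return [f"GPU.{i}" for i in range(n_gpus)]

def try_compile(model_xml, device):
    import openvino as ov  # deferred import; only needed on the GPU host
    core = ov.Core()
    model = core.read_model(model_xml)
    core.compile_model(model, device)  # expected to raise on GPU.1 / GPU.2

if __name__ == "__main__":
    for dev in target_devices(3):
        print("would try", dev)  # e.g. try_compile("openvino_model.xml", dev)
```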
