Hello.
I have a problem running an LLM on device HETERO:GPU.0,GPU.1 with MODEL_DISTRIBUTION_POLICY="PIPELINE_PARALLEL".
I would appreciate your help.
Windows 11
Driver 8331
2x A770 (16 GB)
It does not work with https://huggingface.co/savvadesogle/T-pro-it-2.1-int4-ov, which is based on the Qwen/Qwen3-32B architecture.
I specifically chose a model that doesn't fit into a single GPU.
❌ TEST-HETERO - NOT WORKING (no error is shown on 2025.4.1; an error is shown on 2026-dev)
import openvino_genai as ov_genai
model_path = "C:\\llm\\models\\ov\\T-pro-it-2.1-int4-ov"
device = "HETERO:GPU.0,GPU.1"
print('Device selected:', device)
pipe = ov_genai.LLMPipeline(
    model_path,
    device,
    MODEL_DISTRIBUTION_POLICY="PIPELINE_PARALLEL"
)
print('Pipeline initialized')
print(pipe.generate("How to make a tea?", max_new_tokens=100))
print('END')
✅ TEST-SINGLE-GPU - WORKING
import openvino_genai as ov_genai
model_path = "C:\\llm\\models\\ov\\T-pro-it-2.1-int4-ov"
device = "GPU"
print('Device selected:', device)
pipe = ov_genai.LLMPipeline(
    model_path,
    device
)
print('Pipeline initialized')
print(pipe.generate("How to make a tea?", max_new_tokens=100))
print('END')
metrics={'load_time (s)': 82.02, 'ttft (s)': 0.54, 'tpot (ms)': 161.09088, 'prefill_throughput (tokens/s)': 940.13, 'decode_throughput (tokens/s)': 6.20768, 'decode_duration (s)': 13.27085, 'input_token': 512, 'new_token': 80, 'total_token': 592, 'stream': False}
PIP LIST (2025.4.1: no error is shown in the console, the script just exits)
(genai) c:\openvino\genai\openvino.genai>uv pip list
Using Python 3.12.12 environment at: C:\Users\uuk\miniconda3\envs\genai
Package             Version
------------------- ----------
numpy               2.3.5
openvino            2025.4.1
openvino-genai      2025.4.1.0
openvino-telemetry  2025.2.0
openvino-tokenizers 2025.4.1.0
packaging           25.0
pip                 25.3
setuptools          80.9.0
wheel               0.45.1
PIP LIST (2026.0.0.dev20260102: the error is shown in the console, then the script exits)
(genai) c:\openvino\genai\openvino.genai>uv pip list
Using Python 3.12.12 environment at: C:\Users\uuk\miniconda3\envs\genai
Package             Version
------------------- ----------------------
numpy               2.3.5
openvino            2026.0.0.dev20260102
openvino-genai      2026.0.0.0.dev20260102
openvino-telemetry  2025.2.0
openvino-tokenizers 2026.0.0.0.dev20260102
packaging           25.0
pip                 25.3
setuptools          80.9.0
wheel               0.45.1
(genai) c:\openvino\genai\openvino.genai>python test-hetero.py
Device selected: HETERO:GPU.0,GPU.1
Traceback (most recent call last):
  File "c:\openvino\genai\openvino.genai\test-hetero.py", line 5, in <module>
    pipe = ov_genai.LLMPipeline(
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src\inference\src\cpp\core.cpp:119:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\hetero\src\compiled_model.cpp:36:
Standard exception from compilation library: Exception from src\inference\src\dev\plugin.cpp:53:
Check 'false' failed at src\plugins\intel_gpu\src\plugin\program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src\plugins\intel_gpu\src\runtime\ocl\ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS
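To help narrow this down, here is a minimal sketch of a check that both GPUs are visible to OpenVINO under the names used in the HETERO string. It does not reproduce or fix the error. The helper names `gpu_devices` and `hetero_device` are hypothetical and only for illustration; the OpenVINO calls used are `ov.Core().available_devices` and `core.get_property(name, "FULL_DEVICE_NAME")`.

```python
def gpu_devices(available):
    """Return only the GPU entries (e.g. 'GPU.0', 'GPU.1') from a device list."""
    return [d for d in available if d == "GPU" or d.startswith("GPU.")]


def hetero_device(available):
    """Build a HETERO device string from the GPU entries.

    Returns None when fewer than two GPU entries are present (hypothetical
    helper; the string format matches the one used in test-hetero.py).
    """
    gpus = gpu_devices(available)
    if len(gpus) < 2:
        return None
    return "HETERO:" + ",".join(gpus)


if __name__ == "__main__":
    try:
        import openvino as ov  # imported here so the helpers work without OpenVINO
    except ImportError:
        print("openvino is not installed in this environment")
    else:
        core = ov.Core()
        # Print every device OpenVINO can see, with its full hardware name.
        for name in core.available_devices:
            print(name, "->", core.get_property(name, "FULL_DEVICE_NAME"))
        print("HETERO string:", hetero_device(core.available_devices))
```

If the last line prints `HETERO string: None`, OpenVINO does not list two separate GPU devices, and the problem would be in device enumeration and not in the pipeline-parallel policy.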