[GpuOclRuntime] Avoid error when iterating over non-gpu devices #413

akroviakov · 2024-11-26T13:46:53Z

Consider these devices visible to OpenCL (CPU and GPU):

Platform: Intel(R) OpenCL
  Device: Intel(R) Xeon(R) Gold 6438Y+
Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Data Center GPU Max 1100

Running the mlp.mlir example on current main prints an error (-1) for the CPU platform, although the test still runs. The error appears because every platform is queried for devices with CL_DEVICE_TYPE_GPU, and the CPU-only platform has none. Instead, we can first query all devices with CL_DEVICE_TYPE_ALL and then check each device's CL_DEVICE_TYPE to select the GPUs; this way, no error is displayed.
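
The filtering step described above can be illustrated with a minimal, self-contained sketch. It does not call OpenCL: `DeviceInfo` and `selectGpus` are hypothetical names introduced only for illustration, and the vector stands in for the result of a CL_DEVICE_TYPE_ALL query. The two constants mirror the bit values of the corresponding OpenCL `cl_device_type` flags. This is not the code from the PR.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Bit values mirror the OpenCL cl_device_type flags.
constexpr std::uint64_t kDeviceTypeCpu = 1u << 1; // CL_DEVICE_TYPE_CPU
constexpr std::uint64_t kDeviceTypeGpu = 1u << 2; // CL_DEVICE_TYPE_GPU

// Hypothetical stand-in for a device handle together with the value
// that a CL_DEVICE_TYPE query would report for it.
struct DeviceInfo {
  std::string name;
  std::uint64_t type;
};

// Given every device on a platform (the CL_DEVICE_TYPE_ALL result),
// keep only those whose type bitfield has the GPU bit set.
std::vector<DeviceInfo> selectGpus(const std::vector<DeviceInfo> &all) {
  std::vector<DeviceInfo> gpus;
  for (const DeviceInfo &dev : all) {
    if (dev.type & kDeviceTypeGpu)
      gpus.push_back(dev);
  }
  return gpus;
}
```

With the two devices listed above, passing the Xeon entry (CPU bit) and the Data Center GPU Max entry (GPU bit) to `selectGpus` would return only the latter, and a CPU-only platform would simply yield an empty list rather than an error.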

AndreyPavlenko · 2024-11-26T18:04:30Z

Does that mean that clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, nullptr, &numDevices) returns an error? Could it be an environment configuration issue?
Btw, mlp.mlir is run by CI and it does not fail.

AndreyPavlenko · 2024-11-26T18:08:07Z

Sorry, I closed it accidentally.

kurapov-peter · 2024-11-27T14:03:37Z

What's the use case for running CPU through opencl anyway? The runtime is not supposed to be called during CPU execution.

akroviakov · 2024-11-27T14:20:10Z

I build like this:

git clone ...
./scripts/compile.sh --dev --imex
cd build
cmake --build . --target gc-check

then I run mlp.mlir like this:

/home/.../graph-compiler/build/bin/gc-gpu-runner --shared-libs=/home/.../graph-compiler/externals/llvm-project/build/lib/libmlir_runner_utils.so /home/.../graph-compiler/test/mlir/test/gc/gpu-runner/mlp.mlir

Assuming I am using gpu-runner in a way I am not supposed to, what would be the correct command to run a single arbitrary .mlir test for GPU then?

AndreyPavlenko · 2024-11-27T15:32:55Z

> Assuming I am using gpu-runner in a way I am not supposed to, what would be the correct command to run a single arbitrary .mlir test for GPU then?

You are using it in the right way and the same command works for me:

$ ./bin/gc-gpu-runner --shared-libs=../../../graph-compiler/externals/llvm-project/build/Release/lib/libmlir_runner_utils.so ../../../graph-compiler/test/mlir/test/gc/gpu-runner/mlp.mlir
Unranked Memref base@ = 0x55c191cd5180 rank = 2 offset = 0 sizes = [1, 10] strides = [10, 1] data =
[[0.1,   0.1,   0.1,   0.1,   0.1,   0.1,   0.1,   0.1,   0.1,   0.1]]

Does this command fail for you? What's the failure?

akroviakov · 2024-11-27T16:07:48Z

As I mentioned at the beginning, the test runs:

~/graph-compiler/build$ /home/.../graph-compiler/build/bin/gc-gpu-runner --shared-libs=/home/.../graph-compiler/externals/llvm-project/build/lib/libmlir_runner_utils.so /home/.../graph-compiler/test/mlir/test/gc/gpu-runner/mlp.mlir 
[ERROR] [/home/.../graph-compiler/lib/gc/ExecutionEngine/GPURuntime/ocl/GpuOclRuntime.cpp:357] Failed to get the number of devices on the platform.0x55bc84c46970 Error: -1
Unranked Memref base@ = 0x55bc85e94480 rank = 2 offset = 0 sizes = [1, 10] strides = [10, 1] data = 
[[0.1,   0.1,   0.1,   0.1,   0.1,   0.1,   0.1,   0.1,   0.1,   0.1]]

However, it also displays an error for one of the platforms. This error is displayed because we ask the following platform:

Platform: Intel(R) OpenCL
  Device: Intel(R) Xeon(R) Gold 6438Y+

for devices of type CL_DEVICE_TYPE_GPU, which triggers:

    if (err != CL_SUCCESS) {
      gcLogE("Failed to get the number of devices on the platform.", platform,
             " Error: ", err);
      continue;
    }

So is it ok for GpuOclRuntime to see platforms that are not OpenCL Graphics (e.g., by simply iterating over them)? If yes, does that mean that I can simply ignore the error?

AndreyPavlenko · 2024-11-27T16:24:02Z

> does that mean that I can simply ignore the error?

Got it. The error message is redundant here. It should be either replaced with a debug message or removed.
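
One way to picture this suggestion is the sketch below, which separates "no matching device" from real failures. It relies on two facts from the OpenCL headers: CL_SUCCESS is 0 and CL_DEVICE_NOT_FOUND is -1, which matches the "Error: -1" in the log above. The `Action` enum and `classifyDeviceQuery` are hypothetical names used only for illustration; how the project actually resolved the message is not shown in this thread.

```cpp
#include <cstdint>

// Values mirror the OpenCL error codes.
constexpr std::int32_t kSuccess = 0;         // CL_SUCCESS
constexpr std::int32_t kDeviceNotFound = -1; // CL_DEVICE_NOT_FOUND

// What the platform loop should do with the result of a device query.
enum class Action { UsePlatform, SkipQuietly, SkipWithError };

// A platform without GPU devices is an expected situation, so it is
// skipped without an error-level message (a debug message at most).
// Any other failure is still reported as an error.
Action classifyDeviceQuery(std::int32_t err) {
  if (err == kSuccess)
    return Action::UsePlatform;
  if (err == kDeviceNotFound)
    return Action::SkipQuietly;
  return Action::SkipWithError;
}
```

In the loop quoted earlier, only the `SkipWithError` case would reach `gcLogE`; both skip cases would still `continue` to the next platform.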

akroviakov requested a review from AndreyPavlenko November 26, 2024 13:47

[GpuOclRuntime] Avoid error when iterating over non-gpu devices in gpu runner

afc9141

akroviakov force-pushed the akroviak/gc-gpu-rt-device-type branch from 80f8eaf to afc9141 Compare November 26, 2024 14:06

AndreyPavlenko closed this Nov 26, 2024

AndreyPavlenko reopened this Nov 26, 2024

akroviakov closed this Jan 15, 2025
