-
Notifications
You must be signed in to change notification settings - Fork 794
[CI] Update Level Zero used in docker images #20516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
079d297 to
3b80f0c
Compare
|
@steffenlarsen , when doing L0 update in CMake should we also bump these dependencies? (ref. #20431) or should it be updated differently/separately? |
3b80f0c to
a47eab1
Compare
|
Digging into the logs I can see that assets[].browser_download_url is missing - verified this locally and apparently v1.25.0 (or v1.25.1) does not have it, but v1.24.3 do have it. So I understand that bumping version will have to wait for a release with assets (perhaps a non-"pre-release" version?). ref. https://github.com/oneapi-src/level-zero/releases ping @intel/dpcpp-devops-reviewers - is there any other way to bump L0 version in the docker? Or perhaps, we should work around it e.g. in benchmarks' workflow...? |
|
The problem is that v1.25.1 is a preelase release, not an official release, so:
If you need it for benchmarking you can do whatever is necessary there, but I don't think we should modify the docker images. You can talk to Neil for more info |
|
I see, thx. So @PatKamin I guess we have to wait for a full release or, as I proposed above, make a workaround in benchmarks' workflow (uninstall L0 and install custom one). Since benchmarks are not working for a few days now, I guess quick workaround would be nice...? @nrspruit do you have perhaps a timeline for the next full (not a "pre-release") version? |
I plan to post v1.25.2 as the "release" version of the v1.25.x series loader so that will be the version to update to. I plan to update to a release version on Friday. |
|
ok, thx for letting us know :) |
https://github.com/oneapi-src/level-zero/releases/tag/v1.25.2 has been officially released as the "latest" version with packages. |
a47eab1 to
2d0b0e4
Compare
|
Updated to v1.25.2. Thanks, @nrspruit! |
|
@intel/dpcpp-devops-reviewers, correct me if I'm wrong, but this change should not affect E2E tests as they use already uploaded images with the older version of level zero. |
|
As-is this PR will cause a rebuild of the docker images and then update the level zero version used in CI. If that's intentional that's fine. |
Yes, that's what I want to achieve. The tests that are run within this PR are still using the old images, right? The new images (hence new level zero) will be used in CI only after the merge? |
|
The images themselves are using the old L0 but to allow testing if we see changes to dependenices.json we do an additonal step in the CI job to reinstall the dependencies based on the updated dependencies.json, it's the 'Install drivers' step in the run log. So TLDR the tests are running with the new L0 and the E2E failures in this PR are probably related if they go away if you use the current L0 version. |
fab8ae0 to
2d0b0e4
Compare
|
@nrspruit, could you take a look at the failing level zero tests? I would like to exclude possible bugs between level zero |
It seems like the V2 adapter is using symbols currently that are being exposed by both the driver and the new loader where there may be a conflict. You are running with an updated loader "before" you have the updated L0 gpu driver so this is not a configuration that you will encounter that would be "supported" . Ie, the GPU driver is updated "with" the new loader. The failures seem to be that the Adapters are using symbols that are now provided by the loader, but were incorrectly named the same in the old driver and now they report unsupported. I will verify this conclusion, but that would explain why the tests are not all failing on all platforms, only those platforms that are supporting the new features. |
Thank you for this insight. Looks like this update will have to wait for a proper Compute Runtime release supporting Level Zero v1.25.2 or at least v1.25.0... |
I am working on possibly fixing this here: One issue is that the v2 adapter redefined the symbol for zeCommandListAppendLaunchKernelWithArguments which did not have a definition before, but now does in v1.14 of the L0 spec. The call to that api is now going to the real api in the loader, but the driver does not support that api. There is a symbol mismatch/conflict happening here that we may not be able to resolve without a "real" driver with this support. @pbalcer FYI, I am working to resolve this, but it appears all the issues are in the V2 adapter where the V2 adapter was redefining symbols with the same names as the symbols now provided in v1.14 of L0 and it is breaking init. For some reason I cannot reproduce the issues locally, but I can reproduce the issues in the CI here. |
|
Hello @PatKamin , I have a patch that will enable this update and fix/work around the issues in the L0 GPU Driver. #20537 Please merge this patch instead, @pbalcer please review. Unfortunately, because the driver exp version of zeCommandListAppendLaunchKernelWithArguments is named the same as the spec version, the driver was reporting the loader's symbol instead of its own, thus returning the "unsupported" version of the api in that driver. Therefore, we cannot report support for that api the UR V2 adapter until one has a new enough driver with the fix: I have verified my fix in that PR. The loader update exposed the bug in the L0 GPU Driver unfortunately. |
This update is made to align with the level zero used in unified-runtime CMake files:
llvm/unified-runtime/cmake/FetchLevelZero.cmake
Lines 50 to 53 in 5df5f45
Also, latest Compute Benchmarks require Level Zero >=v1.25.0 (https://github.com/intel/llvm/actions/runs/18933687847/job/54055880756?pr=20500#step:19:1948):