@michel2323
Member

We observed norm() returning 0 on oneAPI arrays when it should not. On review, it seems a wait is needed after each MKL call. The MKL calls also support event dependencies, among other mechanisms.

We still have issues with the correctness of axpy in https://github.com/JuliaSmoothOptimizers/KrylovPreconditioners.jl

@michel2323 michel2323 requested a review from amontoison August 22, 2025 19:11
@github-actions
Contributor

github-actions bot commented Aug 22, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/deps/generate_interfaces.jl b/deps/generate_interfaces.jl
index beb87ea..9e7bd3f 100644
--- a/deps/generate_interfaces.jl
+++ b/deps/generate_interfaces.jl
@@ -447,17 +447,17 @@ function generate_cpp(library::String, filename::Vector{String}, output::String;
     write(oneapi_cpp, "extern \"C\" $header {\n")
     if template
       type = version_types[version]
-      !occursin("scratchpad_size", name) && write(oneapi_cpp, "   auto status = oneapi::mkl::$library::$variant$name<$type>($parameters, {});\n   device_queue->val.wait_and_throw();\n")
-      occursin("scratchpad_size", name)  && write(oneapi_cpp, "   int64_t scratchpad_size = oneapi::mkl::$library::$variant$name<$type>($parameters);\n   device_queue->val.wait_and_throw();\n")
-      # !occursin("scratchpad_size", name) && write(oneapi_cpp, "   auto status = oneapi::mkl::$library::$variant$name<$type>($parameters, {});\n")
-      # occursin("scratchpad_size", name)  && write(oneapi_cpp, "   int64_t scratchpad_size = oneapi::mkl::$library::$variant$name<$type>($parameters);\n")
+            !occursin("scratchpad_size", name) && write(oneapi_cpp, "   auto status = oneapi::mkl::$library::$variant$name<$type>($parameters, {});\n   device_queue->val.wait_and_throw();\n")
+            occursin("scratchpad_size", name)  && write(oneapi_cpp, "   int64_t scratchpad_size = oneapi::mkl::$library::$variant$name<$type>($parameters);\n   device_queue->val.wait_and_throw();\n")
+            # !occursin("scratchpad_size", name) && write(oneapi_cpp, "   auto status = oneapi::mkl::$library::$variant$name<$type>($parameters, {});\n")
+            # occursin("scratchpad_size", name)  && write(oneapi_cpp, "   int64_t scratchpad_size = oneapi::mkl::$library::$variant$name<$type>($parameters);\n")
     else
       if !(name ∈ void_output)
         write(oneapi_cpp, "   auto status = oneapi::mkl::$library::$variant$name($parameters, {});\n")
-        occursin("device_queue", parameters) && write(oneapi_cpp, "   device_queue->val.wait_and_throw();\n")
+                occursin("device_queue", parameters) && write(oneapi_cpp, "   device_queue->val.wait_and_throw();\n")
       else
         write(oneapi_cpp, "   oneapi::mkl::$library::$variant$name($parameters);\n")
-        occursin("device_queue", parameters) && write(oneapi_cpp, "   device_queue->val.wait_and_throw();\n")
+                occursin("device_queue", parameters) && write(oneapi_cpp, "   device_queue->val.wait_and_throw();\n")
       end
     end
     if occursin("scratchpad_size", name)

@codecov

codecov bot commented Aug 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.73%. Comparing base (115a10f) to head (2da6dde).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #518   +/-   ##
=======================================
  Coverage   81.73%   81.73%           
=======================================
  Files          44       44           
  Lines        2540     2540           
=======================================
  Hits         2076     2076           
  Misses        464      464           


@amontoison amontoison merged commit cf05fb5 into master Aug 23, 2025
2 checks passed
@amontoison amontoison deleted the ms/wait branch August 23, 2025 07:34
@maleadt
Member

maleadt commented Sep 1, 2025

Upon review, it seems one needs a wait after each MKL call. They support event dependencies etc.

Can you elaborate? Making these calls synchronous is not something we want, and synchronization should be handled on the Julia side already when accessing the destination oneArray. Doing this eagerly risks introducing execution bubbles and killing performance.
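To make the trade-off concrete, here is a minimal sketch in plain C++ of the difference between waiting after every call and waiting only when the destination is read. It is a toy model, not oneAPI.jl or SYCL code: `LazyArray`, `enqueue`, `host_view`, and `run_steps` are hypothetical names, and the model assumes queued operations run in submission order, which is exactly the property being questioned in this thread.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model (hypothetical, not the oneAPI.jl implementation): an array that
// records in-place operations and only blocks when the host reads it.
class LazyArray {
public:
    using Op = std::function<void(std::vector<double>&)>;

    explicit LazyArray(std::vector<double> init) : data_(std::move(init)) {}

    // Record an operation without blocking the caller.
    void enqueue(Op op) { pending_.push_back(std::move(op)); }

    // Blocking point: run everything outstanding, in submission order.
    void synchronize() {
        if (pending_.empty()) return;
        for (Op& op : pending_) op(data_);
        pending_.clear();
        ++sync_count_;
    }

    // Host access synchronizes first, so the caller never sees stale data.
    const std::vector<double>& host_view() {
        synchronize();
        return data_;
    }

    std::size_t sync_count() const { return sync_count_; }

private:
    std::vector<double> data_;
    std::vector<Op> pending_;
    std::size_t sync_count_ = 0;
};

// Apply `steps` increments, then read the array. With eager == true we block
// after every call; otherwise we block once, at the read.
// Returns {first element, number of blocking synchronizations}.
std::pair<double, std::size_t> run_steps(bool eager, int steps) {
    LazyArray a(std::vector<double>(3, 0.0));
    for (int k = 0; k < steps; ++k) {
        a.enqueue([](std::vector<double>& v) {
            for (double& e : v) e += 1.0;
        });
        if (eager) a.synchronize();
    }
    const double first = a.host_view()[0];
    return {first, a.sync_count()};
}
```

In this model, `run_steps(true, 5)` and `run_steps(false, 5)` produce the same data, but the eager variant blocks five times and the lazy variant once. Each blocking point is where an execution bubble could appear on a real device.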

@michel2323
Member Author

michel2323 commented Sep 2, 2025

Oh, OK. We observed wrong values in Krylov.jl when multiple MKL calls are issued in series. Upon reviewing the Intel MKL examples, it appears that the MKL calls are asynchronous and not ordered in the SYCL queue, so you would need to synchronize on the returned event. However, the event is not returned to Julia, so it can't be passed to subsequent MKL calls. Do you think we should do that then?
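The hazard described above can be illustrated with a small, single-threaded toy model in plain C++. This is not the SYCL or oneMKL API: `ToyQueue` and `axpy_then_norm` are hypothetical names, and the scheduling rule (run the newest ready command first) is an assumption chosen only to make the reordering deterministic. A real out-of-order queue may pick any order that respects the declared dependencies.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model of an out-of-order queue (hypothetical, not sycl::queue).
// submit() only records a command; wait() runs the commands, always picking
// the most recently submitted one whose dependencies have completed.
class ToyQueue {
public:
    using Event = std::size_t;  // index of a submitted command

    Event submit(std::function<void()> work, std::vector<Event> deps = {}) {
        commands_.push_back(Command{std::move(work), std::move(deps), false});
        return commands_.size() - 1;
    }

    void wait() {
        bool progressed = true;
        while (progressed) {
            progressed = false;
            for (std::size_t i = commands_.size(); i-- > 0;) {
                Command& c = commands_[i];
                if (c.done || !ready(c)) continue;
                c.work();
                c.done = true;
                progressed = true;
                break;  // rescan, starting again from the newest command
            }
        }
    }

private:
    struct Command {
        std::function<void()> work;
        std::vector<Event> deps;
        bool done;
    };

    bool ready(const Command& c) const {
        for (Event d : c.deps) {
            if (!commands_[d].done) return false;
        }
        return true;
    }

    std::vector<Command> commands_;
};

// Submit y <- 3*x + y followed by norm(y). When `chain` is true, the norm
// lists the axpy event as a dependency; otherwise it has none.
double axpy_then_norm(bool chain) {
    std::vector<double> x(4, 1.0), y(4, 2.0);
    double result = -1.0;
    ToyQueue q;

    ToyQueue::Event axpy_done = q.submit([&] {
        for (std::size_t i = 0; i < y.size(); ++i) y[i] += 3.0 * x[i];
    });

    std::vector<ToyQueue::Event> deps;
    if (chain) deps.push_back(axpy_done);
    q.submit([&] {
        double s = 0.0;
        for (double v : y) s += v * v;
        result = std::sqrt(s);
    }, deps);

    q.wait();
    return result;
}
```

With the dependency, the norm sees the updated vector (all entries equal to 5, so the norm is 10). Without it, this model runs the norm first and returns the norm of the original vector (4), which is the kind of stale result reported above. Passing events between calls orders only the commands that need ordering, whereas a wait after every call orders everything by blocking the host.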

@michel2323
Member Author

michel2323 commented Sep 2, 2025

The default flag, flags=0, is out-of-order. Do you think we should change this to in-order? @maleadt

@maleadt
Member

maleadt commented Sep 3, 2025

enumerator ZE_COMMAND_QUEUE_FLAG_IN_ORDER

To be used only when creating immediate command lists. Commands appended to the immediate command list are executed in-order, with driver implementation enforcing dependencies between them. Application is not required to have the signal event of a given command being the wait event of the next to define an in-order list, and application is allowed to pass signal and wait events to each appended command to implement more complex dependency graphs.

It does seem enticing, but the "To be used only when creating immediate command lists" condition doesn't hold here, so I'm not sure. Maybe we should ping some people at Intel, or open an issue upstream, to figure out the best way to emulate CUDA's stream-ordered operations without having to use events everywhere.

cc @kballeda

4 participants