Any idea why the DirectML is so buggy? #1115
Replies: 4 comments 2 replies
-
The genai library was originally designed as a set of utility functions, with separate optimizations when the CUDA provider was used. We're working on generalizing it to support other providers better, through a proper abstraction instead of the current `if (device_type == Cuda/Dml)` checks. So the problem is that DML support was tacked on to make it work, but it isn't very clean yet. You can see that WebGPU was also tacked on similarly, and the WebGPU team didn't like how ugly it was either :)
-
Thanks for the reply!
-
Hope that DirectML can get stable soon. Many of the examples still won't run on DirectML.
-
Personally, I gave up on GenAI last year. It has never worked on DirectML without serious bugs, so it isn't worth me investing any time in it.
-
Just wondered if anyone knew why the DirectML mode (Microsoft's own provider) seems to be causing so many problems?
I have used onnxruntime for a long time and never had any crashes or bugs.
And isn't genai just built on top of onnxruntime?
I'm just curious how the bugs got in.
I presume genai is doing something fancy over and above what onnxruntime is capable of in order to get that extra bit of speed. Some "unsafe" DML code maybe?
If hacks like that are being used to speed up onnxruntime, why can't they be added to the onnxruntime API itself so they can be fully tested?
Sorry, I don't mean to complain about open source software. It is beyond my area of expertise certainly. I am just curious. Keep up the good work. 👍