vulkan: sort graph to allow more parallel execution #15850
@@ -6513,6 +6513,7 @@ static void ggml_backend_metal_set_n_cb(ggml_backend_t backend, int n_cb) {
     /* .graph_compute = */ ggml_backend_metal_graph_compute,
     /* .event_record = */ NULL,
     /* .event_wait = */ NULL,
+    /* .optimize_graph = */ NULL,
 };

 static ggml_guid_t ggml_backend_metal_guid(void) {
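For context, a sketch of where this new member sits in the backend interface. The neighboring callbacks follow ggml-backend-impl.h; the optimize_graph signature below is an assumption inferred from this diff, not copied from the PR:

```c
/* abridged sketch of struct ggml_backend_i (see ggml-backend-impl.h);
 * the optimize_graph signature is assumed, not taken from the PR */
struct ggml_backend_i {
    /* ... */
    enum ggml_status (*graph_compute) (ggml_backend_t backend, struct ggml_cgraph * cgraph);
    void             (*event_record)  (ggml_backend_t backend, ggml_backend_event_t event);
    void             (*event_wait)    (ggml_backend_t backend, ggml_backend_event_t event);
    /* new optional hook: lets a backend reorder graph nodes before they are
     * allocated; NULL (as in the Metal entry above) opts out */
    void             (*optimize_graph)(ggml_backend_t backend, struct ggml_cgraph * cgraph);
};
```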
Yeah, it could be renamed.

I'll fix this when I get back to work, if nobody beats me to it.

Btw, I'm wondering what the benefit is of delegating this optimization step to the scheduler. It seems like the same effect could be achieved by creating an array of indices with the order in which we want to traverse the graph. This can be done inside

ggml-alloc will aggressively reuse memory, which interferes with concurrency. I prototyped a version of this where I did it entirely in the backend, and I had to basically ignore the real allocations and use temporary allocations for all the tensors I wanted to reorder.

Ok, got it. Another question, though: isn't the original concern from #15489 (comment) now valid again? Without the actual address ranges, you might miss a dependency between nodes that is not represented by the graph. Back there you solved it by using the actual address ranges, but here that logic is not present.

I'm not completely sure, but I did consider this case. Something like set_rows still operates on (views of) tensors, and I included a check that treats two operations viewing the same tensor as an implicit dependency. There aren't any actual allocations at this point, so it all has to be done in terms of tensors; I think this works out.

For the Metal backend I implemented a backend-agnostic graph optimization that reorders the nodes for improved concurrency while preserving the order of fusable ops, and that also does not reorder problematic operators; see llama.cpp/ggml/src/ggml-metal/ggml-metal-common.cpp, lines 377 to 387 (at 4b8560a).
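Since the PR's exact code isn't shown here, a minimal sketch of what such a view-based implicit-dependency check could look like, assuming only the public view_src field of struct ggml_tensor (the helper names are made up):

```c
#include <stdbool.h>
#include <stddef.h>
#include "ggml.h"

/* Walk a chain of views to the tensor that owns the underlying storage. */
static const struct ggml_tensor * base_tensor(const struct ggml_tensor * t) {
    while (t->view_src != NULL) {
        t = t->view_src;
    }
    return t;
}

/* Conservatively treat two nodes as ordered if they view the same base
 * tensor, e.g. set_rows writing into a view of the KV cache while another
 * node reads a different view of it. No address ranges are needed: nothing
 * is allocated yet, so everything is expressed in terms of tensors. */
static bool implicit_dependency(const struct ggml_tensor * a, const struct ggml_tensor * b) {
    return base_tensor(a) == base_tensor(b);
}
```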
I don't think it's possible to generate the most efficient ordering without knowing what is actually (not just theoretically) fusable by the backend. For example, suppose you have two matmul+adds: t0 = matmul(a, b), t1 = add(t0, c), t2 = matmul(d, e), t3 = add(t2, f).
If the backend fuses matmul+add, then t0, t1, t2, t3 is the correct order: the two fused matmul+adds can run concurrently. But if the backend does not fuse matmul+add, then the better order is t0, t2, t1, t3, so the two matmuls can run concurrently and the two adds can run concurrently.
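A toy sketch of the point being made: picking between the two schedules requires a fusion capability flag that only the backend can supply (all names here are illustrative, not part of the PR):

```c
#include <stdbool.h>
#include <string.h>

/* Pick a schedule for the four example nodes: t0,t1 form one matmul+add
 * pair and t2,t3 the other. The optimal order depends on whether the
 * backend really fuses matmul+add - information only the backend has. */
static void order_two_matmul_adds(bool backend_fuses, int order_out[4]) {
    if (backend_fuses) {
        /* keep each producer/consumer pair adjacent so it can fuse; the
         * two fused kernels have no mutual dependency and can overlap */
        const int order[4] = {0, 1, 2, 3};
        memcpy(order_out, order, sizeof(order));
    } else {
        /* interleave: both matmuls first (concurrent), then both adds */
        const int order[4] = {0, 2, 1, 3};
        memcpy(order_out, order, sizeof(order));
    }
}
```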