ggml: add ops for WAN video model (cuda && cpu) #15669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I forgot: why is the GGUF header being changed?

Because the GGUF file of WAN contains a 5D tensor, for example: https://huggingface.co/QuantStack/Wan2.2-T2V-A14B-GGUF/tree/main/LowNoise?show_file_info=LowNoise%2FWan2.2-T2V-A14B-LowNoise-Q2_K.gguf. Therefore, I added the tensor_shape_read_cb_t callback, which lets users control how a multi-dimensional (>4D) tensor is mapped to a 4D shape.
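For illustration, here is a minimal sketch of what such a shape-mapping callback might look like. The tensor_shape_read_cb_t name comes from this PR, but the signature and the fold-into-the-last-dimension policy below are assumptions, not the PR's actual API:

```c
#include <stdint.h>

// Hypothetical shape-mapping callback: ggml tensors carry at most 4
// dimensions, so the extra GGUF dimensions of a 5D WAN tensor are folded
// into the last one. Signature and behaviour are illustrative only.
static void wan_tensor_shape_read(const char    * name,         // tensor name from the GGUF header
                                  const int64_t * ne_file,      // shape as stored in the file
                                  uint32_t        n_dims_file,  // number of dims in the file (may be > 4)
                                  int64_t         ne[4]) {      // resulting 4D shape handed to ggml
    (void) name; // a real callback could special-case tensors by name

    // copy the first three dimensions as-is, padding with 1 if missing
    for (int i = 0; i < 3; ++i) {
        ne[i] = (uint32_t) i < n_dims_file ? ne_file[i] : 1;
    }
    // fold dimension 3 and everything above it into ne[3]
    ne[3] = 1;
    for (uint32_t i = 3; i < n_dims_file; ++i) {
        ne[3] *= ne_file[i];
    }
}
```

Whether the extra dimensions should be folded into ne[3] or somewhere else presumably depends on how the model code consumes the tensor, which is why the mapping is exposed as a user-supplied callback rather than a fixed rule.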
I think ideally there should be only one conv_3d operation, and the backend can decide on an implementation (direct, im2col, tiled im2col, or choose one depending on inputs & hardware).

The reason there are two versions for conv_2d is that, historically, the im2col version was there first, and it coded im2col into the graph. This means the backend cannot optimize it or choose a better implementation, and the huge memory requirements of full-tensor im2col are baked into the graph. This was difficult to change without breaking backends, hence the "direct" workaround. But IMO we should avoid this situation for newly introduced conv ops.
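For context, a sketch of the 2D situation being described, using the two existing entry points (parameter lists written from memory; treat them as approximate rather than authoritative):

```c
#include "ggml.h"

// (a) im2col baked into the graph: ggml_conv_2d() expands into an IM2COL
//     node followed by a MUL_MAT node at graph-build time, so the backend
//     sees two generic ops and the full im2col buffer exists as a real
//     intermediate tensor in the graph.
static struct ggml_tensor * conv2d_via_im2col(struct ggml_context * ctx,
                                              struct ggml_tensor  * kernel,
                                              struct ggml_tensor  * x) {
    return ggml_conv_2d(ctx, kernel, x, /*s0*/ 1, /*s1*/ 1, /*p0*/ 0, /*p1*/ 0, /*d0*/ 1, /*d1*/ 1);
}

// (b) a single op node: ggml_conv_2d_direct() records one CONV_2D node,
//     which a backend is free to lower however it wants (direct kernel,
//     tiled im2col, ...). The argument above is that conv_3d should exist
//     only in this second form.
static struct ggml_tensor * conv2d_as_single_op(struct ggml_context * ctx,
                                                struct ggml_tensor  * kernel,
                                                struct ggml_tensor  * x) {
    return ggml_conv_2d_direct(ctx, kernel, x, /*s0*/ 1, /*s1*/ 1, /*p0*/ 0, /*p1*/ 0, /*d0*/ 1, /*d1*/ 1);
}
```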
Yes, this is the ultimate goal. However, support in each backend is not yet complete, and conv_2d_direct cannot fully replace im2col + gemm at present. In some backends, such as CUDA, conv_2d_direct is much slower than im2col + gemm.
I agree there's no solution for conv_2d right away. My point is that this PR now creates the same problem for 3D. I don't think new code should be introduced that repeats the issue and bakes im2col in as the default way to do conv_3d.

My suggestion would be to either implement im2col + mul_mat behind OP_CONV_3D in the CUDA backend, or to only expose im2col_3d in the API and move the im2col + mul_mat calls into the application (simpler, but not as nice).
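To make the first option concrete, a rough sketch of what "im2col + mul_mat behind OP_CONV_3D in the CUDA backend" could look like. Every name below (the handler, the helpers, the params struct) is hypothetical and only illustrates the dispatch structure, not ggml-cuda's real internals:

```c
#include "ggml.h"
#include <stdbool.h>

// Hypothetical per-node convolution parameters (e.g. decoded from op_params).
struct conv3d_params { int s0, s1, s2, p0, p1, p2, d0, d1, d2; };

// Assumed helpers: read the op params from the node, a direct kernel, and
// an im2col+GEMM fallback whose temporary buffer is owned by the backend.
struct conv3d_params conv3d_read_params(const struct ggml_tensor * dst);
bool conv3d_direct_is_profitable(const struct ggml_tensor * kernel, const struct ggml_tensor * input);
void conv3d_direct_cuda(void * backend_ctx, const struct ggml_tensor * kernel,
                        const struct ggml_tensor * input, struct ggml_tensor * dst,
                        struct conv3d_params p);
void conv3d_im2col_gemm_cuda(void * backend_ctx, const struct ggml_tensor * kernel,
                             const struct ggml_tensor * input, struct ggml_tensor * dst,
                             struct conv3d_params p);

// Handler for a single CONV_3D node: the graph only records "3D conv";
// the backend chooses direct vs. im2col+GEMM per node.
void ggml_cuda_op_conv_3d(void * backend_ctx, struct ggml_tensor * dst) {
    const struct ggml_tensor * kernel = dst->src[0];
    const struct ggml_tensor * input  = dst->src[1];
    const struct conv3d_params p = conv3d_read_params(dst);

    if (conv3d_direct_is_profitable(kernel, input)) {
        conv3d_direct_cuda(backend_ctx, kernel, input, dst, p);
    } else {
        conv3d_im2col_gemm_cuda(backend_ctx, kernel, input, dst, p);
    }
}
```

The key property is that the im2col temporary would be allocated by the backend rather than recorded in the graph, so it could be tiled or skipped entirely without affecting other backends.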
For CUDA, one of my long-term goals is to write convolution kernels, particularly ones that can make use of quantized data. But I should stress that I have so many other things I want to do that, realistically, I would start working on it in a year at the absolute earliest.