Metal: add opt_step_adamw and op_sum #16529
Conversation
I've tried commenting out most of the code inside the kernel. The culprit is this line right here, but I'm unsure whether this has something to do with my kernel or whether it is another issue:

llama.cpp/ggml/src/ggml-backend.cpp, line 796 in c7be9fe
We need to implement the `GGML_OP_SUM` operator.
I see, thanks for the follow-up. Would you like me to implement it in this PR?
Yes, it should be in this PR so that the CI becomes green.
I have a working kernel now. P.S. For an optimised implementation, would we be looking at reducing across the lanes of each SIMD group?
Yes, we can optimize it in a separate PR. For now just correctness is OK.
Yes, exactly.
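Not part of this PR (which keeps the naive version for now), but to make the deferred optimisation concrete, here is a minimal sketch of a simdgroup-based sum reduction in Metal, assuming a contiguous F32 input. The kernel name, argument bindings, and use of `simd_sum` are illustrative assumptions, not the ggml kernel interface.

```metal
#include <metal_stdlib>
using namespace metal;

// Illustrative only: reduce a contiguous F32 buffer to a single scalar with
// one threadgroup. Each thread accumulates a strided slice, simd_sum folds
// the lanes of each simdgroup, and the per-simdgroup partials are combined
// through threadgroup memory. The host must allocate the threadgroup buffer
// (one float per simdgroup) via setThreadgroupMemoryLength.
kernel void sum_f32_simdgroup(
        device const float  * src   [[buffer(0)]],
        device       float  * dst   [[buffer(1)]],
        constant     ulong  & n     [[buffer(2)]],
        threadgroup  float  * shmem [[threadgroup(0)]],
        ushort tid   [[thread_index_in_threadgroup]],
        ushort ntg   [[threads_per_threadgroup]],
        ushort tiisg [[thread_index_in_simdgroup]],
        ushort sgitg [[simdgroup_index_in_threadgroup]]) {
    float acc = 0.0f;
    for (ulong i = tid; i < n; i += ntg) {
        acc += src[i];
    }

    // fold the lanes of this simdgroup (32 on Apple GPUs) into one value
    acc = simd_sum(acc);

    if (tiisg == 0) {
        shmem[sgitg] = acc;
    }
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // the first simdgroup combines the per-simdgroup partial sums
    if (sgitg == 0) {
        const ushort nsg = (ntg + 31) / 32;
        float total = tiisg < nsg ? shmem[tiisg] : 0.0f;
        total = simd_sum(total);
        if (tiisg == 0) {
            dst[0] = total;
        }
    }
}
```

A pattern of this kind (per-thread strided accumulation, then `simd_sum`, then a second `simd_sum` over the per-simdgroup partials) is presumably what the `has_simdgroup_reduction` capability flag discussed below is guarding against on older devices.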
Right, this change fixed it. Is it possible that it is too restrictive to tie `GGML_OP_SUM` to this check?

```cpp
case GGML_OP_SUM:
case GGML_OP_SUM_ROWS:
case GGML_OP_MEAN:
case GGML_OP_SOFT_MAX:
case GGML_OP_GROUP_NORM:
    return has_simdgroup_reduction && ggml_is_contiguous_rows(op->src[0]);
```

I think it might be better to do something like this for now:

```cpp
case GGML_OP_SUM:
    return ggml_is_contiguous(op->src[0]) && op->src[0]->type == GGML_TYPE_F32;
```
It might be better to also add `has_simdgroup_reduction` to this check.
Yes, that makes sense. I've made that change in the newest commit. I also think the previous commit has passed the failing CI.
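For orientation, a sketch of how the gate presumably ends up looking after this feedback; the exact code in the PR is not quoted in this thread, and the helper below is hypothetical, combining the snippets above with the commit "Add has_simdgroup_reduction to both ops to pass CI".

```cpp
#include "ggml.h"

// Hypothetical helper (not in the PR): the inferred shape of the GGML_OP_SUM
// support gate after the review feedback, combining has_simdgroup_reduction
// with the contiguity/type check suggested above.
static bool metal_supports_op_sum_sketch(bool has_simdgroup_reduction, const struct ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_SUM:
            return has_simdgroup_reduction &&
                   ggml_is_contiguous(op->src[0]) &&
                   op->src[0]->type == GGML_TYPE_F32;
        default:
            return false;
    }
}
```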
* origin/master: (32 commits)
  metal : FA support F32 K and V and head size = 32 (ggml-org#16531)
  graph : support cacheless embeddings with FA and iSWA (ggml-org#16528)
  opencl: fix build targeting CL 2 (ggml-org#16554)
  CUDA: fix numerical issues in tile FA kernel (ggml-org#16540)
  ggml : fix build broken with -march=armv9-a on MacOS (ggml-org#16520)
  CANN: fix CPU memory leak in CANN backend (ggml-org#16549)
  fix: add remark plugin to render raw HTML as literal text (ggml-org#16505)
  metal: add support for opt_step_sgd (ggml-org#16539)
  ggml : fix scalar path for computing norm (ggml-org#16558)
  CANN: Update several operators to support FP16 data format (ggml-org#16251)
  metal : add opt_step_adamw and op_sum (ggml-org#16529)
  webui: remove client-side context pre-check and rely on backend for limits (ggml-org#16506)
  [SYCL] fix UT fault cases: count-equal, argsort, pad OPs (ggml-org#16521)
  ci : add Vulkan on Ubuntu with default packages build (ggml-org#16532)
  common : handle unicode during partial json parsing (ggml-org#16526)
  common : update presets (ggml-org#16504)
  ggml : Fix FP16 ELU positive branch (ggml-org#16519)
  hparams : add check for layer index in is_recurrent (ggml-org#16511)
  ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (ggml-org#16518)
  CUDA: faster tile FA, add oob checks, more HSs (ggml-org#16492)
  ...
* scaffold to support opt step adamw on metal (not written so far)
* add opt-step-adamw kernel for metal
* pass op->src[4] as a separate buffer to the pipeline
* add bounds check to opt-step-adamw kernel
* complete scaffold for GGML_OP_SUM
* naive GGML_OP_SUM kernel
* remove unwanted comment
* change OP_SUM capability gate
* Add has_simdgroup_reduction to both ops to pass CI
Part of #14909: this PR adds a Metal backend implementation of the `OPT_STEP_ADAMW` operator.

Implementation
* A `kernel_opt_step_adamw_f32` kernel in `ggml-metal.metal`.
* For parameters, a `ggml_metal_kargs_opt_step_adamw` struct holds the parameters and `np`, passed to the kernel as `constant & args`, following the pattern of `ggml_metal_op_upscale`.

I've not written test cases for this operator as one already exists in `test-backend-ops.cpp`.

Additional changes
* With `test-opt` failing, the `GGML_OP_SUM` operator is also implemented in Metal (currently unoptimised).
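To make the operator concrete, here is a minimal sketch of what an element-wise AdamW step kernel can look like in Metal. This is not the PR's code: the real kernel passes a `ggml_metal_kargs_opt_step_adamw` struct as `constant & args` and `op->src[4]` as a separate buffer, whereas this sketch uses plain buffer bindings and assumes the hyper-parameter layout of the CPU reference in ggml (`[alpha, beta1, beta2, eps, wd, beta1h, beta2h]` in `src[4]`).

```metal
#include <metal_stdlib>
using namespace metal;

// Hedged sketch, not the PR's kernel: element-wise AdamW update mirroring the
// CPU reference. beta1h/beta2h are assumed to already include the bias
// correction, computed on the host for the current iteration.
kernel void opt_step_adamw_f32_sketch(
        device       float * w    [[buffer(0)]],   // weights, updated in place
        device const float * g    [[buffer(1)]],   // gradients
        device       float * m    [[buffer(2)]],   // first moment
        device       float * v    [[buffer(3)]],   // second moment
        device const float * pars [[buffer(4)]],   // AdamW hyper-parameters (src[4])
        constant     ulong & np   [[buffer(5)]],   // number of elements
        uint gid [[thread_position_in_grid]]) {
    if (gid >= np) {
        return; // bounds check for the last, partially filled threadgroup
    }

    const float alpha  = pars[0];
    const float beta1  = pars[1];
    const float beta2  = pars[2];
    const float eps    = pars[3];
    const float wd     = pars[4];
    const float beta1h = pars[5];
    const float beta2h = pars[6];

    const float gi = g[gid];

    // update the first and second moments
    m[gid] = m[gid]*beta1 +    gi*(1.0f - beta1);
    v[gid] = v[gid]*beta2 + gi*gi*(1.0f - beta2);

    const float mh = m[gid]*beta1h;
    const float vh = sqrt(v[gid]*beta2h) + eps;

    // decoupled weight decay, then the Adam step
    w[gid] = w[gid]*(1.0f - alpha*wd) - mh/vh;
}
```

Since every element is updated independently, a one-thread-per-element dispatch with a bounds check is sufficient for correctness; no simdgroup reduction is needed for this operator, unlike `GGML_OP_SUM`.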