Conversation

@ggerganov
Member

No description provided.

PABannier and others added 5 commits December 3, 2024 19:35
* wip

* wip implementation f32

* kernel conv transpose 1d f32 working

* initial commit
* implemented argmax kernel

* tpig -> tgpig

* change to strides

* contiguous assertions

* kernel working and tested

* argmax simd parallel implementation

* added 2 new tests for argmax in test-backend-ops

* cosmetic changes

* added 3 test cases for perf eval

* add test_argmax in make_test_cases_perf

* Update test-backend-ops.cpp

Co-authored-by: Diego Devesa <[email protected]>

---------

Co-authored-by: Diego Devesa <[email protected]>
* kqmax_new_j is the same in every thread within the warp after the operation at line 199, so this reduce can be omitted

* same problem in vec32

---------

Co-authored-by: ZhaoXiaoYu <[email protected]>
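
The reasoning in the `kqmax_new_j` commit can be illustrated with a small host-side model. This is only a sketch: it assumes that the operation at line 199 is an XOR-butterfly max reduction across a 32-lane warp (the usual shuffle-based pattern), which the commit message does not state. `WARP_SIZE`, `warp_all_reduce_max`, and `all_lanes_equal` are hypothetical names chosen for illustration, not identifiers from the PR.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t WARP_SIZE = 32;  // assumed warp width

// Host-side model of an XOR-butterfly max reduction across one warp.
// Each pass mirrors `x = max(x, shuffle_xor(x, offset))` run by all lanes at once.
std::array<float, WARP_SIZE> warp_all_reduce_max(std::array<float, WARP_SIZE> lanes) {
    for (std::size_t offset = WARP_SIZE / 2; offset > 0; offset /= 2) {
        // Snapshot the values, because on a GPU all lanes exchange simultaneously.
        const std::array<float, WARP_SIZE> prev = lanes;
        for (std::size_t lane = 0; lane < WARP_SIZE; ++lane) {
            lanes[lane] = std::max(prev[lane], prev[lane ^ offset]);
        }
    }
    return lanes;
}

// True when every lane holds the same value.
bool all_lanes_equal(const std::array<float, WARP_SIZE>& lanes) {
    return std::all_of(lanes.begin(), lanes.end(),
                       [&](float v) { return v == lanes[0]; });
}
```

Under that assumption, one butterfly pass sequence already leaves the warp-wide maximum in every lane, so applying the same reduction a second time returns the values unchanged. That is the sense in which the second reduce is redundant.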
@github-actions bot added the following labels on Dec 3, 2024: script (Script related), testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning), Apple Metal (https://en.wikipedia.org/wiki/Metal_(API))
@ggerganov merged commit 1cd3df4 into master Dec 3, 2024
51 checks passed
@ggerganov deleted the sync branch December 3, 2024 18:04

4 participants