Skip to content

Conversation

l3utterfly
Copy link
Owner

IMbackK and others added 19 commits November 27, 2024 17:10
* Add some minimal optimizations for CDNA

* ggml_cuda: set launch bounds also for GCN as it helps there too
* [cann] ROPE operator optimization

Co-authored-by: noemotiovon <[email protected]>
* CANN: Fix the bug build fail on Ascend310P under two cases:
1) Manual specify SOC_TYPE
2) Under some unusual compile environment

* Update the cann backend News content: Support F16 and F32 data type model for Ascend 310P NPU.

* fix CANN  compile fail bug: the assert in ascend kernel function doesn't supportted on some CANN version
* kompute: op_unary: reject unsupported parameters

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: softmax: implement ALiBi support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: rope: implement neox and phi3 support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: op_mul_mat_q4_k permutted support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permutted support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: op_mul_mat_f16 permutted support

Signed-off-by: Sergio Lopez <[email protected]>

* kompute: op_mul_mat_q6_k permutted support

Signed-off-by: Sergio Lopez <[email protected]>

---------

Signed-off-by: Sergio Lopez <[email protected]>
* ggml-cpu: support IQ4_NL_4_4 by runtime repack

* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
…penai client library (ggml-org#10568)

* server : (tests) don't use thread for capturing stdout/stderr

* test: bump openai to 1.55.2

* bump openai to 1.55.3
This is an incremental improvement over ggml-org#9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.

With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU.
* [cann] RoPE operator optimization

* [CANN]Code Formatting

---------

Co-authored-by: noemotiovon <[email protected]>
This PR fixes the failing MUL_MAT tests for the sycl backend.
@l3utterfly l3utterfly merged commit 61607e8 into layla-build Nov 29, 2024
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.