forked from ggml-org/llama.cpp
merge ggml-hexagon implementation #62
Merged
l3utterfly
merged 197 commits into
l3utterfly:ggml-hexagon
from
jeffzhou2000:pr_to_upstream
Apr 30, 2025
* convert : experimental support for `--mmproj` flag
* fix bad ctrl+f replace
* fix style
* split into subclasses TextModel and VisionModel
* rename Mode --> ModelBase
* small fix
* correct CLIP_VISION arch name (because existing GGUF already use it)
* Apply suggestions from code review
Co-authored-by: compilade
* fix Mistral3Model
* fix typo
Co-authored-by: compilade
---------
Co-authored-by: compilade

…li` (ggml-org#13012)
* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`
* support for minicpmv
* remove cpp files of llava and minicpmv
* update hot topics
* mtmd : add not supported msg for qwen2vl
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov
---------
Co-authored-by: Georgi Gerganov

…g#12871)
* ggml : add SSE 4.2 variant for CPUs without AVX
* ggml : add x64 base ABI variant

* llava : update documentations
* fix typo

* metal : add memory pool for temp allocs (wip) [no ci]
* cont : free buffers from the heap
* cont : resize heap [no ci]
* cont : refactor heap [no ci]
* cont : heap for each cmd buffer [no ci]
* cont : fix free
* wip
* cont : fix alignment [no ci]
* cont : not working .. [no ci]
* cont : heap allocation now works [no ci]
* cont : use MTLHeapTypePlacement ggml-ci
* metal : use dynamic MTLHeap allocations ggml-ci
* metal : add comments
* metal : disable softmax use of mem_pool ggml-ci
* metal : final touches

* security : add note about RPC functionality
* security : add note about llama-server

* mtmd : support SmolVLM (version 1 and 2)
* correct chat template
* fix n_patches
* scale_factor is an int
* add more models to test

* CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
* fix logic for RoPE support, CUDA graphs

…13021)
* append mult-eos,half-rope,bos to GLM4-0414
* remove unset var

* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script

* Sigint rework in mtmd vision example
* Applied suggestions on mtmd-cli PR
* Forgot to invert one of the conditions
* Update examples/llava/mtmd-cli.cpp
* Removed redundant exit check
---------
Co-authored-by: pl752
Co-authored-by: Xuan-Son Nguyen

* tune matmul for gcn
* this one is more power efficient
* Update ggml/src/ggml-vulkan/ggml-vulkan.cpp
Co-authored-by: 0cc4m
* disable this tune for the proprietary driver
---------
Co-authored-by: 0cc4m

* arg : clean up handling --mmproj with -hf
* rm change about no_mmproj
* Revert "rm change about no_mmproj"
This reverts commit 2cac8e0.
* handle no_mmproj explicitly
* skip download mmproj on examples not using it

* arg : add --no-mmproj-offload
* Update common/arg.cpp

* cmake : do not include ./src as public for libllama ggml-ci
* cmake : rework tests ggml-ci
* llguidance : remove unicode include ggml-ci
* cmake : make c++17 private ggml-ci

* ggml-cpu : kernels for faster depthwise 2D convolution
* fix compile: remove static after moving to ops.cpp
* add dilation for depthwise_conv_2d
* review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace
* review: rename depthwise_conv_2d -> conv_2d_dw everywhere
ggml-ci
…org#12943)
RPC_CMD_SET_TENSOR always returns an empty response, and we send this command 4 times per token. We can improve token generation (TG) speed if we don't wait for this empty response. The performance impact of this change depends on the network latency.
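To make the latency dependence concrete, the following back-of-the-envelope model estimates how much time per token is spent blocked on those empty replies. It is only a sketch: the struct and function names are hypothetical and not part of the RPC backend, and it assumes each wait costs one full network round trip that overlaps with nothing else. The count of 4 calls per token is taken from the commit message.

```cpp
#include <cassert>

// Hypothetical cost model; none of these names exist in llama.cpp.
struct RpcCost {
    double rtt_ms;           // network round-trip time in milliseconds
    int    set_tensor_calls; // SET_TENSOR calls per token (4 per the commit message)
};

// Time per token spent waiting for empty SET_TENSOR replies,
// assuming one full round trip per call.
double blocking_wait_ms(const RpcCost & c) {
    assert(c.rtt_ms >= 0.0 && c.set_tensor_calls >= 0);
    return c.rtt_ms * c.set_tensor_calls;
}

// Tokens per second for a given per-token time spent on everything else
// (other_ms), with or without waiting for the empty replies.
double tokens_per_second(double other_ms, const RpcCost & c, bool wait_for_reply) {
    const double total_ms = other_ms + (wait_for_reply ? blocking_wait_ms(c) : 0.0);
    assert(total_ms > 0.0);
    return 1000.0 / total_ms;
}
```

Under this model, a 0.5 ms round trip with 4 calls adds 2 ms per token, so the saving grows linearly with network latency and is negligible on a loopback connection.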
…orks in a standard Android APP)
Labels
Apple Metal
build
documentation
Improvements or additions to documentation
examples
ggml
Nvidia GPU
python
script
server
SYCL
testing
Vulkan
No description provided.