forked from ggml-org/llama.cpp
merge from zhouwg ggml-hexagon #66
Merged: l3utterfly merged 186 commits into l3utterfly:ggml-hexagon from jeffzhou2000:pr_to_upstream on May 23, 2025
Conversation
* Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1
* Fixed lldb attach
* Simplify by having the child do ggml_print_backtrace_symbols
ggml-ci
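For context on the ptrace_scope entry above: with /proc/sys/kernel/yama/ptrace_scope set to 1, only a direct ancestor may attach a debugger, so a process that wants a forked child to attach back to it must whitelist that child first. A minimal, illustrative sketch of that pattern (not the actual ggml code; names and the gdb invocation are placeholders):

```cpp
// Hedged sketch: the parent whitelists the forked child with PR_SET_PTRACER,
// then the child attaches gdb to the parent and dumps its stack. A pipe makes
// sure the whitelisting happens before the child tries to attach.
#include <sys/prctl.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

static void print_backtrace_via_child(void) {
    int sync_pipe[2];
    if (pipe(sync_pipe) != 0) {
        return;
    }
    const pid_t parent = getpid();
    const pid_t child  = fork();
    if (child == 0) {
        // child: wait until the parent has whitelisted us, then attach gdb to it
        char c;
        read(sync_pipe[0], &c, 1);
        char pid_str[32];
        snprintf(pid_str, sizeof(pid_str), "%d", (int) parent);
        execlp("gdb", "gdb", "--batch", "-ex", "bt", "-p", pid_str, (char *) nullptr);
        _exit(127); // exec failed
    } else if (child > 0) {
        // parent: allow the child to trace us despite ptrace_scope = 1
#ifdef PR_SET_PTRACER
        prctl(PR_SET_PTRACER, child, 0, 0, 0);
#endif
        write(sync_pipe[1], "x", 1);
        waitpid(child, nullptr, 0);
    }
}
```

The simplification in the last bullet avoids ptrace altogether: after fork the child carries a copy of the parent's stack, so it can print the symbols itself via ggml_print_backtrace_symbols.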
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
…rg#13482)
* Remove mmap workaround on windows. After some testing I found that mmap is supported on Windows and for many GPUs on Linux, so the workaround for Windows is not necessary and is removed.
* Update llama-bench README: the SYCL backend introduced a workaround that allows running llama-bench without specifying the `--mmp 0` flag.
* Update CANN model support status
* Update of model support
* update
* update
* update
* fix format of CANN.md
* fix format of CANN.md
* fix format of CANN.md
* kv-cache : prepare for SWA ggml-ci
* kv-cache : initial iSWA implementation ggml-ci
* kv-cache : rework error recovery logic ggml-ci
* models : fix Phi-3 SWA parameters ggml-ci
* model : adjust Granite to rope factor changes ggml-ci
* server : check if context can do shifts ggml-ci
* iswa : for now, always enable shifts (experiment) ggml-ci
* kv-cache : simplify SWA logic ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch ggml-ci
* llama : update docs about llama_decode ggml-ci
* kv-cache : update warning logs when no space for the batch is available ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context ggml-ci
* llama : add param to control SWA cache size ggml-ci
* minor : clean-up ggml-ci
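To make the SWA cache behaviour above concrete: with a sliding window of width n_swa, a cached token can be dropped once the newest position of its sequence has moved more than n_swa positions past it. A conceptual sketch with illustrative types (not the llama.cpp kv-cache API):

```cpp
// Conceptual only: decide which cache cells fall outside the sliding window
// and can be reused for new tokens.
#include <cstdint>
#include <vector>

struct kv_cell {
    int32_t pos    = -1;   // token position, -1 = empty slot
    int32_t seq_id = -1;   // owning sequence
};

// free every cell of `seq_id` whose position is outside the window ending at `pos_max`
static int prune_swa_cells(std::vector<kv_cell> & cells, int32_t seq_id,
                           int32_t pos_max, int32_t n_swa) {
    int n_freed = 0;
    for (auto & cell : cells) {
        if (cell.seq_id == seq_id && cell.pos >= 0 && cell.pos < pos_max - n_swa + 1) {
            cell = kv_cell{};   // outside the window: safe to reuse
            n_freed++;
        }
    }
    return n_freed;
}
```

Once cells outside the window are gone, the cache can no longer reconstruct arbitrary earlier positions, which is presumably why this entry adds llama_kv_self_seq_pos_min() and disallows server use cases that rely on partial SWA context.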
…to fix infinity values in output (ggml-org#13639)
* CUDA: skip fully masked-out KV in FA vec kernel
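The optimization above rests on a simple observation: if every mask value of a KV tile is -inf, that tile contributes exp(-inf) = 0 to the softmax, so the kernel can skip its loads and FMAs entirely. A host-side C++ analogue of the check (the real change lives inside the CUDA flash-attention vector kernel):

```cpp
// Illustrative only: detect a KV tile whose mask is entirely -inf.
#include <cmath>
#include <cstddef>

static bool kv_tile_fully_masked(const float * mask, size_t tile_len) {
    for (size_t i = 0; i < tile_len; ++i) {
        if (!(std::isinf(mask[i]) && mask[i] < 0.0f)) {
            return false;   // at least one position is attendable
        }
    }
    return true;            // whole tile is masked out: skip it
}
```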
* Update mtmd-helper.cpp
* Update tools/mtmd/mtmd-helper.cpp Co-authored-by: Xuan-Son Nguyen <[email protected]>
---------
Co-authored-by: Xuan-Son Nguyen <[email protected]>
* small fixes
* remove ifdef
…ITY op to accelerate D2D memory copy (ggml-org#13647)
* musa: fix build warning (unused parameter) Signed-off-by: Xiaodong Ye <[email protected]>
* musa: upgrade MUSA SDK version to rc4.0.1 Signed-off-by: Xiaodong Ye <[email protected]>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy Signed-off-by: Xiaodong Ye <[email protected]>
* Update ggml/src/ggml-cuda/cpy.cu Co-authored-by: Johannes Gäßler <[email protected]>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK Signed-off-by: Xiaodong Ye <[email protected]>
---------
Signed-off-by: Xiaodong Ye <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
* model : disable SWA for Phi models ggml-ci
* model : update warning message
* model : print warning only if n_swa > 0
* model : fix typo
* kv-cache : simplify the interface ggml-ci
* context : revert llama_batch_allocr position change ggml-ci
* server : fix first message identification. When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role is missing in the first streaming message. Fix this by correctly checking for the first message. Co-authored-by: Piotr Stankiewicz <[email protected]> Signed-off-by: Dorin Geman <[email protected]>
* server : Fix checks for first role message for stream=True. Co-authored-by: Piotr Stankiewicz <[email protected]> Signed-off-by: Dorin Geman <[email protected]>
---------
Signed-off-by: Dorin Geman <[email protected]>
Co-authored-by: Piotr Stankiewicz <[email protected]>
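Background for the fix above: OpenAI-style streaming clients (including openai-node, linked in the commit) expect the first delta of each assistant message to carry `"role": "assistant"`, with subsequent deltas carrying only content. A hedged sketch of the intended shape of a streamed chunk, using nlohmann::json as the server does (illustrative helper, not the actual server code):

```cpp
#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

// build the "choices" part of one SSE chunk; only the first chunk gets the role
static json make_stream_delta(const std::string & piece, bool is_first_chunk) {
    json delta = json::object();
    if (is_first_chunk) {
        delta["role"] = "assistant";   // what clients such as openai-node key off
    }
    if (!piece.empty()) {
        delta["content"] = piece;
    }
    return json{{"choices", json::array({ json{{"index", 0}, {"delta", delta}} })}};
}
```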
* Add the endpoints /api/tags and /api/chat, and improve the model metadata response
* Remove trailing whitespaces
* Remove code that is not needed for Copilot to work
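As a rough illustration of what the new endpoints involve (assuming cpp-httplib, which the llama.cpp server is built on; the handler bodies here are placeholders, not the real implementation):

```cpp
#include <httplib.h>

// Hypothetical registration helper: the Ollama-style routes are simply added
// next to the existing OpenAI-compatible ones.
static void register_ollama_routes(httplib::Server & svr) {
    svr.Get("/api/tags", [](const httplib::Request &, httplib::Response & res) {
        // real handler returns the loaded models and their metadata
        res.set_content(R"({"models": []})", "application/json");
    });
    svr.Post("/api/chat", [](const httplib::Request &, httplib::Response & res) {
        // real handler runs the chat completion and returns the assistant reply
        res.set_content(R"({"message": {"role": "assistant", "content": ""}})", "application/json");
    });
}
```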
* ggml : add ggml_gelu_na (not approximated)
* fix naming order
* rename na --> erf
* apply review suggestions
* revert naming order
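For reference, the "not approximated" GELU that the renamed op (ggml_gelu_erf) computes, next to the common tanh approximation; plain scalar reference code, not the actual ggml kernels:

```cpp
#include <cmath>

// exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
static float gelu_erf_ref(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// widespread tanh approximation, for comparison
static float gelu_tanh_ref(float x) {
    const float c = 0.7978845608f; // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}
```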
Signed-off-by: Emmanuel Ferdman <[email protected]>
* switch retrieval to llama_encode
* enable --no-warmup for retrieval
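A hedged sketch of what using llama_encode for retrieval embeddings can look like (batch construction simplified; the actual example code and the current llama.h signatures may differ):

```cpp
#include "llama.h"
#include <vector>

// embed one tokenized chunk with an encoder/embedding model; n_embd is the
// model's embedding size, passed in to keep the sketch self-contained
static std::vector<float> embed_chunk(llama_context * ctx, int32_t n_embd,
                                      std::vector<llama_token> & tokens) {
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());
    if (llama_encode(ctx, batch) != 0) {
        return {};                                         // encode failed
    }
    const float * emb = llama_get_embeddings_seq(ctx, 0);  // pooled embedding for sequence 0
    return emb ? std::vector<float>(emb, emb + n_embd) : std::vector<float>();
}
```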
…orks in a standard Android APP)
Labels: Apple Metal, build, devops, documentation, examples, ggml, Nvidia GPU, python, script, server, SYCL, testing, Vulkan