Pull requests: ggml-org/llama.cpp
cpp: Adding new arch RUGPT3XL
model
Model specific
python
python script changes
#21161
opened Mar 29, 2026 by
EvilFreelancer
Cross-backend profiler
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
Ascend NPU
issues specific to Ascend NPUs
documentation
Improvements or additions to documentation
examples
ggml
changes relating to the ggml tensor library for machine learning
Hexagon
IBM zDNN
issues specific to IBM zDNN Accelerator
Nvidia GPU
Issues specific to Nvidia GPUs
OpenCL
Issues specific to the OpenCL backend
OpenVINO
python
python script changes
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
Vulkan
Issues specific to the Vulkan backend
WebGPU
[CUDA ] Write an optimized flash_attn_stream_k_fixup kernel
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21159
opened Mar 29, 2026 by
gaugarg-nv
ggml-cpu: fix fallback for RVV kernels without zvfh
ggml
changes relating to the ggml tensor library for machine learning
#21157
opened Mar 29, 2026 by
taimur-10x
Support for DeepseekV32ForCausalLM with DeepSeek Sparse Attention (DSA)
ggml
changes relating to the ggml tensor library for machine learning
model
Model specific
Nvidia GPU
Issues specific to Nvidia GPUs
python
python script changes
testing
Everything test related
#21149
opened Mar 29, 2026 by
fairydreaming
•
Draft
ggml-webgpu: Add the support of MUL_MAT_ID
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#21147
opened Mar 29, 2026 by
yomaytk
CI: Fix docker multiarch overwrite
devops
improvements to build systems and github actions
#21144
opened Mar 29, 2026 by
Ts-sound
common: add two-phase graceful reasoning budget termination ...
#21141
opened Mar 29, 2026 by
zeel2104
grammar: make MAX_REPETITION_THRESHOLD configurable via env var
#21139
opened Mar 29, 2026 by
vampyrebat
CI: Enable CUDA and Vulkan ARM64 runners and fix CI/CD
devops
improvements to build systems and github actions
documentation
Improvements or additions to documentation
#21122
opened Mar 28, 2026 by
ehfd
metal: add opt-in V skip for negligible attention weights
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
#21119
opened Mar 28, 2026 by
TheTom
convert: Add compressed-tensors NVFP4 conversion
python
python script changes
#21095
opened Mar 28, 2026 by
michaelw9999
server/webui: cleanup dual representation approach, simplify to openai-compat
examples
server
#21090
opened Mar 28, 2026 by
pwilkin
common: add bounds check in common_init_result::sampler to prevent segfault on failed model load
examples
testing
Everything test related
#21082
opened Mar 27, 2026 by
mtmcp
fix cmake problem to exclude CCAN
Ascend NPU
issues specific to Ascend NPUs
ggml
changes relating to the ggml tensor library for machine learning
need more info
The OP should provide more details about the issue
#21075
opened Mar 27, 2026 by
sunqingn7
ggml-cuda: Add generic NVFP4 MMQ kernel
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
python
python script changes
script
Script related
#21074
opened Mar 27, 2026 by
michaelw9999
server: (webui) no more gzip compression
examples
server/webui
server
#21073
opened Mar 27, 2026 by
ngxson
hexagon: optimize HMX matmul operations
ggml
changes relating to the ggml tensor library for machine learning
Hexagon
#21071
opened Mar 27, 2026 by
chraac
Add quantization recipes from custom recipe files
examples
testing
Everything test related
#21070
opened Mar 27, 2026 by
bartowski1182
•
Draft