Merged

Commits (168), showing changes from all commits:
2139667
metal : fix out-of-bounds write (#11314)
ggerganov Jan 21, 2025
2e2f8f0
linenoise.cpp refactoring (#11301)
ericcurtin Jan 21, 2025
6da5bec
rpc : better caching of the base buffer pointer (#11331)
rgerganov Jan 21, 2025
e28245f
export-lora : fix tok_embd tensor (#11330)
ngxson Jan 21, 2025
6171c9d
Add Jinja template support (#11016)
ochafik Jan 21, 2025
3e3357f
llava : support Minicpm-omni (#11289)
tc-mb Jan 22, 2025
a94f3b2
`common`: utils to split / join / repeat strings (from json converter…
ochafik Jan 22, 2025
96f4053
Adding logprobs to /v1/completions (#11344)
jpodivin Jan 22, 2025
c64d2be
`minja`: sync at https://github.com/google/minja/commit/0f5f7f2b3770e…
ochafik Jan 22, 2025
12c2bdf
server : fix draft context not being released (#11354)
slaren Jan 22, 2025
16d3df7
readme : add plugin links (#11355)
ggerganov Jan 22, 2025
6152129
main : update README documentation for batch size (#11353)
slaren Jan 22, 2025
5245729
vulkan: fix diag_mask_inf (#11323)
jeffbolznv Jan 23, 2025
1971adf
vulkan: sort shaders for more deterministic binary (#11315)
jeffbolznv Jan 23, 2025
955a6c2
Vulkan-run-test: fix mmq_wg_denoms (#11343)
AMD-dwang Jan 23, 2025
f211d1d
Treat hf.co/ prefix the same as hf:// (#11350)
ericcurtin Jan 23, 2025
5845661
server : add more clean up when cancel_tasks is called (#11340)
ngxson Jan 23, 2025
f7fb43c
Add -ngl (#11372)
ericcurtin Jan 23, 2025
05f63cc
Update documentation (#11373)
ericcurtin Jan 23, 2025
564804b
tests: fix some mul_mat test gaps (#11375)
jeffbolznv Jan 23, 2025
c07e87f
server : (webui) put DeepSeek R1 CoT in a collapsible <details> eleme…
stduhpf Jan 24, 2025
01f37ed
Update llama-run README.md (#11386)
ericcurtin Jan 24, 2025
1af6945
cmake : avoid -march=native when reproducible build is wanted (#11366)
bmwiedemann Jan 24, 2025
8137b4b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)
JohannesGaessler Jan 24, 2025
a07c2c8
docs : Update readme to build targets for local docker build (#11368)
JafarAbdi Jan 24, 2025
9755129
release : pack /lib in the packages (#11392)
ggerganov Jan 24, 2025
9fbadae
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356)
IMbackK Jan 24, 2025
c5d9eff
CUDA: fix FP16 cuBLAS GEMM (#11396)
JohannesGaessler Jan 24, 2025
5f0db95
hip : Add hipGraph and VMM support to ROCM (#11362)
IMbackK Jan 24, 2025
466ea66
CANN: Add Ascend CANN build ci (#10217)
xuedinge233 Jan 24, 2025
00c24ac
ci : fix line breaks on windows builds (#11409)
ggerganov Jan 25, 2025
20a7581
docker : fix CPU ARM build (#11403)
slaren Jan 25, 2025
49b0e3c
server : fix cleaning up stream task (#11418)
ngxson Jan 25, 2025
6e264a9
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to buil…
slaren Jan 25, 2025
ca6baf7
build: add /bigobj to MSVC build (#11407)
jeffbolznv Jan 25, 2025
26771a1
Hip: disable VMM on hip as it seams that it dosent work in some confi…
IMbackK Jan 25, 2025
4a75d19
vulkan: compile shaders on-demand (#11406)
jeffbolznv Jan 25, 2025
f35726c
build: apply MSVC /bigobj option to c/cpp files only (#11423)
jeffbolznv Jan 26, 2025
2cc9b8c
readme : update hot topics
ggerganov Jan 26, 2025
1d8ee06
rpc: fix register position (#11424)
thxCode Jan 26, 2025
19f6518
cmake: add ggml find package (#11369)
bandoti Jan 26, 2025
6f53d8a
docker: add missing vulkan library to base layer and update to 24.04 …
rare-magma Jan 26, 2025
178a7eb
metal : use residency sets (#11427)
ggerganov Jan 26, 2025
caf773f
docker : fix ARM build and Vulkan build (#11434)
ngxson Jan 26, 2025
acd38ef
metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441)
Jan 27, 2025
df984e0
llama: refactor llama_decode_impl (#11381)
JohannesGaessler Jan 27, 2025
a5203b4
llama : minor fixes for up llama load model speed (#11448)
lexasub Jan 27, 2025
d6d24cd
AMD: parse the architecture as supplied by gcnArchName (#11244)
Haus1 Jan 27, 2025
a4417dd
Add new hf protocol for ollama (#11449)
ericcurtin Jan 27, 2025
2b8525d
Handle missing model in CLI parameters for llama-run (#11399)
engelmi Jan 28, 2025
6e84b0a
SYCL : SOFTMAX F16 mask support and other fixes (#11261)
qnixsynapse Jan 28, 2025
f643120
docker: add perplexity and bench commands to full image (#11438)
rare-magma Jan 28, 2025
4bf3119
cmake : don't fail on `GGML_CPU=OFF` (#11457)
someone13574 Jan 28, 2025
d7d1ecc
docker: allow installing pip packages system-wide (#11437)
rare-magma Jan 28, 2025
7fee288
Add github protocol pulling and http:// (#11465)
ericcurtin Jan 28, 2025
cae9fb4
HIP: Only call rocblas_initialize on rocblas versions with the multip…
sARY77 Jan 28, 2025
be5ef79
HIP: Supress transformation warning in softmax.cu
IMbackK Jan 28, 2025
d0c0804
ci : fix build CPU arm64 (#11472)
ngxson Jan 28, 2025
cf8cc85
server : Fixed wrong function name in llamacpp server unit test (#11473)
peidaqi Jan 28, 2025
794fe23
cmake: add hints for locating ggml on Windows using Llama find-packag…
Emreerdog Jan 28, 2025
325afb3
llama: fix missing k_cache store for rwkv6qwen2 (#11445)
MollySophia Jan 29, 2025
b636228
embedding : enable --no-warmup option (#11475)
danbev Jan 29, 2025
d2e518e
ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. …
issixx Jan 17, 2025
1a0e87d
ggml : add option to not print stack on abort (ggml/1081)
WilliamTambellini Jan 23, 2025
8158577
sync : ggml
ggerganov Jan 29, 2025
f0d4b29
Parse https://ollama.com/library/ syntax (#11480)
ericcurtin Jan 29, 2025
2711d02
vulkan: Catch pipeline creation failure and print an error message (#…
jeffbolznv Jan 29, 2025
e51c47b
server : update auto gen files comments [no ci] (#11484)
danbev Jan 29, 2025
66ee4f2
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)
remyoudompheng Jan 29, 2025
eb7cf15
server : add /apply-template endpoint for additional use cases of Min…
pnb Jan 29, 2025
e044976
server : update json snippets in README.md [no ci] (#11492)
danbev Jan 30, 2025
7919256
readme : reference examples relative links (#11505)
guspan-tanadi Jan 30, 2025
496e5bf
server : (docs) added response format for /apply-template [no ci] (#1…
isaac-mcfadyen Jan 30, 2025
4314e56
server : use lambda instead of std::bind (#11507)
danbev Jan 30, 2025
ffd0821
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#1…
mgroeber9110 Jan 30, 2025
3d804de
sync: minja (#11499)
ochafik Jan 30, 2025
c300e68
CUDA/HIP: add warp_size to cuda_device_info
IMbackK Jan 29, 2025
6af1ca4
HIP: Prepare reduction operators for wave 64
IMbackK Jan 29, 2025
27d135c
HIP: require at least HIP 5.5
IMbackK Jan 29, 2025
8b576b6
Tool call support (generic + native for Llama, Functionary, Hermes, M…
ochafik Jan 30, 2025
553f1e4
`ci`: ccache for all github worfklows (#11516)
ochafik Jan 30, 2025
a2df278
server : update help metrics processing/deferred (#11512)
danbev Jan 31, 2025
1bd3047
common: Add missing va_end (#11529)
stevegrubb Jan 31, 2025
4a2b196
server : fix --jinja when there's no tools or schema (typo was forcin…
ochafik Jan 31, 2025
5783575
Fix chatml fallback for unsupported builtin templates (when --jinja n…
ochafik Jan 31, 2025
b1bcd30
fix stop regression (#11543)
ochafik Jan 31, 2025
a83f528
`tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic…
ochafik Jan 31, 2025
aa6fb13
`ci`: use sccache on windows instead of ccache (#11545)
ochafik Jan 31, 2025
5bbc736
ci: simplify cmake build commands (#11548)
ochafik Feb 1, 2025
ecef206
Implement s3:// protocol (#11511)
ericcurtin Feb 1, 2025
cfd74c8
`sync`: minja (https://github.com/google/minja/commit/418a2364b56dc9b…
ochafik Feb 1, 2025
53debe6
ci: use sccache on windows HIP jobs (#11553)
ochafik Feb 1, 2025
0cec062
llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)
piDack Feb 2, 2025
ff22770
sampling : support for llguidance grammars (#10224)
mmoskal Feb 2, 2025
6980448
Fix exotic ci env that lacks ostringstream::str (#11581)
ochafik Feb 2, 2025
bfcce4d
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in AP…
ochafik Feb 2, 2025
84ec8a5
Name colors (#11573)
ericcurtin Feb 2, 2025
864a0b6
CUDA: use mma PTX instructions for FlashAttention (#11583)
JohannesGaessler Feb 2, 2025
90f9b88
nit: more informative crash when grammar sampler fails (#11593)
ochafik Feb 2, 2025
4d0598e
HIP: add GGML_CUDA_CC_IS_* for amd familys as increasing cc archtectu…
IMbackK Feb 2, 2025
396856b
CUDA/HIP: add support for selectable warp size to mmv (#11519)
IMbackK Feb 2, 2025
6eecde3
HIP: fix flash_attn_stream_k_fixup warning (#11604)
JohannesGaessler Feb 2, 2025
d92cb67
server : (webui) Fix Shift+Enter handling (#11609)
mashdragon Feb 3, 2025
21c84b5
CUDA: fix Volta FlashAttention logic (#11615)
JohannesGaessler Feb 3, 2025
8ec0583
sync : ggml
ggerganov Feb 3, 2025
5598f47
server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)
danbev Feb 3, 2025
1d1e6a9
server : (webui) allow typing and submitting during llm response (#11…
woof-dog Feb 3, 2025
b345178
server : (webui) revert hacky solution from #11626 (#11634)
ngxson Feb 3, 2025
cde3833
`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to …
ochafik Feb 3, 2025
b34aedd
ci : do not stale-close roadmap issues
ggerganov Feb 4, 2025
8f8290a
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
ckastner Feb 3, 2025
7c9e0ca
sync : ggml
ggerganov Feb 4, 2025
387a159
authors : update
ggerganov Feb 4, 2025
534c46b
metal : use residency set for other platforms (#11648)
jhen0409 Feb 4, 2025
f117d84
swift : fix llama-vocab api usage (#11645)
jhen0409 Feb 4, 2025
106045e
readme : add llm_client Rust crate to readme bindings (#11628)
ShelbyJenkins Feb 4, 2025
db288b6
`tool-call`: command r7b fix for normal responses (#11608)
ochafik Feb 4, 2025
1bef571
arg : list RPC devices first when using --list-devices (#11655)
rgerganov Feb 4, 2025
3962fc1
server : add try..catch to places not covered by set_exception_handle…
ngxson Feb 4, 2025
3ec9fd4
HIP: force max threads per block to be 1024 (#11621)
fxzjshm Feb 4, 2025
fd08255
CUDA: non-contiguous (RMS) norm support (#11659)
JohannesGaessler Feb 4, 2025
9f4cc8f
`sync`: minja (#11641)
ochafik Feb 5, 2025
1ec2080
llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644)
samkoesnadi Feb 5, 2025
fa62da9
CUDA: support for mat. mul. with ne03 != ne13 (#11656)
JohannesGaessler Feb 5, 2025
d774ab3
metal : adjust support conditions for norm operators (#11671)
ggerganov Feb 5, 2025
c3db048
readme : add link to Autopen under UIs (#11684)
blackhole89 Feb 6, 2025
902368a
metal : avoid breaking build when metal API predates TARGET_OS_VISION…
charles-dyfis-net Feb 6, 2025
1b598b3
vulkan: use smaller combined allocations to avoid fragmentation (#11551)
jeffbolznv Feb 6, 2025
8a7e3bf
vulkan: initial support for IQ4_XS quantization (#11501)
remyoudompheng Feb 6, 2025
2c6c8df
vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521)
jeffbolznv Feb 6, 2025
8d4d2be
ggml : fix LoongArch compile error with 128-bit SIMD (#11701)
junchao-loongson Feb 6, 2025
c0d4843
build : fix llama.pc (#11658)
angt Feb 6, 2025
9dd7a03
llama : add log about loading model tensors (#11699)
ggerganov Feb 6, 2025
194b2e6
SYCL: Adjust support condition for norm operators (#11674)
qnixsynapse Feb 6, 2025
9ab42dc
docs: update fedora cuda guide for 12.8 release (#11393)
teihome Feb 6, 2025
2fb3c32
server : (webui) migrate project to ReactJS with typescript (#11688)
ngxson Feb 6, 2025
1d20e53
rpc: fix known RCE in rpc-server (ggml/1103)
retr0reg Feb 6, 2025
8a59053
sync : ggml
ggerganov Feb 6, 2025
855cd07
llama : fix old glm4 models (#11670)
tv1wnd Feb 6, 2025
225bbbf
ggml : optimize and build warning fix for LoongArch (#11709)
MQ-mengqing Feb 7, 2025
b7552cf
common : add default embeddings presets (#11677)
danbev Feb 7, 2025
ec3bc82
SYCL: remove XMX info from print devices (#11712)
qnixsynapse Feb 7, 2025
7ee953a
llama : add llama_sampler_init for safe usage of llama_sampler_free (…
cfillion Feb 7, 2025
c026ba3
vulkan: print shared memory size (#11719)
jeffbolznv Feb 7, 2025
333820d
llama : fix progress dots (#11730)
magicse Feb 7, 2025
2d219b3
vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729)
cfillion Feb 7, 2025
ed926d8
llama : fix defrag logic (#11707)
ggerganov Feb 7, 2025
d2fe216
Make logging more verbose (#11714)
ericcurtin Feb 7, 2025
0cf8671
server : (webui) fix numeric settings being saved as string (#11739)
ngxson Feb 8, 2025
3ab410f
readme : update front-end framework (#11753)
pothitos Feb 8, 2025
d80be89
CUDA: fix min. version for movmatrix (#11751)
JohannesGaessler Feb 8, 2025
4d3465c
ggml: Fix data race in ggml threadpool (#11736)
kkontny Feb 8, 2025
bdcf8b6
cont : fix mmap flag print (#11699)
ggerganov Feb 8, 2025
aaa5505
server : minor log updates (#11760)
ggerganov Feb 8, 2025
e6e6583
server : (webui) increase edit textarea size (#11763)
woof-dog Feb 8, 2025
55ac8c7
server : (webui) revamp Settings dialog, add Pyodide interpreter (#11…
ngxson Feb 8, 2025
98f6b0f
vulkan: account for lookup tables when checking shared memory size (#…
jeffbolznv Feb 9, 2025
19d3c82
There's a better way of clearing lines (#11756)
ericcurtin Feb 9, 2025
b044a0f
vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid …
wbruna Feb 10, 2025
c2a67ef
vulkan: Make Vulkan optional at runtime (#11493). (#11494)
daym Feb 10, 2025
9ac3457
Update README.md [no ci] (#11781)
pascal-lc Feb 10, 2025
d7b31a9
sync: minja (https://github.com/google/minja/commit/a72057e5190de2c61…
ochafik Feb 10, 2025
0893e01
server : correct signal handler (#11795)
ngxson Feb 10, 2025
19b392d
llama-mmap: fix missing include (#11796)
wgottwalt Feb 10, 2025
507f917
server : (webui) introduce conversation branching + idb storage (#11792)
ngxson Feb 10, 2025
8173261
docs: utilize the forward slash (/) as the path separator for Unix-li…
MambaWong Feb 10, 2025
7b891bd
fix: typos in documentation files (#11791)
maximevtush Feb 10, 2025
b9ab0a4
CUDA: use arch list for compatibility check (#11775)
JohannesGaessler Feb 10, 2025
13 changes: 12 additions & 1 deletion .devops/cpu.Dockerfile

```diff
@@ -2,14 +2,25 @@
 ARG UBUNTU_VERSION=22.04
 
 FROM ubuntu:$UBUNTU_VERSION AS build
 
+ARG TARGETARCH
+
+ARG GGML_CPU_ARM_ARCH=armv8-a
+
 RUN apt-get update && \
     apt-get install -y build-essential git cmake libcurl4-openssl-dev
 
 WORKDIR /app
 
 COPY . .
 
-RUN cmake -S . -B build -DGGML_BACKEND_DL=ON -DGGML_NATIVE=OFF -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_CURL=ON -DCMAKE_BUILD_TYPE=Release && \
+RUN if [ "$TARGETARCH" = "amd64" ]; then \
+        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON -DGGML_NATIVE=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON; \
+    elif [ "$TARGETARCH" = "arm64" ]; then \
+        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=${GGML_CPU_ARM_ARCH}; \
+    else \
+        echo "Unsupported architecture"; \
+        exit 1; \
+    fi && \
     cmake --build build -j $(nproc)
 
 RUN mkdir -p /app/lib && \
```
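The `TARGETARCH` branching above lets one Dockerfile serve both x86-64 and arm64 images. A rough sketch of how it might be invoked with Docker Buildx follows; the image tags and the `armv8.2-a+dotprod` value are illustrative, not taken from this PR, and BuildKit populates `TARGETARCH` automatically from `--platform`:

```sh
# x86-64 build: takes the GGML_CPU_ALL_VARIANTS branch of the RUN step
docker buildx build -f .devops/cpu.Dockerfile --platform linux/amd64 \
    -t llama-cpp:cpu-amd64 .

# arm64 build: overrides the default armv8-a target (value is illustrative)
docker buildx build -f .devops/cpu.Dockerfile --platform linux/arm64 \
    --build-arg GGML_CPU_ARM_ARCH=armv8.2-a+dotprod \
    -t llama-cpp:cpu-arm64 .
```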
10 changes: 9 additions & 1 deletion .devops/tools.sh

```diff
@@ -13,9 +13,13 @@ elif [[ "$arg1" == '--quantize' || "$arg1" == '-q' ]]; then
     exec ./llama-quantize "$@"
 elif [[ "$arg1" == '--run' || "$arg1" == '-r' ]]; then
     exec ./llama-cli "$@"
+elif [[ "$arg1" == '--bench' || "$arg1" == '-b' ]]; then
+    exec ./llama-bench "$@"
+elif [[ "$arg1" == '--perplexity' || "$arg1" == '-p' ]]; then
+    exec ./llama-perplexity "$@"
 elif [[ "$arg1" == '--all-in-one' || "$arg1" == '-a' ]]; then
     echo "Converting PTH to GGML..."
-    for i in `ls $1/$2/ggml-model-f16.bin*`; do
+    for i in $(ls $1/$2/ggml-model-f16.bin*); do
         if [ -f "${i/f16/q4_0}" ]; then
             echo "Skip model quantization, it already exists: ${i/f16/q4_0}"
         else
@@ -30,6 +34,10 @@ else
     echo "Available commands: "
     echo "  --run (-r): Run a model previously converted into ggml"
     echo "      ex: -m /models/7B/ggml-model-q4_0.bin -p \"Building a website can be done in 10 simple steps:\" -n 512"
+    echo "  --bench (-b): Benchmark the performance of the inference for various parameters."
+    echo "      ex: -m model.gguf"
+    echo "  --perplexity (-p): Measure the perplexity of a model over a given text."
+    echo "      ex: -m model.gguf -f file.txt"
     echo "  --convert (-c): Convert a llama model into ggml"
     echo "      ex: --outtype f16 \"/models/7B/\" "
     echo "  --quantize (-q): Optimize with quantization process ggml"
```
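Since `tools.sh` dispatches on its first argument, the two new flags simply forward the remaining arguments to `llama-bench` and `llama-perplexity`. A hedged usage sketch, assuming the script is the image entrypoint; the image name and model paths are placeholders:

```sh
# Benchmark a model (remaining args go straight to llama-bench)
docker run -v /path/to/models:/models llama-cpp:full --bench -m /models/model.gguf

# Measure perplexity over a text file (args go to llama-perplexity)
docker run -v /path/to/models:/models llama-cpp:full --perplexity \
    -m /models/model.gguf -f /models/test.txt
```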
11 changes: 6 additions & 5 deletions .devops/vulkan.Dockerfile

```diff
@@ -1,4 +1,4 @@
-ARG UBUNTU_VERSION=jammy
+ARG UBUNTU_VERSION=24.04
 
 FROM ubuntu:$UBUNTU_VERSION AS build
 
@@ -7,7 +7,7 @@ RUN apt update && apt install -y git build-essential cmake wget
 
 # Install Vulkan SDK and cURL
 RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
-    wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
+    wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list && \
     apt update -y && \
     apt-get install -y vulkan-sdk libcurl4-openssl-dev curl
 
@@ -34,7 +34,7 @@ RUN mkdir -p /app/full \
 FROM ubuntu:$UBUNTU_VERSION AS base
 
 RUN apt-get update \
-    && apt-get install -y libgomp1 curl\
+    && apt-get install -y libgomp1 curl libvulkan-dev \
     && apt autoremove -y \
     && apt clean -y \
     && rm -rf /tmp/* /var/tmp/* \
@@ -55,8 +55,9 @@ RUN apt-get update \
     git \
     python3 \
     python3-pip \
-    && pip install --upgrade pip setuptools wheel \
-    && pip install -r requirements.txt \
+    python3-wheel \
+    && pip install --break-system-packages --upgrade setuptools \
+    && pip install --break-system-packages -r requirements.txt \
     && apt autoremove -y \
     && apt clean -y \
     && rm -rf /tmp/* /var/tmp/* \
```
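The added `--break-system-packages` flags are a consequence of the base-image bump: Ubuntu 24.04 ships a PEP 668 "externally managed" Python, so a plain system-wide `pip install` is refused. A brief sketch of the failure mode this works around (error text abbreviated):

```sh
# On Ubuntu 24.04, installing into the system Python fails by default:
pip install -r requirements.txt
# error: externally-managed-environment

# Inside a single-purpose container, opting out is a common workaround:
pip install --break-system-packages -r requirements.txt
```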
8 changes: 8 additions & 0 deletions .editorconfig

```diff
@@ -40,3 +40,11 @@ indent_style = tab
 [examples/cvector-generator/*.txt]
 trim_trailing_whitespace = unset
 insert_final_newline = unset
+
+[models/templates/*.jinja]
+indent_style = unset
+indent_size = unset
+end_of_line = unset
+charset = unset
+trim_trailing_whitespace = unset
+insert_final_newline = unset
```