Changes from all commits (183 commits)
bf9c101
metal : use F32 prec for K*Q in vec FA (#9595)
ggerganov Sep 23, 2024
37f8c7b
perplexity : remove extra new lines after chunks (#9596)
ggerganov Sep 23, 2024
1e7b929
ggml : AVX512 gemm for Q4_0_8_8 (#9532)
Srihari-mcw Sep 23, 2024
f3979df
flake.lock: Update (#9586)
ggerganov Sep 23, 2024
1d48e98
readme : add programmable prompt engine language CLI (#9599)
snowyu Sep 23, 2024
f0c7b5e
threads: improve ggml_barrier scaling with large number of threads (#…
max-krasnyansky Sep 23, 2024
0b3bf96
server : add --no-context-shift option (#9607)
ngxson Sep 23, 2024
116efee
cuda: add q8_0->f32 cpy operation (#9571)
Nekotekina Sep 24, 2024
c087b6f
threads: fix msvc build without openmp (#9615)
max-krasnyansky Sep 24, 2024
b0f2736
sampling : avoid expensive softmax during greedy sampling (#9605)
ggerganov Sep 24, 2024
0aa1501
server : add newline after chat example (#9616)
StrangeBytesDev Sep 24, 2024
cea1486
log : add CONT level for continuing previous log entry (#9610)
ggerganov Sep 24, 2024
31ac583
llama : keep track of all EOG tokens in the vocab (#9609)
ggerganov Sep 24, 2024
c038931
examples : adapt to ggml.h changes (ggml/0)
ggerganov Sep 20, 2024
bb5f819
sync : ggml
ggerganov Sep 24, 2024
70392f1
ggml : add AVX512DQ requirement for AVX512 builds (#9622)
EZForever Sep 24, 2024
904837e
cann: fix crash when llama-bench is running on multiple cann devices …
bachelor-dou Sep 25, 2024
3d6bf69
llama : add IBM Granite MoE architecture (#9438)
gabe-l-hart Sep 25, 2024
afbbfaa
server : add more env vars, improve gen-docs (#9635)
ngxson Sep 25, 2024
1e43630
ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (#9217)
chaxu01 Sep 25, 2024
ea9c32b
ci : fix docker build number and tag name (#9638)
ngxson Sep 25, 2024
7691654
mtgpu: enable VMM (#9597)
yeahdongcn Sep 26, 2024
95bc82f
[SYCL] add missed dll file in package (#9577)
NeoZhangJianyu Sep 26, 2024
44f59b4
cmake : add option for common library (#9661)
iboB Sep 27, 2024
b5de3b7
readme : update hot topics
ggerganov Sep 27, 2024
89f9944
Enable use of the rebar feature to upload buffers to the device. (#9251)
mtavenrath Sep 28, 2024
6a0f779
ggml : add run-time detection of neon, i8mm and sve (#9331)
eddnjjn Sep 28, 2024
43bcdd9
readme : add tool (#9655)
akx Sep 28, 2024
9a91311
llama : add support for Chameleon (#8543)
nopperl Sep 28, 2024
6102037
vocab : refactor tokenizer to reduce init overhead (#9449)
kylo5aby Sep 28, 2024
7398427
llama : add comment about thread-safety [no ci] (#9449)
ggerganov Sep 28, 2024
1b2f992
test-backend-ops : use flops for some performance tests (#9657)
slaren Sep 28, 2024
f4d2b88
llama : add reranking support (#9510)
ggerganov Sep 28, 2024
589b48d
contrib : add Resources section (#9675)
ggerganov Sep 29, 2024
f99d3f8
py : add model class for Chameleon conversion (#9683)
nopperl Sep 29, 2024
faac0ba
common : ensure llama_batch size does not exceed max size (#9668)
matiaslin Sep 29, 2024
6084bfb
ggml : fix GGML_MAX_N_THREADS + improve formatting (ggml/969)
ggerganov Sep 24, 2024
544f409
vulkan : argsort barriers must be under uniform control flow (ggml/951)
smeso Sep 26, 2024
0de8b20
vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml…
jeffbolznv Sep 27, 2024
641002f
vulkan : multithread pipeline creation (ggml/963)
jeffbolznv Sep 29, 2024
aaa4099
CUDA: remove bad assert (ggml/972)
JohannesGaessler Sep 29, 2024
d0b1d66
sync : ggml
ggerganov Sep 29, 2024
c919d5d
ggml : define missing HWCAP flags (#9684)
ggerganov Sep 29, 2024
8277a81
console : utf-8 fix for windows stdin (#9690)
hasaranga Sep 30, 2024
ace4f4b
flake.lock: Update (#9680)
ggerganov Sep 30, 2024
08a43d0
py : update transformers version (#9694)
Vaibhavs10 Sep 30, 2024
511636d
ci : reduce severity of unused Pyright ignore comments (#9697)
compilade Sep 30, 2024
6f1d9d7
Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9…
serhii-nakon Sep 30, 2024
1927378
convert : refactor rope_freqs generation (#9396)
compilade Oct 1, 2024
a90484c
llama : print correct model type for Llama 3.2 1B and 3B
ggerganov Oct 1, 2024
cad341d
metal : reduce command encoding overhead (#9698)
ggerganov Oct 1, 2024
7254cdf
ggml: fix gradient allocation logic (ggml/966)
JohannesGaessler Sep 29, 2024
6c53224
ggml : fix ggml_cast (ggml/973)
iboB Sep 30, 2024
cb00020
vulkan : mul_mat: fix UB with small warps (ggml/952)
smeso Sep 30, 2024
e98c1c1
test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974)
JohannesGaessler Sep 30, 2024
f1b8c42
sync : ggml
ggerganov Oct 1, 2024
3f1ae2e
Update README.md (#9591)
32bitmicro Oct 1, 2024
148844f
examples : remove benchmark (#9704)
ggerganov Oct 2, 2024
76b37d1
gguf-split : improve --split and --merge logic (#9619)
kylo5aby Oct 2, 2024
00b7317
vulkan : do not use tensor->extra (#9407)
rgerganov Oct 2, 2024
f536f4c
[SYCL] Initial cmake support of SYCL for AMD GPUs (#9658)
Alcpz Oct 2, 2024
a39ab21
llama : reduce compile time and binary size (#9712)
ngxson Oct 2, 2024
c83ad6d
ggml-backend : add device and backend reg interfaces (#9707)
slaren Oct 2, 2024
5639971
Fixed dequant precision issues in Q4_1 and Q5_1 (#9711)
OuadiElfarouki Oct 3, 2024
841713e
rpc : enable vulkan (#9714)
rgerganov Oct 3, 2024
e3c355b
convert : handle tokenizer merges format from transformers 4.45 (#9696)
compilade Oct 3, 2024
d6fe7ab
ggml: unify backend logging mechanism (#9709)
bandoti Oct 3, 2024
a7ad553
ggml-backend : add device description to CPU backend (#9720)
slaren Oct 3, 2024
5d5ab1e
metal : fix compute pass descriptor autorelease crash (#9718)
jmousseau Oct 3, 2024
eee39bd
ggml: refactor cross entropy loss CPU impl. (ggml/976)
JohannesGaessler Oct 2, 2024
fabdc3b
ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)
JohannesGaessler Oct 3, 2024
1bb8a64
sync : ggml
ggerganov Oct 3, 2024
d5ed2b9
metal : remove abort (skip) (ggml/0)
ggerganov Oct 3, 2024
133c7b4
Fixed RNG seed docs (#9723)
d-kleine Oct 4, 2024
f3fdcfa
ci : fine-grant permission (#9710)
ngxson Oct 4, 2024
ff56576
ggml : fixes after sync (ggml/983)
slaren Oct 4, 2024
55951c0
ggml : fix typo in example usage ggml_gallocr_new (ggml/984)
danbev Oct 4, 2024
1788077
sync : ggml
ggerganov Oct 4, 2024
71967c2
Add Llama Assistant (#9744)
vietanhdev Oct 4, 2024
905f548
metal : zero-init buffer contexts (whisper/0)
ggerganov Oct 5, 2024
58b1669
sync : ggml
ggerganov Oct 5, 2024
8c475b9
rerank : use [SEP] token instead of [BOS] (#9737)
ggerganov Oct 5, 2024
b0915d5
vulkan : retry allocation with fallback flags (whisper/2451)
SRHMorris Oct 6, 2024
b6d6c52
sync : llama.cpp
ggerganov Oct 6, 2024
f4b2dcd
readme : fix typo [no ci]
ggerganov Oct 6, 2024
d5cb868
contrib : simplify + minor edits [no ci]
ggerganov Oct 6, 2024
96b6912
metal : single allocation of encode_async block (#9747)
ptsochantaris Oct 7, 2024
d5ac8cf
ggml : add metal backend registry / device (#9713)
ggerganov Oct 7, 2024
6279dac
flake.lock: Update (#9753)
ggerganov Oct 7, 2024
f1af42f
Update building for Android (#9672)
amqdn Oct 7, 2024
6374743
ggml : add backend registry / device interfaces to BLAS backend (#9752)
slaren Oct 7, 2024
fa42aa6
scripts : fix spelling typo in messages and comments (#9782)
standby24x7 Oct 8, 2024
458367a
server : better security control for public deployments (#9776)
ngxson Oct 8, 2024
dca1d4b
ggml : fix BLAS with unsupported types (#9775)
slaren Oct 8, 2024
3dc48fe
examples : remove llama.vim
ggerganov Oct 9, 2024
e702206
perplexity : fix integer overflow (#9783)
ggerganov Oct 9, 2024
c81f3bb
cmake : do not build common library by default when standalone (#9804)
slaren Oct 9, 2024
c7499c5
examples : do not use common library in simple example (#9803)
slaren Oct 10, 2024
cf8e0a3
musa: add docker image support (#9685)
yeahdongcn Oct 10, 2024
0e9f760
rpc : add backend registry / device interfaces (#9812)
slaren Oct 10, 2024
7eee341
common : use common_ prefix for common library functions (#9805)
slaren Oct 10, 2024
9677640
ggml : move more prints to the ggml log system (#9839)
slaren Oct 11, 2024
943d20b
musa : update doc (#9856)
yeahdongcn Oct 12, 2024
11ac980
llama : improve infill support and special token detection (#9798)
ggerganov Oct 12, 2024
95c76e8
server : remove legacy system_prompt feature (#9857)
ggerganov Oct 12, 2024
1bde94d
server : remove self-extend features (#9860)
ggerganov Oct 12, 2024
edc2656
server : add option to time limit the generation phase (#9865)
ggerganov Oct 12, 2024
92be9f1
flake.lock: Update (#9870)
ggerganov Oct 13, 2024
c7181bd
server : reuse cached context chunks (#9866)
ggerganov Oct 13, 2024
d4c19c0
server : accept extra_context for the infill endpoint (#9874)
ggerganov Oct 13, 2024
13dca2a
Vectorize load instructions in dmmv f16 CUDA kernel (#9816)
agray3 Oct 14, 2024
a89f75e
server : handle "logprobs" field with false value (#9871)
VoidIsVoid Oct 14, 2024
4c42f93
readme : update bindings list (#9889)
srgtuszy Oct 15, 2024
dcdd535
server : update preact (#9895)
ggerganov Oct 15, 2024
fbc98b7
sampling : add XTC sampler (#9742)
MaggotHATE Oct 15, 2024
223c25a
server : improve infill context reuse (#9894)
ggerganov Oct 15, 2024
755a9b2
llama : add infill sampler (#9896)
ggerganov Oct 15, 2024
becfd38
[CANN] Fix cann compilation error (#9891)
leo-pony Oct 16, 2024
cd60b88
ggml-alloc : remove buffer_id from leaf_alloc (ggml/987)
danbev Oct 9, 2024
0e41b30
sync : ggml
ggerganov Oct 16, 2024
1f66b69
server : fix the disappearance of the end of the text (#9867)
z80maniac Oct 16, 2024
10433e8
llama : add tensor name for "result_norm" (#9907)
MollySophia Oct 16, 2024
66c2c93
grammar : fix JSON Schema for string regex with top-level alt. (#9903)
jemc Oct 16, 2024
dbf18e4
llava : fix typo in error message [no ci] (#9884)
danbev Oct 16, 2024
9e04102
llama : suppress conversion from 'size_t' to 'int' (#9046)
danbev Oct 16, 2024
73afe68
fix: use `vm_allocate` to allocate CPU backend buffer on macOS (#9875)
giladgd Oct 16, 2024
2194200
fix: allocating CPU buffer with size `0` (#9917)
giladgd Oct 16, 2024
f010b77
vulkan : add backend registry / device interfaces (#9721)
slaren Oct 17, 2024
3752217
readme : update bindings list (#9918)
ShenghaiWang Oct 17, 2024
99bd4ac
llama : infill sampling handle very long tokens (#9924)
ggerganov Oct 17, 2024
9f45fc1
llama : change warning to debug log
ggerganov Oct 17, 2024
17bb928
readme : remove --memory-f32 references (#9925)
ggerganov Oct 17, 2024
6f55bcc
llama : rename batch_all to batch (#8881)
danbev Oct 17, 2024
8901755
server : add n_indent parameter for line indentation requirement (#9929)
ggerganov Oct 18, 2024
60ce97c
add amx kernel for gemm (#8998)
mingfeima Oct 18, 2024
87421a2
[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705)
OuadiElfarouki Oct 18, 2024
afd9909
rpc : backend refactoring (#9912)
rgerganov Oct 18, 2024
cda0e4b
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
ngxson Oct 18, 2024
7cab208
readme : update infra list (#9942)
icppWorld Oct 20, 2024
45f0976
readme : update bindings list (#9951)
lcarrere Oct 20, 2024
1db8c84
fix mul_mat_vec_q and *_vec_q error (#9939)
NeoZhangJianyu Oct 21, 2024
bc21975
speculative : fix handling of some input params (#9963)
ggerganov Oct 21, 2024
55e4778
llama : default sampling changes + greedy update (#9897)
ggerganov Oct 21, 2024
d5ebd79
rpc : pack only RPC structs (#9959)
rgerganov Oct 21, 2024
f594bc8
ggml : add asserts for type conversion in fattn kernels (#9971)
ggerganov Oct 21, 2024
dbd5f2f
llama.vim : plugin for Neovim (#9787)
ggerganov Oct 21, 2024
94008cc
arg : fix attention non-causal arg value hint (#9985)
danbev Oct 21, 2024
994cfb1
readme : update UI list (#9972)
a-ghorbani Oct 21, 2024
e01c67a
llama.vim : move info to the right of screen [no ci] (#9787)
ggerganov Oct 21, 2024
e94a138
llama.vim : fix info text display [no ci] (#9787)
ggerganov Oct 21, 2024
674804a
arg : fix typo in embeddings argument help [no ci] (#9994)
danbev Oct 22, 2024
6b84473
[CANN] Adapt to dynamically loadable backends mechanism (#9970)
leo-pony Oct 22, 2024
4ff7fe1
llama : add chat template for RWKV-World + fix EOT (#9968)
MollySophia Oct 22, 2024
c421ac0
lora : warn user if new token is added in the adapter (#9948)
ngxson Oct 22, 2024
11d4705
Rwkv chat template fix (#10001)
MollySophia Oct 22, 2024
19d900a
llama : rename batch to ubatch (#9950)
danbev Oct 22, 2024
c8c07d6
llama : fix empty batch causing llama_batch_allocr to crash (#9966)
ngxson Oct 22, 2024
873279b
flake.lock: Update
github-actions[bot] Oct 20, 2024
4c9388f
metal : add POOL2D and fix IM2COL (#9943)
junhee-yoo Oct 23, 2024
ac113a0
llama.vim : add classic vim support (#9995)
m18coppola Oct 23, 2024
c19af0a
ggml : remove redundant set of contexts used field (ggml/978)
danbev Oct 16, 2024
80273a3
CUDA: fix 1D im2col, add tests (ggml/993)
JohannesGaessler Oct 18, 2024
2d3aba9
llama.vim : bump generation time limit to 3s [no ci]
ggerganov Oct 23, 2024
190a37d
sync : ggml
ggerganov Oct 23, 2024
0a1c750
server : samplers accept the prompt correctly (#10019)
wwoodsTM Oct 23, 2024
c39665f
CUDA: fix MMQ for non-contiguous src0, add tests (#10021)
JohannesGaessler Oct 24, 2024
167a515
CUDA: fix insufficient buffer clearing for MMQ (#10032)
JohannesGaessler Oct 24, 2024
40f2555
ci : fix cmake flags for SYCL
ggerganov Oct 24, 2024
958367b
server : refactor slot input data, move tokenizer to HTTP thread (#10…
ngxson Oct 24, 2024
bc5ba00
server : check that the prompt fits in the slot's context (#10030)
ggerganov Oct 25, 2024
2f8bd2b
llamafile : extend sgemm.cpp support for Q5_0 models (#10010)
Srihari-mcw Oct 25, 2024
d80fb71
llama: string_split fix (#10022)
Xarbirus Oct 25, 2024
ff252ea
llama : add DRY sampler (#9702)
wwoodsTM Oct 25, 2024
6687503
metal : support permuted matrix multiplications (#10033)
ggerganov Oct 25, 2024
9e4a256
scripts : fix amx sync [no ci]
ggerganov Oct 26, 2024
8c60a8a
increase cuda_cpy block size (ggml/996)
bssrdf Oct 23, 2024
cc2983d
sync : ggml
ggerganov Oct 26, 2024
8841ce3
llama : switch KQ multiplication to F32 precision by default (#10015)
ggerganov Oct 27, 2024
8125e6c
server : don't overfill the batch during infill (#10018)
ggerganov Oct 28, 2024
524afee
musa: workaround for Guilty Lockup in cleaning src0 (#10042)
yeahdongcn Oct 28, 2024
07028f9
flake.lock: Update (#10063)
ggerganov Oct 28, 2024
46dcd2b
PPC MMA implementation
amritahs-ibm Sep 12, 2024
4a9a3c8
Merge branch 'ggerganov:master' into sgemm_ppc_mma
amritahs-ibm Oct 28, 2024
26 changes: 26 additions & 0 deletions .devops/full-musa.Dockerfile
@@ -0,0 +1,26 @@
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
ARG MUSA_VERSION=rc3.1.0
# Target the MUSA build image
ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

FROM ${BASE_MUSA_DEV_CONTAINER} AS build

RUN apt-get update && \
apt-get install -y build-essential cmake python3 python3-pip git libcurl4-openssl-dev libgomp1

COPY requirements.txt requirements.txt
COPY requirements requirements

RUN pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt

WORKDIR /app

COPY . .

RUN cmake -B build -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
cmake --build build --config Release -j$(nproc) && \
cp build/bin/* .

ENTRYPOINT ["/app/.devops/tools.sh"]
6 changes: 3 additions & 3 deletions .devops/full-rocm.Dockerfile
@@ -11,7 +11,7 @@ FROM ${BASE_ROCM_DEV_CONTAINER} AS build
# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
# This is mostly tied to rocBLAS supported archs.
-ARG ROCM_DOCKER_ARCH=\
+ARG ROCM_DOCKER_ARCH="\
gfx803 \
gfx900 \
gfx906 \
@@ -21,7 +21,7 @@ ARG ROCM_DOCKER_ARCH=\
gfx1030 \
gfx1100 \
gfx1101 \
-gfx1102
+gfx1102"

COPY requirements.txt requirements.txt
COPY requirements requirements
@@ -34,7 +34,7 @@ WORKDIR /app
COPY . .

# Set nvcc architecture
-ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
+ENV AMDGPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
ENV GGML_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
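The quoting change is the substance of this diff: ROCM_DOCKER_ARCH is a multi-line value, and without the surrounding double quotes the backslash-continued list does not reliably survive later ${ROCM_DOCKER_ARCH} expansions. A minimal shell illustration (assumed, with the collapsed middle entries elided):

# with the quotes, the list expands as one space-separated string
ROCM_DOCKER_ARCH="gfx803 gfx900 gfx906 gfx1030 gfx1100 gfx1101 gfx1102"
echo "$ROCM_DOCKER_ARCH"   # prints the full list on a single line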
30 changes: 30 additions & 0 deletions .devops/llama-cli-musa.Dockerfile
@@ -0,0 +1,30 @@
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
ARG MUSA_VERSION=rc3.1.0
# Target the MUSA build image
ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
# Target the MUSA runtime image
ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_MUSA_DEV_CONTAINER} AS build

RUN apt-get update && \
apt-get install -y build-essential git cmake

WORKDIR /app

COPY . .

RUN cmake -B build -DGGML_MUSA=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
cmake --build build --config Release --target llama-cli -j$(nproc)

FROM ${BASE_MUSA_RUN_CONTAINER} AS runtime

RUN apt-get update && \
apt-get install -y libgomp1

COPY --from=build /app/build/ggml/src/libggml.so /libggml.so
COPY --from=build /app/build/src/libllama.so /libllama.so
COPY --from=build /app/build/bin/llama-cli /llama-cli

ENTRYPOINT [ "/llama-cli" ]
6 changes: 3 additions & 3 deletions .devops/llama-cli-rocm.Dockerfile
@@ -11,7 +11,7 @@ FROM ${BASE_ROCM_DEV_CONTAINER} AS build
# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
# This is mostly tied to rocBLAS supported archs.
-ARG ROCM_DOCKER_ARCH=\
+ARG ROCM_DOCKER_ARCH="\
gfx803 \
gfx900 \
gfx906 \
@@ -21,7 +21,7 @@ ARG ROCM_DOCKER_ARCH=\
gfx1030 \
gfx1100 \
gfx1101 \
-gfx1102
+gfx1102"

COPY requirements.txt requirements.txt
COPY requirements requirements
@@ -34,7 +34,7 @@ WORKDIR /app
COPY . .

# Set nvcc architecture
-ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
+ENV AMDGPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
ENV GGML_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
35 changes: 35 additions & 0 deletions .devops/llama-server-musa.Dockerfile
@@ -0,0 +1,35 @@
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
ARG MUSA_VERSION=rc3.1.0
# Target the MUSA build image
ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
# Target the MUSA runtime image
ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_MUSA_DEV_CONTAINER} AS build

RUN apt-get update && \
apt-get install -y build-essential git cmake libcurl4-openssl-dev

WORKDIR /app

COPY . .

RUN cmake -B build -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
cmake --build build --config Release --target llama-server -j$(nproc)

FROM ${BASE_MUSA_RUN_CONTAINER} AS runtime

RUN apt-get update && \
apt-get install -y libcurl4-openssl-dev libgomp1 curl

COPY --from=build /app/build/ggml/src/libggml.so /libggml.so
COPY --from=build /app/build/src/libllama.so /libllama.so
COPY --from=build /app/build/bin/llama-server /llama-server

# Must be set to 0.0.0.0 so it can listen to requests from host machine
ENV LLAMA_ARG_HOST=0.0.0.0

HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/llama-server" ]
6 changes: 3 additions & 3 deletions .devops/llama-server-rocm.Dockerfile
@@ -11,7 +11,7 @@ FROM ${BASE_ROCM_DEV_CONTAINER} AS build
# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
# This is mostly tied to rocBLAS supported archs.
-ARG ROCM_DOCKER_ARCH=\
+ARG ROCM_DOCKER_ARCH="\
gfx803 \
gfx900 \
gfx906 \
@@ -21,7 +21,7 @@ ARG ROCM_DOCKER_ARCH=\
gfx1030 \
gfx1100 \
gfx1101 \
-gfx1102
+gfx1102"

COPY requirements.txt requirements.txt
COPY requirements requirements
@@ -34,7 +34,7 @@ WORKDIR /app
COPY . .

# Set nvcc architecture
-ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
+ENV AMDGPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
ENV GGML_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
2 changes: 1 addition & 1 deletion .dockerignore
@@ -1,7 +1,7 @@
*.o
*.a
.cache/
-.git/
+# Do not ignore .git directory, otherwise the reported build number will always be 0
.github/
.gitignore
.vs/
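The added comment is load-bearing: the build number reported by the binaries is derived from the git history inside the Docker build context, so excluding .git/ pinned it at 0. It is computed with the same commands the workflow below runs:

git rev-list --count HEAD      # build number
git rev-parse --short=7 HEAD   # short commit hash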
4 changes: 2 additions & 2 deletions .github/workflows/bench.yml.disabled
@@ -27,10 +27,10 @@ on:
push:
branches:
- master
-paths: ['llama.cpp', 'ggml.c', 'ggml-backend.c', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
+paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
pull_request_target:
types: [opened, synchronize, reopened]
-paths: ['llama.cpp', 'ggml.c', 'ggml-backend.c', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
+paths: ['llama.cpp', 'ggml.c', 'ggml-backend.cpp', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
schedule:
- cron: '04 2 * * *'

8 changes: 7 additions & 1 deletion .github/workflows/build.yml
@@ -19,6 +19,11 @@ concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true

+# Fine-grant permission
+# https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#modifying-the-permissions-for-the-github_token
+permissions:
+contents: write # for creating release

env:
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
GGML_NLOOP: 3
@@ -956,6 +961,7 @@ jobs:
cp "${{ env.ONEAPI_ROOT }}/compiler/latest/bin/sycl7.dll" ./build/bin
cp "${{ env.ONEAPI_ROOT }}/compiler/latest/bin/svml_dispmd.dll" ./build/bin
cp "${{ env.ONEAPI_ROOT }}/compiler/latest/bin/libmmd.dll" ./build/bin
cp "${{ env.ONEAPI_ROOT }}/compiler/latest/bin/libiomp5md.dll" ./build/bin
echo "cp oneAPI running time dll files to ./build/bin done"
7z a llama-${{ steps.tag.outputs.name }}-bin-win-sycl-x64.zip ./build/bin/*

@@ -1031,7 +1037,7 @@ jobs:
run: |
$env:HIP_PATH=$(Resolve-Path 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' | split-path | split-path)
$env:CMAKE_PREFIX_PATH="${env:HIP_PATH}"
cmake -G "Unix Makefiles" -B build -S . -DCMAKE_C_COMPILER="${env:HIP_PATH}\bin\clang.exe" -DCMAKE_CXX_COMPILER="${env:HIP_PATH}\bin\clang++.exe" -DGGML_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DGPU_TARGETS=${{ matrix.gpu_target }} -DGGML_RPC=ON
cmake -G "Unix Makefiles" -B build -S . -DCMAKE_C_COMPILER="${env:HIP_PATH}\bin\clang.exe" -DCMAKE_CXX_COMPILER="${env:HIP_PATH}\bin\clang++.exe" -DGGML_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release -DAMDGPU_TARGETS=${{ matrix.gpu_target }} -DGGML_RPC=ON
cmake --build build -j ${env:NUMBER_OF_PROCESSORS}
md "build\bin\rocblas\library\"
cp "${env:HIP_PATH}\bin\hipblas.dll" "build\bin\"
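For reference, the renamed CMake variable behaves the same in a local HIP build outside CI (a sketch; the single gfx1100 target is an illustrative choice, not what the workflow matrix uses):

cmake -B build -DGGML_HIPBLAS=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release
cmake --build build -j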
5 changes: 5 additions & 0 deletions .github/workflows/close-issue.yml
@@ -3,6 +3,11 @@ on:
schedule:
- cron: "42 0 * * *"

+# Fine-grant permission
+# https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#modifying-the-permissions-for-the-github_token
+permissions:
+issues: write

jobs:
close-issues:
runs-on: ubuntu-latest
61 changes: 41 additions & 20 deletions .github/workflows/docker.yml
@@ -15,11 +15,17 @@ on:
branches:
- master
paths: ['.github/workflows/docker.yml', '.devops/*.Dockerfile', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal']
+workflow_dispatch: # allows manual triggering, useful for debugging

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true

+# Fine-grant permission
+# https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#modifying-the-permissions-for-the-github_token
+permissions:
+packages: write

jobs:
push_to_registry:
name: Push Docker image to Docker Hub
@@ -37,6 +43,9 @@ jobs:
- { tag: "light-cuda", dockerfile: ".devops/llama-cli-cuda.Dockerfile", platforms: "linux/amd64" }
- { tag: "server-cuda", dockerfile: ".devops/llama-server-cuda.Dockerfile", platforms: "linux/amd64" }
- { tag: "full-cuda", dockerfile: ".devops/full-cuda.Dockerfile", platforms: "linux/amd64" }
- { tag: "light-musa", dockerfile: ".devops/llama-cli-musa.Dockerfile", platforms: "linux/amd64" }
- { tag: "server-musa", dockerfile: ".devops/llama-server-musa.Dockerfile", platforms: "linux/amd64" }
- { tag: "full-musa", dockerfile: ".devops/full-musa.Dockerfile", platforms: "linux/amd64" }
# Note: the rocm images are failing due to a compiler error and are disabled until this is fixed to allow the workflow to complete
#- { tag: "light-rocm", dockerfile: ".devops/llama-cli-rocm.Dockerfile", platforms: "linux/amd64,linux/arm64" }
#- { tag: "server-rocm", dockerfile: ".devops/llama-server-rocm.Dockerfile", platforms: "linux/amd64,linux/arm64" }
@@ -46,6 +55,8 @@ jobs:
steps:
- name: Check out the repo
uses: actions/checkout@v4
+with:
+fetch-depth: 0 # preserve git history, so we can determine the build number

- name: Set up QEMU
uses: docker/setup-qemu-action@v2
@@ -60,6 +71,34 @@ jobs:
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}

+- name: Determine tag name
+id: tag
+shell: bash
+run: |
+BUILD_NUMBER="$(git rev-list --count HEAD)"
+SHORT_HASH="$(git rev-parse --short=7 HEAD)"
+REPO_OWNER="${GITHUB_REPOSITORY_OWNER@L}" # to lower case
+REPO_NAME="${{ github.event.repository.name }}"
+
+# determine tag name postfix (build number, commit hash)
+if [[ "${{ env.GITHUB_BRANCH_NAME }}" == "master" ]]; then
+TAG_POSTFIX="b${BUILD_NUMBER}"
+else
+SAFE_NAME=$(echo "${{ env.GITHUB_BRANCH_NAME }}" | tr '/' '-')
+TAG_POSTFIX="${SAFE_NAME}-${SHORT_HASH}"
+fi
+
+# list all tags possible
+TAGS=""
+TAGS="${TAGS}ghcr.io/${REPO_OWNER}/${REPO_NAME}:${{ matrix.config.tag }},"
+TAGS="${TAGS}ghcr.io/${REPO_OWNER}/${REPO_NAME}:${{ matrix.config.tag }}-${TAG_POSTFIX}"
+
+echo "output_tags=$TAGS" >> $GITHUB_OUTPUT
+echo "output_tags=$TAGS" # print out for debugging
+env:
+GITHUB_BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
+GITHUB_REPOSITORY_OWNER: '${{ github.repository_owner }}'

# https://github.com/jlumbroso/free-disk-space/tree/54081f138730dfa15788a46383842cd2f914a1be#example
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
@@ -77,31 +116,13 @@ jobs:
docker-images: true
swap-storage: true

-- name: Determine tag name
-id: tag
-shell: bash
-run: |
-BUILD_NUMBER="$(git rev-list --count HEAD)"
-SHORT_HASH="$(git rev-parse --short=7 HEAD)"
-if [[ "${{ env.BRANCH_NAME }}" == "master" ]]; then
-echo "name=b${BUILD_NUMBER}" >> $GITHUB_OUTPUT
-else
-SAFE_NAME=$(echo "${{ env.BRANCH_NAME }}" | tr '/' '-')
-echo "name=${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}" >> $GITHUB_OUTPUT
-fi
-
-- name: Downcase github.repository_owner
-run: |
-echo "repository_owner_lowercase=${GITHUB_REPOSITORY_OWNER@L}" >> $GITHUB_ENV
-env:
-GITHUB_REPOSITORY_OWNER: '${{ github.repository_owner }}'

- name: Build and push Docker image (tagged + versioned)
if: github.event_name == 'push'
uses: docker/build-push-action@v6
with:
context: .
push: true
platforms: ${{ matrix.config.platforms }}
tags: "ghcr.io/${{ env.repository_owner_lowercase }}/llama.cpp:${{ matrix.config.tag }}-${{ env.COMMIT_SHA }},ghcr.io/${{ env.repository_owner_lowercase }}/llama.cpp:${{ matrix.config.tag }},ghcr.io/${{ env.repository_owner_lowercase }}/llama.cpp:${{ matrix.config.tag }}-${{ steps.tag.outputs.name }}"
# tag list is generated from step above
tags: ${{ steps.tag.outputs.output_tags }}
file: ${{ matrix.config.dockerfile }}
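The net effect of the new tag step, traced with the same logic and illustrative values (the build number and hash are made up):

BUILD_NUMBER=3982; SHORT_HASH=abc1234; BRANCH=master   # illustrative
if [[ "$BRANCH" == "master" ]]; then TAG_POSTFIX="b${BUILD_NUMBER}"; else TAG_POSTFIX="$(echo "$BRANCH" | tr '/' '-')-${SHORT_HASH}"; fi
echo "ghcr.io/ggerganov/llama.cpp:full-musa,ghcr.io/ggerganov/llama.cpp:full-musa-${TAG_POSTFIX}"
# master branch      -> ghcr.io/ggerganov/llama.cpp:full-musa,ghcr.io/ggerganov/llama.cpp:full-musa-b3982
# branch foo/bar     -> ghcr.io/ggerganov/llama.cpp:full-musa,ghcr.io/ggerganov/llama.cpp:full-musa-foo-bar-abc1234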
7 changes: 7 additions & 0 deletions .github/workflows/nix-ci-aarch64.yml
@@ -21,6 +21,13 @@ concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true

+# Fine-grant permission
+# https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#modifying-the-permissions-for-the-github_token
+permissions:
+# https://github.com/DeterminateSystems/nix-installer-action?tab=readme-ov-file#with-flakehub
+id-token: write
+contents: read

jobs:
nix-build-aarch64:
runs-on: ubuntu-latest
7 changes: 7 additions & 0 deletions .github/workflows/nix-ci.yml
@@ -12,6 +12,13 @@ concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true

+# Fine-grant permission
+# https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#modifying-the-permissions-for-the-github_token
+permissions:
+# https://github.com/DeterminateSystems/nix-installer-action?tab=readme-ov-file#with-flakehub
+id-token: write
+contents: read

jobs:
nix-eval:
strategy:
4 changes: 3 additions & 1 deletion .github/workflows/python-type-check.yml
@@ -4,11 +4,13 @@ on:
push:
paths:
- '.github/workflows/python-type-check.yml'
+- 'pyrightconfig.json'
- '**.py'
- '**/requirements*.txt'
pull_request:
paths:
- '.github/workflows/python-type-check.yml'
+- 'pyrightconfig.json'
- '**.py'
- '**/requirements*.txt'

@@ -33,6 +35,6 @@ jobs:
- name: Type-check with Pyright
uses: jakebailey/pyright-action@v2
with:
-version: 1.1.370
+version: 1.1.382
level: warning
warnings: true
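The CI check can be reproduced locally against the pinned version (a sketch; the PyPI wrapper version is assumed to track the action's, and pyright picks up the repository's pyrightconfig.json automatically):

pip install pyright==1.1.382
pyright --warnings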