Merged
Changes from all commits
59 commits
00b02bb
imatrix : fix arg parser for imatrix (#9366)
ngxson Sep 8, 2024
eae5971
llama : sanitize tokens in the upper bound (#9359)
slaren Sep 8, 2024
2a358fb
[SYCL] add check malloc result on device (#9346)
NeoZhangJianyu Sep 8, 2024
19f4a7b
llama : refactor samplers internal implementation (#9370)
slaren Sep 8, 2024
a249843
common : restore --n-gpu-layers (#9371)
slaren Sep 8, 2024
3f7ccfd
common : bring back missing args, add env var duplication check (#9375)
ngxson Sep 8, 2024
e079bff
cuda : fix FA Q src index (1 -> 0) (#9374)
ggerganov Sep 8, 2024
daa9623
Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend …
mtavenrath Sep 8, 2024
b2e89a3
Arm AArch64: Documentation updates (#9321)
eddnjjn Sep 9, 2024
54f376d
rpc : update README [no ci] (#9320)
rgerganov Sep 9, 2024
5ed0875
readme : add LLMUnity to UI projects (#9381)
amakropoulos Sep 9, 2024
8e6e2fb
CUDA: fix variable name conflict for Windows build (#9382)
JohannesGaessler Sep 9, 2024
38ca6f6
readme : update hot topics
ggerganov Sep 9, 2024
5fb5e24
llama : minor sampling refactor (2) (#9386)
slaren Sep 9, 2024
5fac4d5
ggml : vector length agnostic SVE support (#9290)
Vithulep Sep 9, 2024
293bebe
rpc : fix segfault with nkvo (#9389)
rgerganov Sep 9, 2024
bfe76d4
common : move arg parser code to `arg.cpp` (#9388)
ngxson Sep 9, 2024
fb3f249
make : do not run llama-gen-docs when building (#9399)
slaren Sep 10, 2024
0b4ac75
RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
MollySophia Sep 10, 2024
83008b7
llama : update llm_build_copy_mask_state comment [no ci] (#9385)
danbev Sep 10, 2024
00ba2ff
metal : fix compile warning with GGML_METAL_NDEBUG (#0)
ggerganov Sep 10, 2024
49006c6
llama : move random seed generation to the samplers (#9398)
slaren Sep 10, 2024
8d300bd
enable --special arg for llama-server (#9419)
matteoserva Sep 10, 2024
6cd4e03
arg : bring back missing ifdef (#9411)
ngxson Sep 10, 2024
cb9c933
flake.lock: Update (#9360)
ggerganov Sep 10, 2024
51b6038
sycl : update support conditions (#9394)
Alcpz Sep 11, 2024
b34e023
musa: remove Clang builtins mapping (#9421)
yeahdongcn Sep 11, 2024
d2b496b
batched-bench : remove unused code (#9305)
ggerganov Sep 11, 2024
5af118e
CUDA: fix --split-mode row race condition (#9413)
JohannesGaessler Sep 11, 2024
67155ab
feat: Implements retrying logic for downloading models using --model-…
farbodbj Sep 11, 2024
5bb2c5d
files : remove accidentally added `lora_test` submodule (#9430)
ngxson Sep 11, 2024
0996c55
llava : correct args for minicpmv-cli (#9429)
ngxson Sep 11, 2024
8db003a
py : support converting local models (#7547)
EvilFreelancer Sep 11, 2024
1b28061
llama : skip token bounds check when evaluating embeddings (#9437)
slaren Sep 11, 2024
449ccfb
Add Jais to list of supported models (#9439)
fmz Sep 12, 2024
df4b794
cann: Fix error when running a non-exist op (#9424)
bachelor-dou Sep 12, 2024
c9c8575
enhance run script to be easy to change the parameters (#9448)
NeoZhangJianyu Sep 12, 2024
d6a04f8
ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)
ggerganov Sep 12, 2024
2b00fa7
riscv : modify Makefile and add a RISCV_VECT to print log info (#9442)
Tameem-10xE Sep 12, 2024
39f852f
py : add special tokens in hf_converter for RWKV v6 (#9428)
MollySophia Sep 12, 2024
ff76e18
cmake : fixed the order of linking libraries for llama-quantize (#9450)
Xarbirus Sep 12, 2024
3c26a16
ci : bump actions/checkout to v4 (#9377)
trivikr Sep 12, 2024
c837981
py : add Phi-1.5/Phi-2 tokenizer (#9361)
daminho Sep 12, 2024
4dc4f5f
ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329)
no1wudi Sep 12, 2024
2a82511
cmake : fix for builds without `GGML_CDEF_PUBLIC` (#9338)
Xarbirus Sep 12, 2024
d4c3c10
lora : raise error if lm_head is ignored (#9103)
ngxson Sep 12, 2024
e665744
llava : fix the script error in MobileVLM README (#9054)
fengerhu1 Sep 12, 2024
e6b7801
cann: Add host buffer type for Ascend NPU (#9406)
bachelor-dou Sep 12, 2024
7820364
server : Add option to return token pieces in /tokenize endpoint (#9108)
mathijshenquet Sep 12, 2024
bd35cb0
feat: remove a sampler from a chain (#9445)
giladgd Sep 13, 2024
0abc6a2
llama : llama_perf + option to disable timings during decode (#9355)
ggerganov Sep 13, 2024
feff4aa
server : add loading html page while model is loading (#9468)
ngxson Sep 13, 2024
befaf11
llama : make cell_id const in inp_s_mask block (#9470)
danbev Sep 14, 2024
1f4111e
cmake : use list(APPEND ...) instead of set() + dedup linker (#9463)
ggerganov Sep 14, 2024
dcdcee3
server: add data: [DONE] to /chat/completions stream response (#9459)
VoidIsVoid Sep 14, 2024
822b632
ggml : ggml_type_name return "NONE" for invalid values (#9458)
ykhrustalev Sep 14, 2024
7596487
cmake : try to fix sycl+intel build (#9487)
Xarbirus Sep 15, 2024
d6b37c8
readme : update tools list (#9475)
OLSecret Sep 15, 2024
3c7989f
py : add "LLaMAForCausalLM" conversion support (#9485)
csabakecskemeti Sep 15, 2024
16 changes: 8 additions & 8 deletions .github/workflows/build.yml
@@ -375,7 +375,7 @@ jobs:
steps:
- name: Clone
id: checkout
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Dependencies
id: depends
@@ -401,7 +401,7 @@ jobs:
continue-on-error: true

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: add oneAPI to apt
shell: bash
@@ -442,7 +442,7 @@ jobs:
continue-on-error: true

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: add oneAPI to apt
shell: bash
@@ -546,7 +546,7 @@ jobs:
steps:
- name: Clone
id: checkout
uses: actions/checkout@v1
uses: actions/checkout@v4

- name: Dependencies
id: depends
@@ -576,7 +576,7 @@ jobs:
steps:
- name: Clone
id: checkout
uses: actions/checkout@v1
uses: actions/checkout@v4

- name: Dependencies
id: depends
@@ -610,7 +610,7 @@ jobs:
steps:
- name: Clone
id: checkout
uses: actions/checkout@v1
uses: actions/checkout@v4

- name: Dependencies
id: depends
@@ -969,14 +969,14 @@ jobs:
steps:
- name: Clone
id: checkout
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Install
id: depends
run: |
$ErrorActionPreference = "Stop"
write-host "Downloading AMD HIP SDK Installer"
Invoke-WebRequest -Uri "https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-23.Q4-WinSvr2022-For-HIP.exe" -OutFile "${env:RUNNER_TEMP}\rocm-install.exe"
Invoke-WebRequest -Uri "https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q3-WinSvr2022-For-HIP.exe" -OutFile "${env:RUNNER_TEMP}\rocm-install.exe"
write-host "Installing AMD HIP SDK"
Start-Process "${env:RUNNER_TEMP}\rocm-install.exe" -ArgumentList '-install' -NoNewWindow -Wait
write-host "Completed AMD HIP SDK installation"
1 change: 1 addition & 0 deletions .github/workflows/server.yml
@@ -173,6 +173,7 @@ jobs:
if: ${{ !matrix.disabled_on_pr || !github.event.pull_request }}
run: |
cd examples/server/tests
$env:PYTHONIOENCODING = ":replace"
behave.exe --summary --stop --no-capture --exclude 'issues|wrong_usages|passkey' --tags llama.cpp

- name: Slow tests
8 changes: 7 additions & 1 deletion CMakeLists.txt
@@ -139,10 +139,16 @@ set(LLAMA_BIN_INSTALL_DIR ${CMAKE_INSTALL_BINDIR} CACHE PATH "Location o
# determining _precisely_ which defines are necessary for the llama-config
# package.
#
set(GGML_TRANSIENT_DEFINES)
get_target_property(GGML_DIRECTORY ggml SOURCE_DIR)
get_directory_property(GGML_DIR_DEFINES DIRECTORY ${GGML_DIRECTORY} COMPILE_DEFINITIONS)
if (GGML_DIR_DEFINES)
list(APPEND GGML_TRANSIENT_DEFINES ${GGML_DIR_DEFINES})
endif()
get_target_property(GGML_TARGET_DEFINES ggml COMPILE_DEFINITIONS)
set(GGML_TRANSIENT_DEFINES ${GGML_TARGET_DEFINES} ${GGML_DIR_DEFINES})
if (GGML_TARGET_DEFINES)
list(APPEND GGML_TRANSIENT_DEFINES ${GGML_TARGET_DEFINES})
endif()
get_target_property(GGML_LINK_LIBRARIES ggml LINK_LIBRARIES)

set_target_properties(llama PROPERTIES PUBLIC_HEADER ${CMAKE_CURRENT_SOURCE_DIR}/include/llama.h)
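
Note on the hunk above: the appends to GGML_TRANSIENT_DEFINES are guarded because get_directory_property() and get_target_property() may come back empty or as a `<VAR>-NOTFOUND` value when no COMPILE_DEFINITIONS are set, and appending that literal would leak a bogus define into the list. A minimal CMake sketch of the pattern, using a hypothetical target `mylib` that is not part of this patch:

# hypothetical target used only to illustrate the guarded-append pattern
add_library(mylib STATIC mylib.c)
target_compile_definitions(mylib PRIVATE MYLIB_FEATURE=1)

set(TRANSIENT_DEFINES)
get_target_property(TARGET_DEFINES mylib COMPILE_DEFINITIONS)
if (TARGET_DEFINES)   # false when the property is unset ("TARGET_DEFINES-NOTFOUND")
    list(APPEND TRANSIENT_DEFINES ${TARGET_DEFINES})
endif()
message(STATUS "collected defines: ${TRANSIENT_DEFINES}")   # -> MYLIB_FEATURE=1
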
17 changes: 14 additions & 3 deletions Makefile
@@ -434,7 +434,7 @@ endif
# TODO: probably these flags need to be tweaked on some architectures
# feel free to update the Makefile for your architecture and send a pull request or issue

ifndef RISCV
ifndef RISCV_CROSS_COMPILE

ifeq ($(UNAME_M),$(filter $(UNAME_M),x86_64 i686 amd64))
# Use all CPU extensions that are available:
@@ -514,7 +514,12 @@ ifneq ($(filter loongarch64%,$(UNAME_M)),)
MK_CXXFLAGS += -mlasx
endif

else
ifneq ($(filter riscv64%,$(UNAME_M)),)
MK_CFLAGS += -march=rv64gcv -mabi=lp64d
MK_CXXFLAGS += -march=rv64gcv -mabi=lp64d
endif

else # RISC-V CROSS COMPILATION
MK_CFLAGS += -march=rv64gcv -mabi=lp64d
MK_CXXFLAGS += -march=rv64gcv -mabi=lp64d
endif
@@ -925,6 +930,7 @@ OBJ_LLAMA = \

OBJ_COMMON = \
common/common.o \
common/arg.o \
common/console.o \
common/ngram-cache.o \
common/sampling.o \
@@ -1157,6 +1163,11 @@ common/common.o: \
include/llama.h
$(CXX) $(CXXFLAGS) -c $< -o $@

common/arg.o: \
common/arg.cpp \
common/arg.h
$(CXX) $(CXXFLAGS) -c $< -o $@

common/sampling.o: \
common/sampling.cpp \
common/sampling.h \
@@ -1429,6 +1440,7 @@ llama-server: \
examples/server/system-prompts.js.hpp \
examples/server/prompt-formats.js.hpp \
examples/server/json-schema-to-grammar.mjs.hpp \
examples/server/loading.html.hpp \
common/json.hpp \
common/stb_image.h \
$(OBJ_ALL)
@@ -1448,7 +1460,6 @@ llama-gen-docs: examples/gen-docs/gen-docs.cpp \
$(OBJ_ALL)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
./llama-gen-docs

libllava.a: examples/llava/llava.cpp \
examples/llava/llava.h \
5 changes: 4 additions & 1 deletion README.md
@@ -17,7 +17,7 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)

## Hot topics

- *add hot topics here*
- Huggingface GGUF editor: [discussion](https://github.com/ggerganov/llama.cpp/discussions/9268) | [tool](https://huggingface.co/spaces/CISCai/gguf-editor)

----

@@ -89,6 +89,7 @@ Typically finetunes of the base models below are supported as well.
- [x] [SmolLM](https://huggingface.co/collections/HuggingFaceTB/smollm-6695016cad7167254ce15966)
- [x] [EXAONE-3.0-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct)
- [x] [FalconMamba Models](https://huggingface.co/collections/tiiuae/falconmamba-7b-66b9a580324dd1598b0f6d4a)
- [x] [Jais](https://huggingface.co/inceptionai/jais-13b-chat)

(instructions for supporting more models: [HOWTO-add-model.md](./docs/development/HOWTO-add-model.md))

@@ -163,6 +164,7 @@ Unless otherwise noted these projects are open-source with permissive licensing:
- [AI Sublime Text plugin](https://github.com/yaroslavyaroslav/OpenAI-sublime-text) (MIT)
- [AIKit](https://github.com/sozercan/aikit) (MIT)
- [LARS - The LLM & Advanced Referencing Solution](https://github.com/abgulati/LARS) (AGPL)
- [LLMUnity](https://github.com/undreamai/LLMUnity) (MIT)

*(to have a project listed here, it should clearly state that it depends on `llama.cpp`)*

@@ -171,6 +173,7 @@ Unless otherwise noted these projects are open-source with permissive licensing:
- [akx/ggify](https://github.com/akx/ggify) – download PyTorch models from HuggingFace Hub and convert them to GGML
- [crashr/gppm](https://github.com/crashr/gppm) – launch llama.cpp instances utilizing NVIDIA Tesla P40 or P100 GPUs with reduced idle power consumption
- [gpustack/gguf-parser](https://github.com/gpustack/gguf-parser-go/tree/main/cmd/gguf-parser) - review/check the GGUF file and estimate the memory usage
- [Styled Lines](https://marketplace.unity.com/packages/tools/generative-ai/styled-lines-llama-cpp-model-292902) (proprietary licensed, async wrapper of inference part for game development in Unity3d with prebuild Mobile and Web platform wrappers and a model example)

**Infrastructure:**

2 changes: 2 additions & 0 deletions common/CMakeLists.txt
@@ -54,6 +54,8 @@ add_library(${TARGET} STATIC
base64.hpp
common.h
common.cpp
arg.h
arg.cpp
sampling.h
sampling.cpp
console.h