
Commit ed2cdea

Merge pull request #50 from l3utterfly/merge
merge upstream
2 parents: 4b5b503 + 6c87864

File tree: 254 files changed, +34475 additions, −26811 deletions


.github/ISSUE_TEMPLATE/010-bug-compilation.yml

Lines changed: 11 additions & 1 deletion
@@ -65,12 +65,22 @@ body:
         If possible, please do a git bisect and identify the exact commit that introduced the bug.
     validations:
       required: false
+  - type: textarea
+    id: command
+    attributes:
+      label: Compile command
+      description: >
+        Please provide the exact command you used to compile llama.cpp. For example: `cmake -B ...`.
+        This will be automatically formatted into code, so no need for backticks.
+      render: shell
+    validations:
+      required: true
   - type: textarea
     id: logs
     attributes:
       label: Relevant log output
       description: >
-        Please copy and paste any relevant log output, including the command that you entered and any generated text.
+        Please copy and paste any relevant log output, including any generated text.
         This will be automatically formatted into code, so no need for backticks.
       render: shell
     validations:

.github/ISSUE_TEMPLATE/019-bug-misc.yml

Lines changed: 11 additions & 1 deletion
@@ -52,6 +52,16 @@ body:
         - Other (Please specify in the next section)
     validations:
       required: false
+  - type: textarea
+    id: command
+    attributes:
+      label: Command line
+      description: >
+        Please provide the exact commands you entered, if applicable. For example: `llama-server -m ... -c ...`, `llama-cli -m ...`, etc.
+        This will be automatically formatted into code, so no need for backticks.
+      render: shell
+    validations:
+      required: false
   - type: textarea
     id: info
     attributes:
@@ -74,7 +84,7 @@ body:
     attributes:
       label: Relevant log output
       description: >
-        If applicable, please copy and paste any relevant log output, including the command that you entered and any generated text.
+        If applicable, please copy and paste any relevant log output, including any generated text.
         This will be automatically formatted into code, so no need for backticks.
       render: shell
     validations:
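For context, a `textarea` with `render: shell` in a GitHub issue form wraps whatever the reporter types in a fenced shell code block, which is why the description says backticks are unnecessary. A minimal, self-contained form using the same pattern might look like the sketch below; the `name` and top-level `description` wrapper fields are illustrative, not part of this diff:

```yaml
name: Bug report (sketch)
description: Minimal illustration of the textarea pattern added above
body:
  - type: textarea
    id: command
    attributes:
      label: Command line
      description: >
        Please provide the exact commands you entered, if applicable.
        This will be automatically formatted into code, so no need for backticks.
      render: shell
    validations:
      required: false
```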

.github/workflows/build.yml

Lines changed: 15 additions & 17 deletions
@@ -60,8 +60,7 @@ jobs:
             -DLLAMA_CURL=ON \
             -DGGML_METAL_USE_BF16=ON \
             -DGGML_METAL_EMBED_LIBRARY=ON \
-            -DGGML_RPC=ON \
-            -DBUILD_SHARED_LIBS=OFF
+            -DGGML_RPC=ON
         cmake --build . --config Release -j $(sysctl -n hw.logicalcpu)

     - name: Test
@@ -123,8 +122,7 @@ jobs:
             -DLLAMA_FATAL_WARNINGS=ON \
             -DLLAMA_CURL=ON \
             -DGGML_METAL=OFF \
-            -DGGML_RPC=ON \
-            -DBUILD_SHARED_LIBS=OFF
+            -DGGML_RPC=ON
         cmake --build build --config Release -j $(sysctl -n hw.logicalcpu)

     - name: Test
@@ -181,7 +179,7 @@ jobs:
       run: |
         mkdir build
         cd build
-        cmake .. -DLLAMA_FATAL_WARNINGS=ON -DLLAMA_CURL=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=OFF
+        cmake .. -DLLAMA_FATAL_WARNINGS=ON -DLLAMA_CURL=ON -DGGML_RPC=ON
        cmake --build . --config Release -j $(nproc)

     - name: Test
@@ -236,7 +234,7 @@ jobs:
     strategy:
       matrix:
         sanitizer: [ADDRESS, THREAD, UNDEFINED]
-        build_type: [Debug, Release]
+        build_type: [Debug]

     steps:
       - name: Clone
@@ -651,23 +649,23 @@ jobs:
       matrix:
         include:
           - build: 'noavx-x64'
-            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_AVX=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF -DBUILD_SHARED_LIBS=ON'
+            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_AVX=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF'
           - build: 'avx2-x64'
-            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=ON'
+            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON'
           - build: 'avx-x64'
-            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_AVX2=OFF -DBUILD_SHARED_LIBS=ON'
+            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_AVX2=OFF'
           - build: 'avx512-x64'
-            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_AVX512=ON -DBUILD_SHARED_LIBS=ON'
+            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_AVX512=ON'
           - build: 'openblas-x64'
-            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BLAS=ON -DBUILD_SHARED_LIBS=ON -DGGML_BLAS_VENDOR=OpenBLAS -DBLAS_INCLUDE_DIRS="$env:RUNNER_TEMP/openblas/include" -DBLAS_LIBRARIES="$env:RUNNER_TEMP/openblas/lib/openblas.lib"'
+            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS -DBLAS_INCLUDE_DIRS="$env:RUNNER_TEMP/openblas/include" -DBLAS_LIBRARIES="$env:RUNNER_TEMP/openblas/lib/openblas.lib"'
           - build: 'kompute-x64'
-            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_KOMPUTE=ON -DKOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK=ON -DBUILD_SHARED_LIBS=ON'
+            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_KOMPUTE=ON -DKOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK=ON'
           - build: 'vulkan-x64'
-            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_VULKAN=ON -DBUILD_SHARED_LIBS=ON'
+            defines: '-DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_VULKAN=ON'
           - build: 'llvm-arm64'
-            defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DBUILD_SHARED_LIBS=ON'
+            defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON'
           - build: 'msvc-arm64'
-            defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-msvc.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DBUILD_SHARED_LIBS=ON'
+            defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-msvc.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON'
           - build: 'llvm-arm64-opencl-adreno'
             defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" -DGGML_OPENCL=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON'
@@ -914,7 +912,7 @@ jobs:
       shell: cmd
       run: |
         call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
-        cmake -S . -B build -G "Ninja Multi-Config" -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_CUDA=ON -DBUILD_SHARED_LIBS=ON -DGGML_RPC=ON
+        cmake -S . -B build -G "Ninja Multi-Config" -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_CUDA=ON -DGGML_RPC=ON
         set /A NINJA_JOBS=%NUMBER_OF_PROCESSORS%-1
         cmake --build build --config Release -j %NINJA_JOBS% -t ggml
         cmake --build build --config Release
@@ -1239,7 +1237,7 @@ jobs:

       - name: Create release
         id: create_release
-        uses: anzz1/action-create-release@v1
+        uses: ggml-org/action-create-release@v1
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         with:
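Taken together, these build.yml changes drop every explicit `BUILD_SHARED_LIBS` setting from the CI configurations, leaving the library type to the project's default, and trim the sanitizer matrix to Debug builds only. As a rough local equivalent (a sketch, not taken from the workflow itself):

```sh
# configure without forcing the library type, as the updated CI does
cmake -B build -DLLAMA_FATAL_WARNINGS=ON -DLLAMA_CURL=ON -DGGML_RPC=ON
cmake --build build --config Release -j $(nproc)

# the old CI additionally pinned the library type, e.g.:
#   cmake -B build ... -DBUILD_SHARED_LIBS=OFF
```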

.github/workflows/docker.yml

Lines changed: 1 addition & 2 deletions
@@ -97,10 +97,9 @@ jobs:
       GITHUB_BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
       GITHUB_REPOSITORY_OWNER: '${{ github.repository_owner }}'

-      # https://github.com/jlumbroso/free-disk-space/tree/54081f138730dfa15788a46383842cd2f914a1be#example
       - name: Free Disk Space (Ubuntu)
         if: ${{ matrix.config.free_disk_space == true }}
-        uses: jlumbroso/free-disk-space@main
+        uses: ggml-org/free-disk-space@v1.3.1
         with:
           # this might remove tools that are actually needed,
           # if set to "true" but frees about 6 GB

.github/workflows/editorconfig.yml

Lines changed: 3 additions & 1 deletion
@@ -23,5 +23,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
-      - uses: editorconfig-checker/action-editorconfig-checker@main
+      - uses: editorconfig-checker/action-editorconfig-checker@v2
+        with:
+          version: v3.0.3
       - run: editorconfig-checker

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@
 *.metallib
 *.o
 *.so
+*.swp
 *.tmp

 # IDE / OS

CMakeLists.txt

Lines changed: 50 additions & 7 deletions
@@ -83,11 +83,8 @@ include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/build-info.cmake)
 include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/common.cmake)

 # override ggml options
-set(GGML_SANITIZE_THREAD    ${LLAMA_SANITIZE_THREAD})
-set(GGML_SANITIZE_ADDRESS   ${LLAMA_SANITIZE_ADDRESS})
-set(GGML_SANITIZE_UNDEFINED ${LLAMA_SANITIZE_UNDEFINED})
-set(GGML_ALL_WARNINGS       ${LLAMA_ALL_WARNINGS})
-set(GGML_FATAL_WARNINGS     ${LLAMA_FATAL_WARNINGS})
+set(GGML_ALL_WARNINGS   ${LLAMA_ALL_WARNINGS})
+set(GGML_FATAL_WARNINGS ${LLAMA_FATAL_WARNINGS})

 # change the default for these ggml options
 if (NOT DEFINED GGML_LLAMAFILE)
@@ -117,16 +114,62 @@ llama_option_depr(WARNING LLAMA_SYCL GGML_SYCL)
 llama_option_depr(WARNING LLAMA_SYCL_F16 GGML_SYCL_F16)
 llama_option_depr(WARNING LLAMA_CANN GGML_CANN)

+if (NOT MSVC)
+    if (LLAMA_SANITIZE_THREAD)
+        message(STATUS "Using -fsanitize=thread")
+
+        add_compile_options(-fsanitize=thread)
+        link_libraries     (-fsanitize=thread)
+    endif()
+
+    if (LLAMA_SANITIZE_ADDRESS)
+        message(STATUS "Using -fsanitize=address")
+
+        add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
+        link_libraries     (-fsanitize=address)
+    endif()
+
+    if (LLAMA_SANITIZE_UNDEFINED)
+        message(STATUS "Using -fsanitize=undefined")
+
+        add_compile_options(-fsanitize=undefined)
+        link_libraries     (-fsanitize=undefined)
+    endif()
+endif()
+
 #
-# build the library
+# 3rd-party
 #

 if (NOT TARGET ggml)
     add_subdirectory(ggml)
     # ... otherwise assume ggml is added by a parent CMakeLists.txt
 endif()
+
+#
+# build the library
+#
+
 add_subdirectory(src)

+#
+# utils, programs, examples and tests
+#
+
+if (LLAMA_BUILD_COMMON)
+    add_subdirectory(common)
+endif()
+
+if (LLAMA_BUILD_COMMON AND LLAMA_BUILD_TESTS AND NOT CMAKE_JS_VERSION)
+    include(CTest)
+    add_subdirectory(tests)
+endif()
+
+if (LLAMA_BUILD_COMMON AND LLAMA_BUILD_EXAMPLES)
+    add_subdirectory(examples)
+    add_subdirectory(pocs)
+endif()
+
 #
 # install
 #
@@ -217,4 +260,4 @@ endif()
 if (LLAMA_BUILD_COMMON AND LLAMA_BUILD_EXAMPLES)
     add_subdirectory(examples)
     add_subdirectory(pocs)
-endif()
\ No newline at end of file
+endif()
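The net effect of this CMakeLists.txt change is that the `LLAMA_SANITIZE_*` options are now applied as top-level compile and link flags instead of being forwarded to ggml's own sanitizer options. A hedged sketch of how such a build might be configured and exercised locally, assuming a non-MSVC toolchain and a CMake recent enough for `ctest --test-dir` (3.20+):

```sh
# configure a Debug build with AddressSanitizer via the option handled above
cmake -B build-asan -DCMAKE_BUILD_TYPE=Debug -DLLAMA_SANITIZE_ADDRESS=ON
cmake --build build-asan -j $(nproc)

# run the test suite under the sanitizer (tests are built when
# LLAMA_BUILD_COMMON and LLAMA_BUILD_TESTS are enabled, per the diff)
ctest --test-dir build-asan --output-on-failure
```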

CODEOWNERS

Lines changed: 7 additions & 1 deletion
@@ -1,5 +1,11 @@
 # collaborators can optionally add themselves here to indicate their availability for reviewing related PRs

 /ci/ @ggerganov
-/.devops/ @ngxson
+/.devops/*.Dockerfile @ngxson
 /examples/server/ @ngxson
+/ggml/src/ggml-cuda/fattn* @JohannesGaessler
+/ggml/src/ggml-cuda/mmq.* @JohannesGaessler
+/ggml/src/ggml-cuda/mmv.* @JohannesGaessler
+/ggml/src/ggml-cuda/mmvq.* @JohannesGaessler
+/ggml/src/ggml-opt.cpp @JohannesGaessler
+/ggml/src/gguf.cpp @JohannesGaessler

CONTRIBUTING.md

Lines changed: 96 additions & 6 deletions
@@ -1,10 +1,10 @@
 # Pull requests (for contributors)

 - Test your changes:
-    - Execute [the full CI locally on your machine](ci/README.md) before publishing
-    - Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
-    - If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
-    - If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`
+  - Execute [the full CI locally on your machine](ci/README.md) before publishing
+  - Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
+  - If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
+  - If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`
 - Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly
 - If your PR becomes stale, don't hesitate to ping the maintainers in the comments
@@ -20,14 +20,104 @@
 - Avoid adding third-party dependencies, extra files, extra headers, etc.
 - Always consider cross-compatibility with other operating systems and architectures
 - Avoid fancy-looking modern STL constructs, use basic `for` loops, avoid templates, keep it simple
-- There are no strict rules for the code style, but try to follow the patterns in the code (indentation, spaces, etc.). Vertical alignment makes things more readable and easier to batch edit
+- Vertical alignment makes things more readable and easier to batch edit
 - Clean-up any trailing whitespaces, use 4 spaces for indentation, brackets on the same line, `void * ptr`, `int & a`
-- Naming usually optimizes for common prefix (see https://github.com/ggerganov/ggml/pull/302#discussion_r1243240963)
+- Use sized integer types such as `int32_t` in the public API, e.g. `size_t` may also be appropriate for allocation sizes or byte offsets
+- Declare structs with `struct foo {}` instead of `typedef struct foo {} foo`
+    - In C++ code omit optional `struct` and `enum` keyword whenever they are not necessary
+
+    ```cpp
+    // OK
+    llama_context * ctx;
+    const llama_rope_type rope_type;
+
+    // not OK
+    struct llama_context * ctx;
+    const enum llama_rope_type rope_type;
+    ```
+
+    _(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline.)_
+
+- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` to format the added code
+- For anything not covered in the current guidelines, refer to the [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines)
 - Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices
 - Matrix multiplication is unconventional: [`C = ggml_mul_mat(ctx, A, B)`](https://github.com/ggerganov/llama.cpp/blob/880e352277fc017df4d5794f0c21c44e1eae2b84/ggml.h#L1058-L1064) means $C^T = A B^T \Leftrightarrow C = B A^T.$

 ![matmul](media/matmul.png)

+# Naming guidelines
+
+- Use `snake_case` for function, variable and type names
+- Naming usually optimizes for longest common prefix (see https://github.com/ggerganov/ggml/pull/302#discussion_r1243240963)
+
+    ```cpp
+    // not OK
+    int small_number;
+    int big_number;
+
+    // OK
+    int number_small;
+    int number_big;
+    ```
+
+- Enum values are always in upper case and prefixed with the enum name
+
+    ```cpp
+    enum llama_vocab_type {
+        LLAMA_VOCAB_TYPE_NONE = 0,
+        LLAMA_VOCAB_TYPE_SPM  = 1,
+        LLAMA_VOCAB_TYPE_BPE  = 2,
+        LLAMA_VOCAB_TYPE_WPM  = 3,
+        LLAMA_VOCAB_TYPE_UGM  = 4,
+        LLAMA_VOCAB_TYPE_RWKV = 5,
+    };
+    ```
+
+- The general naming pattern is `<class>_<method>`, with `<method>` being `<action>_<noun>`
+
+    ```cpp
+    llama_model_init();           // class: "llama_model",         method: "init"
+    llama_sampler_chain_remove(); // class: "llama_sampler_chain", method: "remove"
+    llama_sampler_get_seed();     // class: "llama_sampler",       method: "get_seed"
+    llama_set_embeddings();       // class: "llama_context",       method: "set_embeddings"
+    llama_n_threads();            // class: "llama_context",       method: "n_threads"
+    llama_adapter_lora_free();    // class: "llama_adapter_lora",  method: "free"
+    ```
+
+    - The `get` `<action>` can be omitted
+    - The `<noun>` can be omitted if not necessary
+    - The `_context` suffix of the `<class>` is optional. Use it to disambiguate symbols when needed
+    - Use `init`/`free` for constructor/destructor `<action>`
+
+- Use the `_t` suffix when a type is supposed to be opaque to the user - it's not relevant to them if it is a struct or anything else
+
+    ```cpp
+    typedef struct llama_context * llama_context_t;
+
+    enum llama_pooling_type llama_pooling_type(const llama_context_t ctx);
+    ```
+
+    _(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline)_
+
+- C/C++ filenames are all lowercase with dashes. Headers use the `.h` extension. Source files use the `.c` or `.cpp` extension
+- Python filenames are all lowercase with underscores
+
+- _(TODO: abbreviations usage)_
+
+# Preprocessor directives
+
+- _(TODO: add guidelines with examples and apply them to the codebase)_
+
+    ```cpp
+    #ifdef FOO
+    #endif // FOO
+    ```
+
+# Documentation
+
+- Documentation is a community effort
+- When you need to look into the source code to figure out how to use an API consider adding a short summary to the header file for future reference
+- When you notice incorrect or outdated documentation, please update it
+
 # Resources

 The Github issues, PRs and discussions contain a lot of information that can be useful to get familiar with the codebase. For convenience, some of the more important information is referenced from Github projects:
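One point in the CONTRIBUTING.md text above that often trips people up is the `ggml_mul_mat` convention. The following minimal sketch (not part of this commit) illustrates the shape bookkeeping in practice; it assumes the public ggml API (`ggml_init`, `ggml_new_tensor_2d`, `ggml_mul_mat`) and only shows graph construction, not computation:

```cpp
#include "ggml.h"

int main() {
    // small scratch context; 16 MiB is an arbitrary illustrative size
    ggml_init_params params = {};
    params.mem_size = 16 * 1024 * 1024;
    ggml_context * ctx = ggml_init(params);

    // row-major: dimension 0 = columns, dimension 1 = rows
    ggml_tensor * A = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3); // 4 cols, 3 rows
    ggml_tensor * B = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2); // 4 cols, 2 rows

    // C = ggml_mul_mat(ctx, A, B) means C^T = A B^T, i.e. C = B A^T:
    // both operands must agree in dimension 0 (here: 4), and the result
    // has C->ne[0] == A->ne[1] == 3 and C->ne[1] == B->ne[1] == 2
    ggml_tensor * C = ggml_mul_mat(ctx, A, B);
    (void) C; // building and computing the graph is omitted in this sketch

    ggml_free(ctx);
    return 0;
}
```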
