Commit 059325d

Merge branch 'ggerganov:master' into master
2 parents 5b87db0 + 3edfa7d commit 059325d

160 files changed: +10575 -7366 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@
 *.metallib
 *.o
 *.so
+*.swp
 *.tmp
 
 # IDE / OS

CONTRIBUTING.md

Lines changed: 96 additions & 6 deletions
@@ -1,10 +1,10 @@
 # Pull requests (for contributors)
 
 - Test your changes:
-- Execute [the full CI locally on your machine](ci/README.md) before publishing
-- Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
-- If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
-- If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`
+- Execute [the full CI locally on your machine](ci/README.md) before publishing
+- Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
+- If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
+- If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`
 - Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly
 - If your PR becomes stale, don't hesitate to ping the maintainers in the comments
 
@@ -20,14 +20,104 @@
 - Avoid adding third-party dependencies, extra files, extra headers, etc.
 - Always consider cross-compatibility with other operating systems and architectures
 - Avoid fancy-looking modern STL constructs, use basic `for` loops, avoid templates, keep it simple
-- There are no strict rules for the code style, but try to follow the patterns in the code (indentation, spaces, etc.). Vertical alignment makes things more readable and easier to batch edit
+- Vertical alignment makes things more readable and easier to batch edit
 - Clean-up any trailing whitespaces, use 4 spaces for indentation, brackets on the same line, `void * ptr`, `int & a`
-- Naming usually optimizes for common prefix (see https://github.com/ggerganov/ggml/pull/302#discussion_r1243240963)
+- Use sized integer types such as `int32_t` in the public API, e.g. `size_t` may also be appropriate for allocation sizes or byte offsets
+- Declare structs with `struct foo {}` instead of `typedef struct foo {} foo`
+- In C++ code omit optional `struct` and `enum` keyword whenever they are not necessary
+```cpp
+// OK
+llama_context * ctx;
+const llama_rope_type rope_type;
+
+// not OK
+struct llama_context * ctx;
+const enum llama_rope_type rope_type;
+```
+
+_(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline.)_
+
+- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` to format the added code
+- For anything not covered in the current guidelines, refer to the [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines)
 - Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices
 - Matrix multiplication is unconventional: [`C = ggml_mul_mat(ctx, A, B)`](https://github.com/ggerganov/llama.cpp/blob/880e352277fc017df4d5794f0c21c44e1eae2b84/ggml.h#L1058-L1064) means $C^T = A B^T \Leftrightarrow C = B A^T.$
 
 ![matmul](media/matmul.png)
 
+# Naming guidelines
+
+- Use `snake_case` for function, variable and type names
+- Naming usually optimizes for longest common prefix (see https://github.com/ggerganov/ggml/pull/302#discussion_r1243240963)
+
+```cpp
+// not OK
+int small_number;
+int big_number;
+
+// OK
+int number_small;
+int number_big;
+```
+
+- Enum values are always in upper case and prefixed with the enum name
+
+```cpp
+enum llama_vocab_type {
+    LLAMA_VOCAB_TYPE_NONE = 0,
+    LLAMA_VOCAB_TYPE_SPM  = 1,
+    LLAMA_VOCAB_TYPE_BPE  = 2,
+    LLAMA_VOCAB_TYPE_WPM  = 3,
+    LLAMA_VOCAB_TYPE_UGM  = 4,
+    LLAMA_VOCAB_TYPE_RWKV = 5,
+};
+```
+
+- The general naming pattern is `<class>_<method>`, with `<method>` being `<action>_<noun>`
+
+```cpp
+llama_model_init();           // class: "llama_model",         method: "init"
+llama_sampler_chain_remove(); // class: "llama_sampler_chain", method: "remove"
+llama_sampler_get_seed();     // class: "llama_sampler",       method: "get_seed"
+llama_set_embeddings();       // class: "llama_context",       method: "set_embeddings"
+llama_n_threads();            // class: "llama_context",       method: "n_threads"
+llama_adapter_lora_free();    // class: "llama_adapter_lora",  method: "free"
+```
+
+- The `get` `<action>` can be omitted
+- The `<noun>` can be omitted if not necessary
+- The `_context` suffix of the `<class>` is optional. Use it to disambiguate symbols when needed
+- Use `init`/`free` for constructor/destructor `<action>`
+
+- Use the `_t` suffix when a type is supposed to be opaque to the user - it's not relevant to them if it is a struct or anything else
+
+```cpp
+typedef struct llama_context * llama_context_t;
+
+enum llama_pooling_type llama_pooling_type(const llama_context_t ctx);
+```
+
+_(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline)_
+
+- C/C++ filenames are all lowercase with dashes. Headers use the `.h` extension. Source files use the `.c` or `.cpp` extension
+- Python filenames are all lowercase with underscores
+
+- _(TODO: abbreviations usage)_
+
+# Preprocessor directives
+
+- _(TODO: add guidelines with examples and apply them to the codebase)_
+
+```cpp
+#ifdef FOO
+#endif // FOO
+```
+
+# Documentation
+
+- Documentation is a community effort
+- When you need to look into the source code to figure out how to use an API consider adding a short summary to the header file for future reference
+- When you notice incorrect or outdated documentation, please update it
+
 # Resources
 
 The Github issues, PRs and discussions contain a lot of information that can be useful to get familiar with the codebase. For convenience, some of the more important information is referenced from Github projects:
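
As a rough illustration of the matrix multiplication convention stated in the CONTRIBUTING.md hunk above ($C = B A^T$ for `C = ggml_mul_mat(ctx, A, B)`), the sketch below computes the same product with plain loops. It does not call `ggml`; the `matrix` struct and `mul_mat_ref` function are hypothetical names introduced only for this example, and row-major storage is assumed as the guideline describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix: `rows` rows of `cols` floats each.
struct matrix {
    size_t rows;
    size_t cols;
    std::vector<float> data;
};

// Reference loop for the convention C = B * A^T (equivalently C^T = A * B^T).
// a is (m x k), b is (n x k); the result is (n x m).
matrix mul_mat_ref(const matrix & a, const matrix & b) {
    assert(a.cols == b.cols); // both operands share the inner dimension k

    matrix c = { b.rows, a.rows, std::vector<float>(b.rows * a.rows, 0.0f) };

    for (size_t j = 0; j < b.rows; ++j) {
        for (size_t i = 0; i < a.rows; ++i) {
            float sum = 0.0f;
            for (size_t l = 0; l < a.cols; ++l) {
                // row i of A dotted with row j of B
                sum += a.data[i*a.cols + l] * b.data[j*b.cols + l];
            }
            c.data[j*c.cols + i] = sum;
        }
    }

    return c;
}
```

Note that both operands are traversed along their rows, so every inner loop reads contiguous memory. This access pattern is one plausible motivation for the convention, although the commit itself does not state the reason.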

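The naming guidelines added to CONTRIBUTING.md above (`<class>_<method>`, `init`/`free`, omitting `get`, and the `_t` suffix for opaque types) can be combined in a small sketch. Everything named `foo_cache` below is hypothetical and exists only for illustration; none of it is part of the `llama.cpp` API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "class". In a real header only a forward declaration would be
// visible to the user; it is defined here to keep the sketch self-contained.
struct foo_cache {
    int32_t n_items;
    int32_t capacity;
};

// Opaque handle type, marked with the `_t` suffix.
typedef struct foo_cache * foo_cache_t;

// <class>_<method> with the constructor <action> "init"
foo_cache_t foo_cache_init(int32_t capacity) {
    assert(capacity >= 0);
    return new foo_cache { 0, capacity };
}

// destructor <action> "free"
void foo_cache_free(foo_cache_t cache) {
    delete cache;
}

// the "get" <action> is omitted: "n_items" instead of "get_n_items"
int32_t foo_cache_n_items(const foo_cache_t cache) {
    return cache->n_items;
}

int32_t foo_cache_capacity(const foo_cache_t cache) {
    return cache->capacity;
}

// <action>_<noun>: "set_capacity"
void foo_cache_set_capacity(foo_cache_t cache, int32_t capacity) {
    cache->capacity = capacity;
}
```

Because every symbol shares the `foo_cache_` prefix, related functions sort together in listings and autocompletion, which is the effect the "longest common prefix" guideline aims for.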
README.md

Lines changed: 23 additions & 17 deletions
@@ -99,6 +99,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 - [x] [Jais](https://huggingface.co/inceptionai/jais-13b-chat)
 - [x] [Bielik-11B-v2.3](https://huggingface.co/collections/speakleash/bielik-11b-v23-66ee813238d9b526a072408a)
 - [x] [RWKV-6](https://github.com/BlinkDL/RWKV-LM)
+- [x] [QRWKV-6](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1)
 - [x] [GigaChat-20B-A3B](https://huggingface.co/ai-sage/GigaChat-20B-A3B-instruct)
 
 #### Multimodal
@@ -203,6 +204,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 - [GPUStack](https://github.com/gpustack/gpustack) - Manage GPU clusters for running LLMs
 - [llama_cpp_canister](https://github.com/onicai/llama_cpp_canister) - llama.cpp as a smart contract on the Internet Computer, using WebAssembly
 - [llama-swap](https://github.com/mostlygeek/llama-swap) - transparent proxy that adds automatic model switching with llama-server
+- [Kalavai](https://github.com/kalavai-net/kalavai-client) - Crowdsource end to end LLM deployment at any scale
 
 </details>
 
@@ -244,6 +246,8 @@ The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](htt
 - [Trending](https://huggingface.co/models?library=gguf&sort=trending)
 - [LLaMA](https://huggingface.co/models?sort=trending&search=llama+gguf)
 
+You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from Hugging Face by using this CLI argument: `-hf <user>/<model>[:quant]`
+
 After downloading a model, use the CLI tools to run it locally - see below.
 
 `llama.cpp` requires the model to be stored in the [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) file format. Models in other data formats can be converted to GGUF using the `convert_*.py` Python scripts in this repo.
@@ -262,21 +266,12 @@ To learn more about model quantization, [read this documentation](examples/quant
 #### A CLI tool for accessing and experimenting with most of `llama.cpp`'s functionality.
 
 - <details open>
-<summary>Run simple text completion</summary>
-
-```bash
-llama-cli -m model.gguf -p "I believe the meaning of life is" -n 128
-
-# I believe the meaning of life is to find your own truth and to live in accordance with it. For me, this means being true to myself and following my passions, even if they don't align with societal expectations. I think that's what I love about yoga – it's not just a physical practice, but a spiritual one too. It's about connecting with yourself, listening to your inner voice, and honoring your own unique journey.
-```
-
-</details>
-
-- <details>
 <summary>Run in conversation mode</summary>
 
+Models with a built-in chat template will automatically activate conversation mode. If this doesn't occur, you can manually enable it by adding `-cnv` and specifying a suitable chat template with `--chat-template NAME`
+
 ```bash
-llama-cli -m model.gguf -p "You are a helpful assistant" -cnv
+llama-cli -m model.gguf
 
 # > hi, who are you?
 # Hi there! I'm your helpful assistant! I'm an AI-powered chatbot designed to assist and provide information to users like you. I'm here to help answer your questions, provide guidance, and offer support on a wide range of topics. I'm a friendly and knowledgeable AI, and I'm always happy to help with anything you need. What's on your mind, and how can I assist you today?
@@ -288,17 +283,28 @@ To learn more about model quantization, [read this documentation](examples/quant
 </details>
 
 - <details>
-<summary>Run with custom chat template</summary>
+<summary>Run in conversation mode with custom chat template</summary>
 
 ```bash
-# use the "chatml" template
-llama-cli -m model.gguf -p "You are a helpful assistant" -cnv --chat-template chatml
+# use the "chatml" template (use -h to see the list of supported templates)
+llama-cli -m model.gguf -cnv --chat-template chatml
 
 # use a custom template
-llama-cli -m model.gguf -p "You are a helpful assistant" -cnv --in-prefix 'User: ' --reverse-prompt 'User:'
+llama-cli -m model.gguf -cnv --in-prefix 'User: ' --reverse-prompt 'User:'
 ```
 
-[Supported templates](https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template)
+</details>
+
+- <details>
+<summary>Run simple text completion</summary>
+
+To disable conversation mode explicitly, use `-no-cnv`
+
+```bash
+llama-cli -m model.gguf -p "I believe the meaning of life is" -n 128 -no-cnv
+
+# I believe the meaning of life is to find your own truth and to live in accordance with it. For me, this means being true to myself and following my passions, even if they don't align with societal expectations. I think that's what I love about yoga – it's not just a physical practice, but a spiritual one too. It's about connecting with yourself, listening to your inner voice, and honoring your own unique journey.
+```
 
 </details>
 