
Commit e6e9a6f

Merge branch 'master' into xsn/cli_auto_cnv

2 parents: 5b1f710 + a29f087

File tree: 7 files changed, +230 −26 lines


CONTRIBUTING.md

Lines changed: 96 additions & 6 deletions

````diff
@@ -1,10 +1,10 @@
 # Pull requests (for contributors)
 
 - Test your changes:
-    - Execute [the full CI locally on your machine](ci/README.md) before publishing
-    - Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
-    - If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
-    - If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`
+  - Execute [the full CI locally on your machine](ci/README.md) before publishing
+  - Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
+  - If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
+  - If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`
 - Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly
 - If your PR becomes stale, don't hesitate to ping the maintainers in the comments
 
@@ -20,14 +20,104 @@
 - Avoid adding third-party dependencies, extra files, extra headers, etc.
 - Always consider cross-compatibility with other operating systems and architectures
 - Avoid fancy-looking modern STL constructs, use basic `for` loops, avoid templates, keep it simple
-- There are no strict rules for the code style, but try to follow the patterns in the code (indentation, spaces, etc.). Vertical alignment makes things more readable and easier to batch edit
+- Vertical alignment makes things more readable and easier to batch edit
 - Clean-up any trailing whitespaces, use 4 spaces for indentation, brackets on the same line, `void * ptr`, `int & a`
-- Naming usually optimizes for common prefix (see https://github.com/ggerganov/ggml/pull/302#discussion_r1243240963)
+- Use sized integer types such as `int32_t` in the public API, e.g. `size_t` may also be appropriate for allocation sizes or byte offsets
+- Declare structs with `struct foo {}` instead of `typedef struct foo {} foo`
+  - In C++ code omit optional `struct` and `enum` keyword whenever they are not necessary
+    ```cpp
+        // OK
+        llama_context * ctx;
+        const llama_rope_type rope_type;
+
+        // not OK
+        struct llama_context * ctx;
+        const enum llama_rope_type rope_type;
+    ```
+
+    _(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline.)_
+
+- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` to format the added code
+- For anything not covered in the current guidelines, refer to the [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines)
 - Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices
 - Matrix multiplication is unconventional: [`C = ggml_mul_mat(ctx, A, B)`](https://github.com/ggerganov/llama.cpp/blob/880e352277fc017df4d5794f0c21c44e1eae2b84/ggml.h#L1058-L1064) means $C^T = A B^T \Leftrightarrow C = B A^T.$
 
   ![matmul](media/matmul.png)
 
+# Naming guidelines
+
+- Use `snake_case` for function, variable and type names
+- Naming usually optimizes for longest common prefix (see https://github.com/ggerganov/ggml/pull/302#discussion_r1243240963)
+
+    ```cpp
+    // not OK
+    int small_number;
+    int big_number;
+
+    // OK
+    int number_small;
+    int number_big;
+    ```
+
+- Enum values are always in upper case and prefixed with the enum name
+
+    ```cpp
+    enum llama_vocab_type {
+        LLAMA_VOCAB_TYPE_NONE = 0,
+        LLAMA_VOCAB_TYPE_SPM  = 1,
+        LLAMA_VOCAB_TYPE_BPE  = 2,
+        LLAMA_VOCAB_TYPE_WPM  = 3,
+        LLAMA_VOCAB_TYPE_UGM  = 4,
+        LLAMA_VOCAB_TYPE_RWKV = 5,
+    };
+    ```
+
+- The general naming pattern is `<class>_<method>`, with `<method>` being `<action>_<noun>`
+
+    ```cpp
+    llama_model_init();           // class: "llama_model",         method: "init"
+    llama_sampler_chain_remove(); // class: "llama_sampler_chain", method: "remove"
+    llama_sampler_get_seed();     // class: "llama_sampler",       method: "get_seed"
+    llama_set_embeddings();       // class: "llama_context",       method: "set_embeddings"
+    llama_n_threads();            // class: "llama_context",       method: "n_threads"
+    llama_adapter_lora_free();    // class: "llama_adapter_lora",  method: "free"
+    ```
+
+    - The `get` `<action>` can be omitted
+    - The `<noun>` can be omitted if not necessary
+    - The `_context` suffix of the `<class>` is optional. Use it to disambiguate symbols when needed
+    - Use `init`/`free` for constructor/destructor `<action>`
+
+- Use the `_t` suffix when a type is supposed to be opaque to the user - it's not relevant to them if it is a struct or anything else
+
+    ```cpp
+    typedef struct llama_context * llama_context_t;
+
+    enum llama_pooling_type llama_pooling_type(const llama_context_t ctx);
+    ```
+
+    _(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline)_
+
+- C/C++ filenames are all lowercase with dashes. Headers use the `.h` extension. Source files use the `.c` or `.cpp` extension
+- Python filenames are all lowercase with underscores
+
+- _(TODO: abbreviations usage)_
+
+# Preprocessor directives
+
+- _(TODO: add guidelines with examples and apply them to the codebase)_
+
+    ```cpp
+    #ifdef FOO
+    #endif // FOO
+    ```
+
+# Documentation
+
+- Documentation is a community effort
+- When you need to look into the source code to figure out how to use an API consider adding a short summary to the header file for future reference
+- When you notice incorrect or outdated documentation, please update it
+
 # Resources
 
 The Github issues, PRs and discussions contain a lot of information that can be useful to get familiar with the codebase. For convenience, some of the more important information is referenced from Github projects:
````
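Taken together, the new naming rules compose into a predictable API surface. A hypothetical header written to follow them (all `my_*` names are invented for illustration and are not part of `llama.cpp`):

```cpp
#include <stdint.h>

// opaque to the user -> "_t" suffix; callers never see the struct definition
typedef struct my_tokenizer * my_tokenizer_t;

// enum values upper case, prefixed with the enum name
enum my_tokenizer_mode {
    MY_TOKENIZER_MODE_GREEDY = 0,
    MY_TOKENIZER_MODE_BPE    = 1,
};

// <class>_<method> with init/free for constructor/destructor,
// sized integer types (int32_t) in the public API, snake_case throughout
my_tokenizer_t my_tokenizer_init(enum my_tokenizer_mode mode);
void           my_tokenizer_free(my_tokenizer_t tok);
int32_t        my_tokenizer_n_tokens(my_tokenizer_t tok); // the "get" <action> is omitted
```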

common/arg.cpp

Lines changed: 22 additions & 11 deletions

```diff
@@ -130,17 +130,26 @@ std::string common_arg::to_string() {
 
 static void common_params_handle_model_default(
         std::string & model,
-        std::string & model_url,
+        const std::string & model_url,
         std::string & hf_repo,
-        std::string & hf_file) {
+        std::string & hf_file,
+        const std::string & hf_token) {
     if (!hf_repo.empty()) {
         // short-hand to avoid specifying --hf-file -> default it to --model
         if (hf_file.empty()) {
             if (model.empty()) {
-                throw std::invalid_argument("error: --hf-repo requires either --hf-file or --model\n");
+                auto auto_detected = common_get_hf_file(hf_repo, hf_token);
+                if (auto_detected.first.empty() || auto_detected.second.empty()) {
+                    exit(1); // built without CURL, error message already printed
+                }
+                hf_repo = auto_detected.first;
+                hf_file = auto_detected.second;
+            } else {
+                hf_file = model;
             }
-            hf_file = model;
-        } else if (model.empty()) {
+        }
+        // make sure model path is present (for caching purposes)
+        if (model.empty()) {
             // this is to avoid different repo having same file name, or same file name in different subdirs
             std::string filename = hf_repo + "_" + hf_file;
             // to make sure we don't have any slashes in the filename
@@ -290,8 +299,8 @@ static bool common_params_parse_ex(int argc, char ** argv, common_params_context
     }
 
     // TODO: refactor model params in a common struct
-    common_params_handle_model_default(params.model, params.model_url, params.hf_repo, params.hf_file);
-    common_params_handle_model_default(params.vocoder.model, params.vocoder.model_url, params.vocoder.hf_repo, params.vocoder.hf_file);
+    common_params_handle_model_default(params.model, params.model_url, params.hf_repo, params.hf_file, params.hf_token);
+    common_params_handle_model_default(params.vocoder.model, params.vocoder.model_url, params.vocoder.hf_repo, params.vocoder.hf_file, params.hf_token);
 
     if (params.escape) {
         string_process_escapes(params.prompt);
@@ -1587,21 +1596,23 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
         }
     ).set_env("LLAMA_ARG_MODEL_URL"));
     add_opt(common_arg(
-        {"-hfr", "--hf-repo"}, "REPO",
-        "Hugging Face model repository (default: unused)",
+        {"-hf", "-hfr", "--hf-repo"}, "<user>/<model>[:quant]",
+        "Hugging Face model repository; quant is optional, case-insensitive, default to Q4_K_M, or falls back to the first file in the repo if Q4_K_M doesn't exist.\n"
+        "example: unsloth/phi-4-GGUF:q4_k_m\n"
+        "(default: unused)",
         [](common_params & params, const std::string & value) {
             params.hf_repo = value;
         }
     ).set_env("LLAMA_ARG_HF_REPO"));
     add_opt(common_arg(
         {"-hff", "--hf-file"}, "FILE",
-        "Hugging Face model file (default: unused)",
+        "Hugging Face model file. If specified, it will override the quant in --hf-repo (default: unused)",
         [](common_params & params, const std::string & value) {
             params.hf_file = value;
         }
     ).set_env("LLAMA_ARG_HF_FILE"));
     add_opt(common_arg(
-        {"-hfrv", "--hf-repo-v"}, "REPO",
+        {"-hfv", "-hfrv", "--hf-repo-v"}, "<user>/<model>[:quant]",
         "Hugging Face model repository for the vocoder model (default: unused)",
         [](common_params & params, const std::string & value) {
             params.vocoder.hf_repo = value;
```
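In practice this makes the shorthand `-hf <user>/<model>[:quant]` self-sufficient: an invocation such as `llama-cli -hf unsloth/phi-4-GGUF:q4_k_m` (the example from the help text above) should resolve, download, and cache the matching GGUF without `-m` or `-hff`, while an explicit `--hf-file` still overrides the quant tag.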

common/common.cpp

Lines changed: 100 additions & 6 deletions

```diff
@@ -73,6 +73,22 @@
 #include <sys/syslimits.h>
 #endif
 #define LLAMA_CURL_MAX_URL_LENGTH 2084 // Maximum URL Length in Chrome: 2083
+
+//
+// CURL utils
+//
+
+using curl_ptr = std::unique_ptr<CURL, decltype(&curl_easy_cleanup)>;
+
+// cannot use unique_ptr for curl_slist, because we cannot update without destroying the old one
+struct curl_slist_ptr {
+    struct curl_slist * ptr = nullptr;
+    ~curl_slist_ptr() {
+        if (ptr) {
+            curl_slist_free_all(ptr);
+        }
+    }
+};
 #endif // LLAMA_USE_CURL
 
 using json = nlohmann::ordered_json;
@@ -1130,7 +1146,8 @@ static bool curl_perform_with_retry(const std::string & url, CURL * curl, int ma
 
 static bool common_download_file(const std::string & url, const std::string & path, const std::string & hf_token) {
     // Initialize libcurl
-    std::unique_ptr<CURL, decltype(&curl_easy_cleanup)> curl(curl_easy_init(), &curl_easy_cleanup);
+    curl_ptr       curl(curl_easy_init(), &curl_easy_cleanup);
+    curl_slist_ptr http_headers;
     if (!curl) {
         LOG_ERR("%s: error initializing libcurl\n", __func__);
         return false;
@@ -1144,11 +1161,9 @@ static bool common_download_file(const std::string & url, const std::string & pa
 
     // Check if hf-token or bearer-token was specified
     if (!hf_token.empty()) {
-        std::string auth_header = "Authorization: Bearer ";
-        auth_header += hf_token.c_str();
-        struct curl_slist *http_headers = NULL;
-        http_headers = curl_slist_append(http_headers, auth_header.c_str());
-        curl_easy_setopt(curl.get(), CURLOPT_HTTPHEADER, http_headers);
+        std::string auth_header = "Authorization: Bearer " + hf_token;
+        http_headers.ptr = curl_slist_append(http_headers.ptr, auth_header.c_str());
+        curl_easy_setopt(curl.get(), CURLOPT_HTTPHEADER, http_headers.ptr);
     }
 
 #if defined(_WIN32)
@@ -1444,6 +1459,80 @@ struct llama_model * common_load_model_from_hf(
     return common_load_model_from_url(model_url, local_path, hf_token, params);
 }
 
+/**
+ * Allow getting the HF file from the HF repo with tag (like ollama), for example:
+ * - bartowski/Llama-3.2-3B-Instruct-GGUF:q4
+ * - bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M
+ * - bartowski/Llama-3.2-3B-Instruct-GGUF:q5_k_s
+ * Tag is optional, default to "latest" (meaning it checks for Q4_K_M first, then Q4, then if not found, return the first GGUF file in repo)
+ *
+ * Return pair of <repo, file> (with "repo" already having tag removed)
+ *
+ * Note: we use the Ollama-compatible HF API, but not using the blobId. Instead, we use the special "ggufFile" field which returns the value for "hf_file". This is done to be backward-compatible with existing cache files.
+ */
+std::pair<std::string, std::string> common_get_hf_file(const std::string & hf_repo_with_tag, const std::string & hf_token) {
+    auto parts = string_split<std::string>(hf_repo_with_tag, ':');
+    std::string tag = parts.size() > 1 ? parts.back() : "latest";
+    std::string hf_repo = parts[0];
+    if (string_split<std::string>(hf_repo, '/').size() != 2) {
+        throw std::invalid_argument("error: invalid HF repo format, expected <user>/<model>[:quant]\n");
+    }
+
+    // fetch model info from Hugging Face Hub API
+    json model_info;
+    curl_ptr       curl(curl_easy_init(), &curl_easy_cleanup);
+    curl_slist_ptr http_headers;
+    std::string res_str;
+    std::string url = "https://huggingface.co/v2/" + hf_repo + "/manifests/" + tag;
+    curl_easy_setopt(curl.get(), CURLOPT_URL, url.c_str());
+    curl_easy_setopt(curl.get(), CURLOPT_NOPROGRESS, 1L);
+    typedef size_t(*CURLOPT_WRITEFUNCTION_PTR)(void * ptr, size_t size, size_t nmemb, void * data);
+    auto write_callback = [](void * ptr, size_t size, size_t nmemb, void * data) -> size_t {
+        static_cast<std::string *>(data)->append((char * ) ptr, size * nmemb);
+        return size * nmemb;
+    };
+    curl_easy_setopt(curl.get(), CURLOPT_WRITEFUNCTION, static_cast<CURLOPT_WRITEFUNCTION_PTR>(write_callback));
+    curl_easy_setopt(curl.get(), CURLOPT_WRITEDATA, &res_str);
+#if defined(_WIN32)
+    curl_easy_setopt(curl.get(), CURLOPT_SSL_OPTIONS, CURLSSLOPT_NATIVE_CA);
+#endif
+    if (!hf_token.empty()) {
+        std::string auth_header = "Authorization: Bearer " + hf_token;
+        http_headers.ptr = curl_slist_append(http_headers.ptr, auth_header.c_str());
+    }
+    // Important: the User-Agent must be "llama-cpp" to get the "ggufFile" field in the response
+    http_headers.ptr = curl_slist_append(http_headers.ptr, "User-Agent: llama-cpp");
+    http_headers.ptr = curl_slist_append(http_headers.ptr, "Accept: application/json");
+    curl_easy_setopt(curl.get(), CURLOPT_HTTPHEADER, http_headers.ptr);
+
+    CURLcode res = curl_easy_perform(curl.get());
+
+    if (res != CURLE_OK) {
+        throw std::runtime_error("error: cannot make GET request to HF API");
+    }
+
+    long res_code;
+    curl_easy_getinfo(curl.get(), CURLINFO_RESPONSE_CODE, &res_code);
+    if (res_code == 200) {
+        model_info = json::parse(res_str);
+    } else if (res_code == 401) {
+        throw std::runtime_error("error: model is private or does not exist; if you are accessing a gated model, please provide a valid HF token");
+    } else {
+        throw std::runtime_error(string_format("error from HF API, response code: %ld, data: %s", res_code, res_str.c_str()));
+    }
+
+    // check response
+    if (!model_info.contains("ggufFile")) {
+        throw std::runtime_error("error: model does not have ggufFile");
+    }
+    json & gguf_file = model_info.at("ggufFile");
+    if (!gguf_file.contains("rfilename")) {
+        throw std::runtime_error("error: ggufFile does not have rfilename");
+    }
+
+    return std::make_pair(hf_repo, gguf_file.at("rfilename"));
+}
+
 #else
 
 struct llama_model * common_load_model_from_url(
@@ -1465,6 +1554,11 @@ struct llama_model * common_load_model_from_hf(
     return nullptr;
 }
 
+std::pair<std::string, std::string> common_get_hf_file(const std::string &, const std::string &) {
+    LOG_WRN("%s: llama.cpp built without libcurl, downloading from Hugging Face not supported.\n", __func__);
+    return std::make_pair("", "");
+}
+
 #endif // LLAMA_USE_CURL
 
 //
```
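For reference, a minimal sketch of driving the new helper from application code (a hypothetical driver, not part of this commit; the repo/tag string is taken from the doc comment above, and the returned filename depends on what the HF API reports):

```cpp
#include <cstdio>
#include <exception>

#include "common.h" // declares common_get_hf_file (see common/common.h below)

int main() {
    try {
        // the ":q4_k_m" tag is resolved server-side via the Ollama-compatible manifests endpoint
        auto result = common_get_hf_file("bartowski/Llama-3.2-3B-Instruct-GGUF:q4_k_m", /*hf_token=*/"");
        // result.first : "bartowski/Llama-3.2-3B-Instruct-GGUF" (tag stripped)
        // result.second: the matching .gguf filename reported in "ggufFile.rfilename"
        printf("repo: %s\nfile: %s\n", result.first.c_str(), result.second.c_str());
    } catch (const std::exception & e) {
        // bad repo format, HTTP failures, 401 for gated models, missing "ggufFile", ...
        fprintf(stderr, "%s", e.what());
        return 1;
    }
    return 0;
}
```

In a build without `LLAMA_USE_CURL`, the stub above logs a warning and returns an empty pair instead of throwing, which is what the `exit(1)` path in `common/arg.cpp` checks for.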

common/common.h

Lines changed: 8 additions & 0 deletions

```diff
@@ -461,6 +461,11 @@ static bool string_starts_with(const std::string & str,
     return str.rfind(prefix, 0) == 0;
 }
 
+static bool string_ends_with(const std::string & str,
+                             const std::string & suffix) { // While we wait for C++20's std::string::ends_with...
+    return str.size() >= suffix.size() && str.compare(str.size()-suffix.size(), suffix.size(), suffix) == 0;
+}
+
 bool string_parse_kv_override(const char * data, std::vector<llama_model_kv_override> & overrides);
 void string_process_escapes(std::string & input);
 
@@ -508,6 +513,9 @@ struct llama_model * common_load_model_from_hf(
     const std::string & local_path,
     const std::string & hf_token,
     const struct llama_model_params & params);
+std::pair<std::string, std::string> common_get_hf_file(
+    const std::string & hf_repo_with_tag,
+    const std::string & hf_token);
 
 // clear LoRA adapters from context, then apply new list of adapters
 void common_set_adapter_lora(struct llama_context * ctx, std::vector<common_adapter_lora_info> & lora);
```
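A quick sanity check of the new `string_ends_with` helper (standalone copy for illustration; in-tree it lives in `common/common.h` as shown above):

```cpp
#include <cassert>
#include <string>

// copied from the hunk above so the check is self-contained
static bool string_ends_with(const std::string & str,
                             const std::string & suffix) {
    return str.size() >= suffix.size() && str.compare(str.size()-suffix.size(), suffix.size(), suffix) == 0;
}

int main() {
    assert( string_ends_with("Llama-3.2-3B-Instruct-Q4_K_M.gguf", ".gguf"));
    assert(!string_ends_with(".gguf", "model.gguf")); // suffix longer than str -> false, no underflow
    assert( string_ends_with("anything", ""));        // empty suffix matches everything
    return 0;
}
```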
Binary file changed (14 Bytes) — not shown

examples/server/webui/index.html

Lines changed: 3 additions & 2 deletions

```diff
@@ -37,7 +37,7 @@ <h2 class="font-bold ml-4">Conversations</h2>
         <div v-for="conv in conversations" :class="{
           'btn btn-ghost justify-start font-normal': true,
           'btn-active': conv.id === viewingConvId,
-        }" @click="setViewingConv(conv.id)">
+        }" @click="setViewingConv(conv.id)" dir="auto">
           <span class="truncate">{{ conv.messages[0].content }}</span>
         </div>
         <div class="text-center text-xs opacity-40 mt-auto mx-4">
@@ -156,6 +156,7 @@ <h2 class="font-bold ml-4">Conversations</h2>
             @keydown.enter.shift.exact.prevent="inputMsg += '\n'"
             :disabled="isGenerating"
             id="msg-input"
+            dir="auto"
           ></textarea>
           <button v-if="!isGenerating" class="btn btn-primary ml-2" @click="sendMessage" :disabled="inputMsg.length === 0">Send</button>
           <button v-else class="btn btn-neutral ml-2" @click="stopGeneration">Stop</button>
@@ -244,7 +245,7 @@ <h3 class="text-lg font-bold mb-6">Settings</h3>
               <div :class="{
                 'chat-bubble markdown': true,
                 'chat-bubble-base-300': msg.role !== 'user',
-              }">
+              }" dir="auto">
                 <!-- textarea for editing message -->
                 <template v-if="editingContent !== null">
                   <textarea
```
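The `dir="auto"` attribute added in all three places lets the browser infer text direction per element from its content, so the conversation list, chat bubbles, and input box render correctly for right-to-left scripts such as Arabic or Hebrew.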

src/llama-vocab.cpp

Lines changed: 1 addition & 1 deletion

```diff
@@ -1729,7 +1729,7 @@ void llama_vocab::impl::load(llama_model_loader & ml, const LLM_KV & kv) {
                 continue;
             }
             if (new_id >= id_to_token.size()) {
-                LLAMA_LOG_WARN("%s: bad special token: '%s' = %ud, using default id %d\n",
+                LLAMA_LOG_WARN("%s: bad special token: '%s' = %u, using default id %d\n",
                     __func__, key.c_str(), new_id, id);
             } else {
                 id = new_id;
```
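This one-character fix matters because `%ud` is not a single printf conversion: it formats the unsigned value with `%u` and then prints a literal `d` (e.g. `= 32001d`), whereas `%u` alone formats the `new_id` argument correctly.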
