Commit 1166fda

Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2 parents: 47ea33a + a50e39c

File tree

13 files changed: 147 additions, 173 deletions

.github/ISSUE_TEMPLATE/custom.md

Lines changed: 8 additions & 21 deletions
@@ -1,7 +1,7 @@
 ---
-name: Custom issue template
-about: Used to report user-related issues with the software
-title: "[User] I encountered a problem .."
+name: Issue and enhancement template
+about: Used to report issues and request enhancements for llama.cpp
+title: "[User] Insert summary of your issue or enhancement.."
 labels: ''
 assignees: ''
 ---

@@ -18,11 +18,11 @@ Please answer the following questions for yourself before submitting an issue.
 
 # Expected Behavior
 
-Please provide a detailed written description of what you were trying to do, and what you expected `lamma.cpp` to do.
+Please provide a detailed written description of what you were trying to do, and what you expected `llama.cpp` to do.
 
 # Current Behavior
 
-Please provide a detailed written description of what `lamma.cpp` did, instead.
+Please provide a detailed written description of what `llama.cpp` did, instead.
 
 # Environment and Context
 
@@ -44,20 +44,6 @@ $ make --version
 $ g++ --version
 ```
 
-# Models
-
-* The LLaMA models are officially distributed by Facebook and will never be provided through this repository. See this [pull request in Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to obtain access to the model data.
-* If your issue is with model conversion please verify the `sha256sum` of each of your `consolidated*.pth` and `ggml-model-XXX.bin` files to confirm that you have the correct model data files before logging an issue. [Latest sha256 sums for your reference](https://github.com/ggerganov/llama.cpp/issues/238).
-* If your issue is with model generation quality then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
-    * LLaMA:
-        * [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
-        * [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
-    * GPT-3
-        * [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
-    * GPT-3.5 / InstructGPT / ChatGPT:
-        * [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
-        * [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
-
 # Failure Information (for bugs)
 
 Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
@@ -75,8 +61,9 @@ Please provide detailed steps for reproducing the issue. We are not sitting in f
 
 Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.
 
-Also, please try to **avoid using screenshots** if at all possible. Instead, copy/paste the console output and use [Github's markdown](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) to cleanly format your logs for easy readability. e.g.
+Also, please try to **avoid using screenshots** if at all possible. Instead, copy/paste the console output and use [Github's markdown](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) to cleanly format your logs for easy readability.
 
+Example environment info:
 ```
 llama.cpp$ git log | head -1
 commit 2af23d30434a677c6416812eea52ccc0af65119c
@@ -103,8 +90,8 @@ GNU Make 4.3
 $ md5sum ./models/65B/ggml-model-q4_0.bin
 dbdd682cce80e2d6e93cefc7449df487 ./models/65B/ggml-model-q4_0.bin
 ```
-Here's a run with the Linux command [perf](https://www.brendangregg.com/perf.html)
 
+Example run with the Linux command [perf](https://www.brendangregg.com/perf.html)
 ```
 llama.cpp$ perf stat ./main -m ./models/65B/ggml-model-q4_0.bin -t 16 -n 1024 -p "Please close your issue when it has been answered."
 main: seed = 1679149377

Makefile

Lines changed: 3 additions & 1 deletion
@@ -234,7 +234,9 @@ clean:
 
 main: main.cpp ggml.o extra.o utils.o
 	$(CXX) $(CXXFLAGS) main.cpp ggml.o extra.o utils.o -o main $(LDFLAGS)
-	@echo "\x1b[36mrun ./main -h for help\x1b[0m"
+	@echo
+	@echo '==== Run ./main -h for help. ===='
+	@echo
 
 llamalib: expose.cpp ggml.o utils.o extra.o
 	$(CXX) $(CXXFLAGS) expose.cpp ggml.o utils.o extra.o -shared -o llamacpp.dll $(LDFLAGS)

SHA256SUMS

Lines changed: 1 addition & 34 deletions
@@ -1,26 +1,12 @@
 700df0d3013b703a806d2ae7f1bfb8e59814e3d06ae78be0c66368a50059f33d models/7B/consolidated.00.pth
-abe4aec2cdc297e2916011f66c7efd6fb4424e0e84315503005b5c118358cc22 models/7B/ggml-model-f16.bin
-f495fa02a0b5ef265e1864d9680eede7fd23a60b0a2f93edba8091e2a4ca68b9 models/7B/ggml-model-q4_0.bin
 7e89e242ddc0dd6f060b43ca219ce8b3e8f08959a72cb3c0855df8bb04d46265 models/7B/params.json
 745bf4e29a4dd6f411e72976d92b452da1b49168a4f41c951cfcc8051823cf08 models/13B/consolidated.00.pth
 d5ccbcc465c71c0de439a5aeffebe8344c68a519bce70bc7f9f92654ee567085 models/13B/consolidated.01.pth
-a6bd0537c6873f36c47292df0b6f794e1135f5aafb89c3343bcc9e93264bf167 models/13B/ggml-model-f16.bin
-0fb0951b90f2ec46c1f2f2372af5dacb4614b27e9fb6c10c69fbec58d7dd0e36 models/13B/ggml-model-f16.bin.1
-1c218ba37ae61e15e35efd9949c78d6edf553b6280824c263cad56ae0b9d5a8f models/13B/ggml-model-q4_0.bin
-c37a20c2ab9fa74b006b389085660269ee06110d1e45a494eb57d4602c9bcdb2 models/13B/ggml-model-q4_0.bin.1
 4ab77bec4d4405ccb66a97b282574c89a94417e3c32e5f68f37e2876fc21322f models/13B/params.json
 e23294a58552d8cdec5b7e8abb87993b97ea6eced4178ff2697c02472539d067 models/30B/consolidated.00.pth
 4e077b7136c7ae2302e954860cf64930458d3076fcde9443f4d0e939e95903ff models/30B/consolidated.01.pth
 24a87f01028cbd3a12de551dcedb712346c0b5cbdeff1454e0ddf2df9b675378 models/30B/consolidated.02.pth
 1adfcef71420886119544949767f6a56cb6339b4d5fcde755d80fe68b49de93b models/30B/consolidated.03.pth
-def20ea508f4e36793719f857471e85b85f96e497a2cbffbbaa1b60e2b18202c models/30B/ggml-model-f16.bin
-b37040aa67fa8608cb2d8e0719132cf3e267fd35ec1e2f0d37dbc9fa43d674f1 models/30B/ggml-model-f16.bin.1
-e7f263557e99069fe29003262ea5fa9ed885dbe79069083e6eb569b328cf30d3 models/30B/ggml-model-f16.bin.2
-2ad6a23af05eb720f202f63d130f4fc5de9b6d2efc95b921be003209a56695aa models/30B/ggml-model-f16.bin.3
-7de31d005e6d02ebd9603b2cf5329ad2f832b65d08873a098c5cafc4046cb9ed models/30B/ggml-model-q4_0.bin
-f91feef9f30f9a023616db2e91297ca6d5d5d7b9eb351e452a82115c46f7da9e models/30B/ggml-model-q4_0.bin.1
-66f3a0916ac7a81839153eb061fa861030ed1892477c2f7af2ce4f98d2f6d06f models/30B/ggml-model-q4_0.bin.2
-e3c587ba97f83d2088b001bcda3026571065649ee3090bef6743a51390b01d3b models/30B/ggml-model-q4_0.bin.3
 2c07118ea98d69dbe7810d88520e30288fa994751b337f8fca02b171955f44cb models/30B/params.json
 135c563f6b3938114458183afb01adc9a63bef3d8ff7cccc3977e5d3664ecafe models/65B/consolidated.00.pth
 9a600b37b19d38c7e43809485f70d17d1dc12206c07efa83bc72bb498a568bde models/65B/consolidated.01.pth
@@ -30,24 +16,5 @@ e7babf7c5606f165a3756f527cb0fedc4f83e67ef1290391e52fb1cce5f26770 models/65B/con
 a287c0dfe49081626567c7fe87f74cce5831f58e459b427b5e05567641f47b78 models/65B/consolidated.05.pth
 72b4eba67a1a3b18cb67a85b70f8f1640caae9b40033ea943fb166bd80a7b36b models/65B/consolidated.06.pth
 d27f5b0677d7ff129ceacd73fd461c4d06910ad7787cf217b249948c3f3bc638 models/65B/consolidated.07.pth
-7eba2625260cd91f8de901fd9704a1aa39448425514a335a0d3878de4ab9dc77 models/65B/ggml-model-f16.bin
-f6aa886575df0785d4231f30cc776d499ccde18857818effc0378c65b178e0b5 models/65B/ggml-model-f16.bin.1
-076037141682f5d7537955058c4740ab27f285aa4588915f830874a589c0693d models/65B/ggml-model-f16.bin.2
-7853d96d2903ad7de2b2a89c4acf5a33a2f8e3c24ac39c9df6b44cdb42bf530a models/65B/ggml-model-f16.bin.3
-b16b7b941abb3bc03a14df1656140855e9360a5371c83e919b9da83a72362314 models/65B/ggml-model-f16.bin.4
-5291270216f888697695acb78ef28df0c080f9e85d3245c92fb9992d1fde6678 models/65B/ggml-model-f16.bin.5
-0685ee77715f34686841006f8f94d3e7eaf148b97cecc9d3eee72808b0f7989c models/65B/ggml-model-f16.bin.6
-00d993d73bb21d7c29388ffe0dced008cbaa0d391831dea77d7eb8f0b5c404b9 models/65B/ggml-model-f16.bin.7
-4e398f05842206e08cdc5e7bb4f6c7c34b9dc373435ece6f261b14b7b4fe9b89 models/65B/ggml-model-q4_0.bin
-4c4e899e3b12d9f57c9dcea5a1fb41bbc72023323535551f6273582ca7d7294b models/65B/ggml-model-q4_0.bin.1
-d7b4594bbbd192043b3db0e5acc2561c42e6944e1cb91cc6e61510eee89dbcd8 models/65B/ggml-model-q4_0.bin.2
-9a099d271648863d923d0d097391ea0bc75591f27a2ca3a327760f42e6b69af2 models/65B/ggml-model-q4_0.bin.3
-5ee474051e418c5732b7949190b084d9d679db447f83c1de0d2a82daaa1a0cfa models/65B/ggml-model-q4_0.bin.4
-a45aa05e7212bd6782790722d68056c5419667ea6b564ccc94bbcb8111d79b8b models/65B/ggml-model-q4_0.bin.5
-a58fda714b759c28ad5e4c1d8bf8fda7b158fd5e4c4a49f851f36342fa97a105 models/65B/ggml-model-q4_0.bin.6
-a3540cfcbcda33c223c6b0d606034adbd78f17e0e5de1582b78795e78754f7a8 models/65B/ggml-model-q4_0.bin.7
 999ed1659b469ccc2a941714c0a9656fa571d17c9f7c8c7589817ca90edef51b models/65B/params.json
-1f582babc2bd56bb63b33141898748657d369fd110c4358b2bc280907882bf13 models/alpaca-7B/ggml-model-q4_0.bin
-e17730c6b62b565b098af023ca446dcb9e3535d4222ead6369c7aae67207eb3d models/alpaca-13B/ggml-model-q4_0.bin
-9bcd1bb30e679c939f367be11b030fe20b3eb9a3606b9bc4106420f1827b6ae4 models/alpaca-30B/ggml-model-q4_0.bin
-36079249f53c292a4c2302d7784005dcae94c865f0bedfdbfa51d9ddad402935 models/alpaca-30B/params.json
+9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347 models/tokenizer.model

ggml.c

Lines changed: 68 additions & 106 deletions
@@ -1,3 +1,6 @@
+// Defines CLOCK_MONOTONIC on Linux
+#define _POSIX_C_SOURCE 199309L
+
 #include "ggml.h"
 
 #if defined(_MSC_VER) || defined(__MINGW32__)
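Context for the new define (commentary, not part of the diff): `CLOCK_MONOTONIC` sits behind a POSIX feature-test macro, so strict-standards builds (e.g. `gcc -std=c11`) only see it when the macro is defined before any system header is included. A minimal sketch of the timing call this unlocks, assuming a POSIX system:

```c
// The macro must precede <time.h>, mirroring its position at the top of ggml.c.
#define _POSIX_C_SOURCE 199309L

#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts); // monotonic clock, unaffected by wall-clock resets
    printf("%lld.%09ld s since an arbitrary epoch\n",
           (long long) ts.tv_sec, ts.tv_nsec);
    return 0;
}
```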
@@ -400,16 +403,63 @@ static inline __m128i packNibbles( __m256i bytes )
 // method 5
 // blocks of QK elements
 // represented with a single float (delta) and QK/2 8-bit ints (i.e QK 4-bit signed integer factors)
+
+// reference implementation for deterministic creation of model files
+static void quantize_row_q4_0_reference(const float * restrict x, void * restrict y, int k) {
+    assert(k % QK == 0);
+    const int nb = k / QK;
+
+    const size_t bs = sizeof(float) + QK/2;
+
+    uint8_t * restrict pd = ((uint8_t *)y + 0*bs);
+    uint8_t * restrict pb = ((uint8_t *)y + 0*bs + sizeof(float));
+
+    uint8_t pp[QK/2];
+
+    for (int i = 0; i < nb; i++) {
+        float amax = 0.0f; // absolute max
+
+        for (int l = 0; l < QK; l++) {
+            const float v = x[i*QK + l];
+            amax = MAX(amax, fabsf(v));
+        }
+
+        const float d = amax / ((1 << 3) - 1);
+        const float id = d ? 1.0f/d : 0.0f;
+
+        *(float *)pd = d;
+        pd += bs;
+
+        for (int l = 0; l < QK; l += 2) {
+            const float v0 = x[i*QK + l + 0]*id;
+            const float v1 = x[i*QK + l + 1]*id;
+
+            const uint8_t vi0 = ((int8_t) (round(v0))) + 8;
+            const uint8_t vi1 = ((int8_t) (round(v1))) + 8;
+
+            assert(vi0 >= 0 && vi0 < 16);
+            assert(vi1 >= 0 && vi1 < 16);
+
+            pp[l/2] = vi0 | (vi1 << 4);
+        }
+
+        memcpy(pb, pp, sizeof(pp));
+        pb += bs;
+    }
+}
+
 void quantize_row_q4_0(const float * restrict x, void * restrict y, int k) {
     assert(k % QK == 0);
 
+#if __ARM_NEON || defined(__AVX2__) || defined(__wasm_simd128__)
     const int nb = k / QK;
     const size_t bs = sizeof(float) + QK/2;
 
     uint8_t * restrict pd = ((uint8_t *)y + 0*bs);
     uint8_t * restrict pb = ((uint8_t *)y + 0*bs + sizeof(float));
 
     uint8_t pp[QK/2];
+#endif
 
 #if __ARM_NEON
 #if QK == 32
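To make the block layout concrete (commentary, not part of the commit): each Q4_0 block is one `float` scale `d` followed by `QK/2` bytes holding two 4-bit values each, stored as `round(v/d) + 8`. A minimal decoder sketch for a single block, assuming `QK == 32`; the helper name `dequantize_block_q4_0` is hypothetical:

```c
// Sketch only: invert the packing above for one block.
#include <stdint.h>
#include <string.h>

#define QK 32

void dequantize_block_q4_0(const uint8_t * block, float * out) {
    float d;
    memcpy(&d, block, sizeof(float));           // per-block scale (delta)
    const uint8_t * pb = block + sizeof(float); // QK/2 packed nibbles follow

    for (int l = 0; l < QK; l += 2) {
        const uint8_t b = pb[l/2];
        // each nibble holds round(v/d) + 8, so subtract 8 to restore the sign
        out[l + 0] = ((int8_t)(b & 0xF) - 8) * d;
        out[l + 1] = ((int8_t)(b >> 4)  - 8) * d;
    }
}
```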
@@ -566,36 +616,7 @@ void quantize_row_q4_0(const float * restrict x, void * restrict y, int k) {
 #endif
 #else
     // scalar
-    for (int i = 0; i < nb; i++) {
-        float amax = 0.0f; // absolute max
-
-        for (int l = 0; l < QK; l++) {
-            const float v = x[i*QK + l];
-            amax = MAX(amax, fabsf(v));
-        }
-
-        const float d = amax / ((1 << 3) - 1);
-        const float id = d ? 1.0f/d : 0.0f;
-
-        *(float *)pd = d;
-        pd += bs;
-
-        for (int l = 0; l < QK; l += 2) {
-            const float v0 = x[i*QK + l + 0]*id;
-            const float v1 = x[i*QK + l + 1]*id;
-
-            const uint8_t vi0 = ((int8_t) (round(v0))) + 8;
-            const uint8_t vi1 = ((int8_t) (round(v1))) + 8;
-
-            assert(vi0 >= 0 && vi0 < 16);
-            assert(vi1 >= 0 && vi1 < 16);
-
-            pp[l/2] = vi0 | (vi1 << 4);
-        }
-
-        memcpy(pb, pp, sizeof(pp));
-        pb += bs;
-    }
+    quantize_row_q4_0_reference(x, y, k);
 #endif
 }
 
@@ -10702,119 +10723,60 @@ enum ggml_opt_result ggml_opt(
 
 ////////////////////////////////////////////////////////////////////////////////
 
-size_t ggml_quantize_q4_0(float * src, void * dst, int n, int k, int qk, int64_t * hist) {
+size_t ggml_quantize_q4_0(const float * src, void * dst, int n, int k, int qk, int64_t * hist) {
     const int nb = k / qk;
     const size_t bs = (sizeof(float) + sizeof(uint8_t)*qk/2);
     const size_t row_size = nb*bs;
 
     assert(k % qk == 0);
 
-    const size_t pp_size = qk / 2;
-    uint8_t * pp = (uint8_t *) alloca(pp_size);
-
     char * pdst = (char *) dst;
 
     for (int j = 0; j < n; j += k) {
         uint8_t * pd = (uint8_t *) (pdst + (j/k)*row_size + 0*bs);
         uint8_t * pb = (uint8_t *) (pdst + (j/k)*row_size + 0*bs + sizeof(float));
 
-        for (int i = 0; i < nb; i++) {
-            float amax = 0.0f; // absolute max
-
-            {
-                for (int l = 0; l < qk; l++) {
-                    const float v = src[j + i*qk + l];
-                    amax = MAX(amax, fabsf(v));
-                }
-
-                const float d = amax / ((1 << 3) - 1);
-                const float id = d ? 1.0f/d : 0.0f;
-
-                *(float *) pd = d;
-                pd += bs;
+        quantize_row_q4_0_reference(src + j, pd, k);
 
-                for (int l = 0; l < qk; l += 2) {
-                    const float v0 = (src[j + i*qk + l + 0])*id;
-                    const float v1 = (src[j + i*qk + l + 1])*id;
-
-                    const uint8_t vi0 = ((int8_t) (round(v0))) + 8;
-                    const uint8_t vi1 = ((int8_t) (round(v1))) + 8;
-
-                    assert(vi0 >= 0 && vi0 < 16);
-                    assert(vi1 >= 0 && vi1 < 16);
-
-                    hist[vi0]++;
-                    hist[vi1]++;
-
-                    pp[l/2] = vi0 | (vi1 << 4);
-                }
+        for (int i = 0; i < nb; i++) {
+            for (int l = 0; l < qk; l += 2) {
+                const uint8_t vi0 = pb[l/2] & 0xF;
+                const uint8_t vi1 = pb[l/2] >> 4;
 
-                memcpy(pb, pp, pp_size);
-                pb += bs;
+                hist[vi0]++;
+                hist[vi1]++;
             }
+            pb += bs;
         }
     }
 
     return (n/k)*row_size;
 }
 
-size_t ggml_quantize_q4_1(float * src, void * dst, int n, int k, int qk, int64_t * hist) {
+size_t ggml_quantize_q4_1(const float * src, void * dst, int n, int k, int qk, int64_t * hist) {
     const int nb = k / qk;
     const size_t bs = (2*sizeof(float) + sizeof(uint8_t)*qk/2);
     const size_t row_size = nb*bs;
 
     assert(k % qk == 0);
 
-    const size_t pp_size = qk / 2;
-    uint8_t * pp = (uint8_t *) alloca(pp_size);
-
     char * pdst = (char *) dst;
 
     for (int j = 0; j < n; j += k) {
         uint8_t * pd = (uint8_t *) (pdst + (j/k)*row_size + 0*bs);
-        uint8_t * pm = (uint8_t *) (pdst + (j/k)*row_size + 0*bs + sizeof(float));
         uint8_t * pb = (uint8_t *) (pdst + (j/k)*row_size + 0*bs + 2*sizeof(float));
 
-        //printf("n = %d, k = %d, nb = %d, row_size = %d, j = %d, pm = %p, pd = %p, pb = %p\n", n, k, nb, row_size, j, pm, pd, pb);
+        quantize_row_q4_1(src + j, pd, k);
 
         for (int i = 0; i < nb; i++) {
-            float min = FLT_MAX;
-            float max = -FLT_MAX;
-
-            {
-                for (int l = 0; l < qk; l++) {
-                    const float v = src[j + i*qk + l];
-                    if (v < min) min = v;
-                    if (v > max) max = v;
-                }
-
-                const float d = (max - min) / ((1 << 4) - 1);
-                const float id = d ? 1.0f/d : 0.0f;
-
-                *(float *) pd = d;
-                *(float *) pm = min;
-                pd += bs;
-                pm += bs;
-
-                for (int l = 0; l < qk; l += 2) {
-                    const float v0 = (src[j + i*qk + l + 0] - min)*id;
-                    const float v1 = (src[j + i*qk + l + 1] - min)*id;
-
-                    const uint8_t vi0 = round(v0);
-                    const uint8_t vi1 = round(v1);
-
-                    assert(vi0 >= 0 && vi0 < 16);
-                    assert(vi1 >= 0 && vi1 < 16);
-
-                    hist[vi0]++;
-                    hist[vi1]++;
-
-                    pp[l/2] = vi0 | (vi1 << 4);
-                }
+            for (int l = 0; l < qk; l += 2) {
+                const uint8_t vi0 = pb[l/2] & 0xF;
+                const uint8_t vi1 = pb[l/2] >> 4;
 
-                memcpy(pb, pp, pp_size);
-                pb += bs;
+                hist[vi0]++;
+                hist[vi1]++;
             }
+            pb += bs;
         }
     }
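A hypothetical caller sketch (not from this commit) for the refactored entry point: the values of `n`, `k`, and `qk` below are arbitrary examples, and the output size follows from `row_size = (k/qk)*(sizeof(float) + qk/2)` as computed above. The 16-bin histogram is filled by the new nibble-scanning pass:

```c
// Sketch: quantize random rows with ggml_quantize_q4_0 and print the
// distribution of the 16 quantization levels (assumes linking against ggml).
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "ggml.h"

int main(void) {
    const int n = 2*4096, k = 4096, qk = 32;   // arbitrary example sizes
    float * src = malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) src[i] = (float) rand() / RAND_MAX - 0.5f;

    // one float scale + qk/2 packed bytes per block, (n/qk) blocks in total
    void * dst = malloc((n/qk) * (sizeof(float) + qk/2));
    int64_t hist[16] = {0};

    const size_t written = ggml_quantize_q4_0(src, dst, n, k, qk, hist);
    printf("wrote %zu bytes\n", written);
    for (int v = 0; v < 16; v++) {
        printf("level %2d: %lld\n", v, (long long) hist[v]);
    }

    free(src);
    free(dst);
    return 0;
}
```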

ggml.h

Lines changed: 2 additions & 2 deletions
@@ -745,8 +745,8 @@ enum ggml_opt_result ggml_opt(
 // quantization
 //
 
-size_t ggml_quantize_q4_0(float * src, void * dst, int n, int k, int qk, int64_t * hist);
-size_t ggml_quantize_q4_1(float * src, void * dst, int n, int k, int qk, int64_t * hist);
+size_t ggml_quantize_q4_0(const float * src, void * dst, int n, int k, int qk, int64_t * hist);
+size_t ggml_quantize_q4_1(const float * src, void * dst, int n, int k, int qk, int64_t * hist);
 
 //
 // system info

llamacpp.dll

-2 bytes (binary file not shown)
