Commit f0518b3
gguf: add CLI (#1221)
Ref discussion:
https://huggingface.slack.com/archives/C02CLHA19TL/p1740399079674399?thread_ts=1739968558.574099&cid=C02CLHA19TL
I tested it with this command:
```bash
pnpm run build && npx . ~/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
```
Output:
```
* Dumping 36 key/value pair(s)
Idx | Count | Value
----|--------|----------------------------------------------------------------------------------
1 | 1 | version = 3
2 | 1 | tensor_count = 292
3 | 1 | kv_count = 33
4 | 1 | general.architecture = "llama"
5 | 1 | general.type = "model"
6 | 1 | general.name = "Meta Llama 3.1 8B Instruct"
7 | 1 | general.finetune = "Instruct"
8 | 1 | general.basename = "Meta-Llama-3.1"
9 | 1 | general.size_label = "8B"
10 | 1 | general.license = "llama3.1"
11 | 6 | general.tags = ["facebook","meta","pytorch","llama","llama-3","te...
12 | 8 | general.languages = ["en","de","fr","it","pt","hi","es","th"]
13 | 1 | llama.block_count = 32
14 | 1 | llama.context_length = 131072
15 | 1 | llama.embedding_length = 4096
16 | 1 | llama.feed_forward_length = 14336
17 | 1 | llama.attention.head_count = 32
18 | 1 | llama.attention.head_count_kv = 8
19 | 1 | llama.rope.freq_base = 500000
20 | 1 | llama.attention.layer_norm_rms_epsilon = 0.000009999999747378752
21 | 1 | general.file_type = 15
22 | 1 | llama.vocab_size = 128256
23 | 1 | llama.rope.dimension_count = 128
24 | 1 | tokenizer.ggml.model = "gpt2"
25 | 1 | tokenizer.ggml.pre = "llama-bpe"
26 | 128256 | tokenizer.ggml.tokens = ["!","\"","#","$","%","&","'","(",")","*","+",",",...
27 | 128256 | tokenizer.ggml.token_type = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1...
28 | 280147 | tokenizer.ggml.merges = ["Ġ Ġ","Ġ ĠĠĠ","ĠĠ ĠĠ","ĠĠĠ Ġ","i n","Ġ t","Ġ ĠĠĠĠ...
29 | 1 | tokenizer.ggml.bos_token_id = 128000
30 | 1 | tokenizer.ggml.eos_token_id = 128009
31 | 1 | tokenizer.chat_template = "{{- bos_token }}\n{%- if custom_tools is defined ...
32 | 1 | general.quantization_version = 2
33 | 1 | quantize.imatrix.file = "/models_out/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-...
34 | 1 | quantize.imatrix.dataset = "/training_dir/calibration_datav3.txt"
35 | 1 | quantize.imatrix.entries_count = 224
36 | 1 | quantize.imatrix.chunks_count = 125
* Dumping 292 tensor(s)
Idx | Num Elements | Shape | Data Type | Name
----|--------------|--------------------------------|-----------|--------------------------
1 | 64 | 64, 1, 1, 1 | F32 | rope_freqs.weight
2 | 525336576 | 4096, 128256, 1, 1 | Q4_K | token_embd.weight
3 | 4096 | 4096, 1, 1, 1 | F32 | blk.0.attn_norm.weight
4 | 58720256 | 14336, 4096, 1, 1 | Q6_K | blk.0.ffn_down.weight
...(truncated)
```
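The first three rows of the dump (`version`, `tensor_count`, `kv_count`) come straight from the fixed GGUF file header: the magic bytes `GGUF`, a uint32 version, then two uint64 counts, all little-endian in GGUF v3. A minimal sketch of reading that header (not the CLI's actual implementation, and using a synthetic in-memory file rather than a real model):

```python
import io
import struct

def read_gguf_header(f):
    """Read the fixed GGUF v3 header: magic, version, tensor_count, kv_count."""
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    # uint32 version, then uint64 tensor_count and uint64 kv_count (little-endian)
    (version,) = struct.unpack("<I", f.read(4))
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, kv_count

# Synthetic header mirroring the dump above: version 3, 292 tensors, 33 KV pairs.
buf = io.BytesIO(b"GGUF" + struct.pack("<IQQ", 3, 292, 33))
print(read_gguf_header(buf))  # (3, 292, 33)
```

The remaining KV pairs and tensor infos follow the header as typed key/value records, which is what both dumpers walk and print.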
---
For reference, here is the output of `gguf_dump.py`:
```
$ python gguf_dump.py ~/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
INFO:gguf-dump:* Loading: /Users/ngxson/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 36 key/value pair(s)
1: UINT32 | 1 | GGUF.version = 3
2: UINT64 | 1 | GGUF.tensor_count = 292
3: UINT64 | 1 | GGUF.kv_count = 33
4: STRING | 1 | general.architecture = 'llama'
5: STRING | 1 | general.type = 'model'
6: STRING | 1 | general.name = 'Meta Llama 3.1 8B Instruct'
7: STRING | 1 | general.finetune = 'Instruct'
8: STRING | 1 | general.basename = 'Meta-Llama-3.1'
9: STRING | 1 | general.size_label = '8B'
10: STRING | 1 | general.license = 'llama3.1'
11: [STRING] | 6 | general.tags
12: [STRING] | 8 | general.languages
13: UINT32 | 1 | llama.block_count = 32
14: UINT32 | 1 | llama.context_length = 131072
15: UINT32 | 1 | llama.embedding_length = 4096
16: UINT32 | 1 | llama.feed_forward_length = 14336
17: UINT32 | 1 | llama.attention.head_count = 32
18: UINT32 | 1 | llama.attention.head_count_kv = 8
19: FLOAT32 | 1 | llama.rope.freq_base = 500000.0
20: FLOAT32 | 1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
21: UINT32 | 1 | general.file_type = 15
22: UINT32 | 1 | llama.vocab_size = 128256
23: UINT32 | 1 | llama.rope.dimension_count = 128
24: STRING | 1 | tokenizer.ggml.model = 'gpt2'
25: STRING | 1 | tokenizer.ggml.pre = 'llama-bpe'
26: [STRING] | 128256 | tokenizer.ggml.tokens
27: [INT32] | 128256 | tokenizer.ggml.token_type
28: [STRING] | 280147 | tokenizer.ggml.merges
29: UINT32 | 1 | tokenizer.ggml.bos_token_id = 128000
30: UINT32 | 1 | tokenizer.ggml.eos_token_id = 128009
31: STRING | 1 | tokenizer.chat_template = '{{- bos_token }}\n{%- if custom_tools is defined %}\n {%- s'
32: UINT32 | 1 | general.quantization_version = 2
33: STRING | 1 | quantize.imatrix.file = '/models_out/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8'
34: STRING | 1 | quantize.imatrix.dataset = '/training_dir/calibration_datav3.txt'
35: INT32 | 1 | quantize.imatrix.entries_count = 224
36: INT32 | 1 | quantize.imatrix.chunks_count = 125
* Dumping 292 tensor(s)
1: 64 | 64, 1, 1, 1 | F32 | rope_freqs.weight
2: 525336576 | 4096, 128256, 1, 1 | Q4_K | token_embd.weight
3: 4096 | 4096, 1, 1, 1 | F32 | blk.0.attn_norm.weight
4: 58720256 | 14336, 4096, 1, 1 | Q6_K | blk.0.ffn_down.weight
5: 58720256 | 4096, 14336, 1, 1 | Q4_K | blk.0.ffn_gate.weight
6: 58720256 | 4096, 14336, 1, 1 | Q4_K | blk.0.ffn_up.weight
7: 4096 | 4096, 1, 1, 1 | F32 | blk.0.ffn_norm.weight
8: 4194304 | 4096, 1024, 1, 1 | Q4_K | blk.0.attn_k.weight
```
1 parent 5e4beab
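Two details in the dumps that are easy to double-check. First, `llama.attention.layer_norm_rms_epsilon` prints as `0.000009999999747378752` rather than `1e-5` because the value is stored as FLOAT32, and the nearest 32-bit float to 1e-5 is slightly below it. Second, the `Num Elements` column in the tensor table is simply the product of the four shape dimensions. A quick sketch verifying both (just illustrative arithmetic, not the dumpers' code):

```python
import math
import struct

# Round-trip 1e-5 through a little-endian float32, as GGUF stores it:
as_f32 = struct.unpack("<f", struct.pack("<f", 1e-5))[0]
print(as_f32)  # 9.999999747378752e-06

# Element count for token_embd.weight with shape (4096, 128256, 1, 1):
print(math.prod((4096, 128256, 1, 1)))  # 525336576
```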
3 files changed: +153 −1 lines changed