docs/gguf.md
57 additions & 6 deletions
Original file line number
Diff line number
Diff line change
@@ -20,40 +20,91 @@ The key difference between GGJT and GGUF is the use of a key-value structure for
20
20
21
21
### GGUF Naming Convention
22
22
23
-
GGUF follow a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<EncodingScheme>.gguf`
23
+
GGUF files follow the naming convention `<Model>(-<Version>)-(<ExpertsCount>x)<Parameters>-<EncodingScheme>(-<Shard>).gguf`
24
24
25
25
The components are:
26
26
1. **Model**: A descriptive name for the model type or architecture.
27
+
- This can be derived from the gguf metadata key `general.name` by replacing spaces with dashes.
27
28
2. **Version**: (Optional) Denotes the model version number, formatted as `v<Major>.<Minor>`.
28
29
- If the model is missing a version number, assume `v0.0` (prerelease).
29
-
3.**ExpertsCount**: Indicates the number of experts found in a Mixture of Experts based model.
30
+
- This can be derived from gguf metadata `general.version`
31
+
3. **ExpertsCount**: (Optional) Indicates the number of experts found in a Mixture of Experts based model.
32
+
- This can be derived from gguf metadata `llama.expert_count`
30
33
4. **Parameters**: Indicates the number of parameters and their scale, represented as `<count><scale-prefix>`:
31
34
- `Q`: Quadrillion parameters.
32
35
- `T`: Trillion parameters.
33
36
- `B`: Billion parameters.
34
37
- `M`: Million parameters.
35
38
- `K`: Thousand parameters.
36
39
5. **EncodingScheme**: Indicates the weights encoding scheme that was applied to the model. The content, type mixture, and arrangement, however, are determined by user code and can vary depending on project needs.
40
+
6. **Shard**: (Optional) Indicates that the model has been split into multiple shards, formatted as `<ShardNum>-of-<ShardTotal>`.
41
+
- *ShardNum*: Position of this shard in the model. Must be 5 digits, padded with zeros.
42
+
- Shard numbering always starts at `00001` (e.g. the first shard is `00001-of-XXXXX` rather than `00000-of-XXXXX`).
43
+
- *ShardTotal*: Total number of shards in this model. Must be 5 digits, padded with zeros.
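
As a minimal sketch of how the components listed above compose into a filename, the following Python helper assembles a name from its parts. The function names `format_parameters` and `gguf_filename` are hypothetical and not part of any GGUF tooling. Rounding the parameter count to a whole number at the largest fitting scale is an assumption, as is treating the count as per-expert for Mixture of Experts models.

```python
from typing import Optional, Tuple


def format_parameters(count: int) -> str:
    """Render a raw parameter count as <count><scale-prefix> (assumed rounding)."""
    scales = ((10**15, "Q"), (10**12, "T"), (10**9, "B"), (10**6, "M"), (10**3, "K"))
    for scale, prefix in scales:
        if count >= scale:
            return f"{round(count / scale)}{prefix}"
    return str(count)


def gguf_filename(
    model: str,
    parameters: int,
    encoding: str,
    version: Optional[Tuple[int, int]] = None,
    experts: Optional[int] = None,
    shard: Optional[Tuple[int, int]] = None,
) -> str:
    """Assemble a filename following the naming convention described above."""
    # Model name: spaces become dashes.
    parts = [model.strip().replace(" ", "-")]
    # Version is optional and formatted as v<Major>.<Minor>.
    if version is not None:
        major, minor = version
        parts.append(f"v{major}.{minor}")
    # The optional expert count is prefixed to the parameter size as <N>x.
    size = format_parameters(parameters)
    parts.append(f"{experts}x{size}" if experts else size)
    parts.append(encoding)
    # Shard is optional; both numbers are zero-padded to 5 digits, starting at 00001.
    if shard is not None:
        num, total = shard
        if not 1 <= num <= total:
            raise ValueError("shard number must be between 1 and the shard total")
        parts.append(f"{num:05d}-of-{total:05d}")
    return "-".join(parts) + ".gguf"
```

For instance, `gguf_filename("Hermes 2 Pro Llama 3", 8_000_000_000, "F16")` yields `Hermes-2-Pro-Llama-3-8B-F16.gguf`.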
37
44
38
45
#### Parsing Above Naming Convention
39
46
40
47
To correctly parse a well-formed gguf filename that follows this naming convention, it is recommended to read from right to left using `-` as the delimiter. This strategy allows the model name to include dashes while still allowing the version string to be optional. It also leaves room to extend the format in the future.
41
48
42
49
For example:
43
50
44
-
*`mixtral-v0.1-8x7B-KQ2.gguf`:
51
+
* `Mixtral-v0.1-8x7B-Q2_K.gguf`:
45
52
- Model Name: Mixtral
46
53
- Version Number: v0.1
47
54
- Expert Count: 8
48
55
- Parameter Count: 7B
49
-
- Weight Encoding Scheme: KQ2
56
+
- Weight Encoding Scheme: Q2_K
57
+
- Shard: N/A
50
58
51
59
* `Hermes-2-Pro-Llama-3-8B-F16.gguf`:
52
60
- Model Name: Hermes 2 Pro Llama 3
53
-
- Version Number: v0.0 (`<Version>-` missing)
54
-
- Expert Count: 0 (`<ExpertsCount>x` missing)
61
+
- Version Number: v0.0
62
+
- Expert Count: 0
55
63
- Parameter Count: 8B
56
64
- Weight Encoding Scheme: F16
65
+
- Shard: N/A
66
+
67
+
* `Grok-v1.0-100B-Q4_0-00003-of-00009.gguf`:
68
+
- Model Name: Grok
69
+
- Version Number: v1.0
70
+
- Expert Count: 0
71
+
- Parameter Count: 100B
72
+
- Weight Encoding Scheme: Q4_0
73
+
- Shard: 3 out of 9 total shards
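
The right-to-left strategy can be sketched in Python as follows. This is an illustrative reading of the convention, not a reference parser: the function name `parse_gguf_filename` and the shape of the returned dictionary are assumptions, and the defaults (`v0.0`, expert count `0`, no shard) mirror the examples above.

```python
import re


def parse_gguf_filename(filename: str) -> dict:
    """Parse a convention-following gguf filename by consuming tokens from the right."""
    if not filename.endswith(".gguf"):
        raise ValueError("not a .gguf filename")
    tokens = filename[: -len(".gguf")].split("-")

    # Optional trailing shard: <ShardNum>-of-<ShardTotal>, both 5 digits.
    shard = None
    if (
        len(tokens) >= 3
        and tokens[-2] == "of"
        and re.fullmatch(r"\d{5}", tokens[-3])
        and re.fullmatch(r"\d{5}", tokens[-1])
    ):
        shard = (int(tokens[-3]), int(tokens[-1]))
        tokens = tokens[:-3]

    # At minimum a model name, a parameter count and an encoding scheme remain.
    if len(tokens) < 3:
        raise ValueError("filename is missing required components")

    # Rightmost remaining token is the encoding scheme.
    encoding = tokens.pop()

    # Next is (<ExpertsCount>x)<Parameters>.
    size = re.fullmatch(r"(?:(\d+)x)?(\d+[A-Za-z]?)", tokens.pop())
    if size is None:
        raise ValueError("could not parse the parameter count")
    experts = int(size.group(1)) if size.group(1) else 0

    # Optional version; a missing version is treated as v0.0.
    version = "v0.0"
    if re.fullmatch(r"v\d+\.\d+", tokens[-1]):
        version = tokens.pop()
    if not tokens:
        raise ValueError("filename is missing a model name")

    # Whatever is left is the model name, with dashes read back as spaces.
    return {
        "model": " ".join(tokens),
        "version": version,
        "experts": experts,
        "parameters": size.group(2),
        "encoding": encoding,
        "shard": shard,
    }
```

Because the version, expert count, and shard are recognized by their shape while moving leftward, any remaining tokens can safely be treated as the model name, even when it contains dashes.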
74
+
75
+
Alternatively, the regular expression `/^(?<model_name>[A-Za-z0-9\s-]+)(?:-v(?<major>\d+)\.(?<minor>\d+))?-(?:(?<experts_count>\d+)x)?(?<parameters>\d+[A-Za-z]?)-(?<encoding_scheme>[\w_]+)(?:-(?<shard>\d{5})-of-(?<shardTotal>\d{5}))?\.gguf$/` can be used to extract all of the values above. Remember to convert `-` to a space in the extracted model name.
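
As a hedged illustration, here is the same regular expression applied in Python. Python's `re` module spells named groups as `(?P<name>...)` rather than `(?<name>...)`, so the pattern is transcribed accordingly; the group names are kept as in the expression above. The helper name `match_gguf_filename` is hypothetical.

```python
import re
from typing import Optional

# Transcription of the regular expression above into Python's named-group syntax.
GGUF_NAME = re.compile(
    r"^(?P<model_name>[A-Za-z0-9\s-]+)"
    r"(?:-v(?P<major>\d+)\.(?P<minor>\d+))?"
    r"-(?:(?P<experts_count>\d+)x)?"
    r"(?P<parameters>\d+[A-Za-z]?)"
    r"-(?P<encoding_scheme>[\w_]+)"
    r"(?:-(?P<shard>\d{5})-of-(?P<shardTotal>\d{5}))?"
    r"\.gguf$"
)


def match_gguf_filename(filename: str) -> Optional[dict]:
    """Return the named groups as a dict, or None if the filename does not match."""
    match = GGUF_NAME.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    # Dashes in the model name stand in for spaces.
    fields["model_name"] = fields["model_name"].replace("-", " ")
    return fields
```

Optional components that are absent from the filename come back as `None`, so the caller decides whether to substitute defaults such as `v0.0`.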