
Commit 3191a8d
fix: Explicitly enable add_bos_token during conversion

The `tokenizer.json`/`tokenizer_config.json` in the model are a bit contradictory: in the config, add_bos_token is set to False, but the tokenizer model itself has a post_processor that adds the BOS token via type: TemplateProcessing.

https://github.com/ggml-org/llama.cpp/issues/15409

Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>
1 parent 828176e commit 3191a8d
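The contradiction is easy to observe by inspecting the two tokenizer files directly. Below is a minimal sketch using only the standard library; the model directory path is a placeholder for a local copy of the HF model repository.

import json

MODEL_DIR = "model_dir"  # placeholder: local checkout of the HF model

with open(f"{MODEL_DIR}/tokenizer_config.json") as f:
    config = json.load(f)
with open(f"{MODEL_DIR}/tokenizer.json") as f:
    tok = json.load(f)

# The config claims no BOS token should be added ...
print("add_bos_token in tokenizer_config.json:", config.get("add_bos_token"))

# ... but the tokenizer's post_processor prepends one anyway.
post = tok.get("post_processor") or {}
print("post_processor type:", post.get("type"))
if post.get("type") == "TemplateProcessing":
    # The "single" template lists the pieces emitted for a single sequence,
    # typically starting with the BOS special token.
    print("single template:", post.get("single"))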

File tree

1 file changed: +8 −0 lines changed

convert_hf_to_gguf.py

Lines changed: 8 additions & 0 deletions
@@ -7602,6 +7602,14 @@ def set_gguf_parameters(self):
             n_ff if i in self._mlp_layers else 0 for i in range(self.block_count)
         ])
 
+    def set_vocab(self):
+        super().set_vocab()
+
+        # The tokenizer _does_ add a BOS token (via post_processor type
+        # TemplateProcessing) but does not set add_bos_token to true in the
+        # config, so we need to explicitly override it here.
+        self.gguf_writer.add_add_bos_token(True)
+
 
 @ModelBase.register("BailingMoeForCausalLM")
 class BailingMoeModel(TextModel):
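After conversion, the override can be checked in the output file. This is a hedged sketch using the `gguf` Python package that ships with llama.cpp (gguf-py); the output path is a placeholder, and the field-decoding details follow my reading of GGUFReader, so treat it as a sketch rather than the canonical API.

from gguf import GGUFReader

reader = GGUFReader("nemotron-nano.gguf")  # placeholder: converted model path

# add_add_bos_token(True) writes the key "tokenizer.ggml.add_bos_token".
field = reader.fields.get("tokenizer.ggml.add_bos_token")
if field is None:
    print("add_bos_token is not set in the GGUF metadata")
else:
    # A GGUF bool is stored as a single byte; field.data indexes the part
    # of the field that holds the value.
    value = bool(field.parts[field.data[0]][0])
    print("tokenizer.ggml.add_bos_token =", value)

Overriding at conversion time keeps the GGUF self-describing: downstream consumers read tokenizer.ggml.add_bos_token from the file and never see the contradictory HF config.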
