
Commit 3191a8d
fix: Explicitly enable add_bos_token during conversion

The `tokenizer.json`/`tokenizer_config.json` in the model are a bit contradictory: in the config, add_bos_token is set to False, but the tokenizer model itself has a post_processor that adds the BOS token via type: TemplateProcessing.

https://github.com/ggml-org/llama.cpp/issues/15409

Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <[email protected]>
1 parent 828176e commit 3191a8d
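The contradiction is easy to observe by inspecting the two tokenizer files directly. Below is a minimal sketch using only the standard library; the model directory path is a placeholder for a local copy of the HF model repository.

import json

MODEL_DIR = "model_dir"  # placeholder: local checkout of the HF model

with open(f"{MODEL_DIR}/tokenizer_config.json") as f:
    config = json.load(f)
with open(f"{MODEL_DIR}/tokenizer.json") as f:
    tok = json.load(f)

# The config claims no BOS token should be added ...
print("add_bos_token in tokenizer_config.json:", config.get("add_bos_token"))

# ... but the tokenizer's post_processor prepends one anyway.
post = tok.get("post_processor") or {}
print("post_processor type:", post.get("type"))
if post.get("type") == "TemplateProcessing":
    # The "single" template lists the pieces emitted for a single sequence,
    # typically starting with the BOS special token.
    print("single template:", post.get("single"))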

File tree

1 file changed: +8 −0 lines changed

convert_hf_to_gguf.py

Lines changed: 8 additions & 0 deletions
@@ -7602,6 +7602,14 @@ def set_gguf_parameters(self):
             n_ff if i in self._mlp_layers else 0 for i in range(self.block_count)
         ])
 
+    def set_vocab(self):
+        super().set_vocab()
+
+        # The tokenizer _does_ add a BOS token (via post_processor type
+        # TemplateProcessing) but does not set add_bos_token to true in the
+        # config, so we need to explicitly override it here.
+        self.gguf_writer.add_add_bos_token(True)
+
 
 @ModelBase.register("BailingMoeForCausalLM")
 class BailingMoeModel(TextModel):
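After conversion, the override can be checked in the output file. This is a hedged sketch using the `gguf` Python package that ships with llama.cpp (gguf-py); the output path is a placeholder, and the field-decoding details follow my reading of GGUFReader, so treat it as a sketch rather than the canonical API.

from gguf import GGUFReader

reader = GGUFReader("nemotron-nano.gguf")  # placeholder: converted model path

# add_add_bos_token(True) writes the key "tokenizer.ggml.add_bos_token".
field = reader.fields.get("tokenizer.ggml.add_bos_token")
if field is None:
    print("add_bos_token is not set in the GGUF metadata")
else:
    # A GGUF bool is stored as a single byte; field.data indexes the part
    # of the field that holds the value.
    value = bool(field.parts[field.data[0]][0])
    print("tokenizer.ggml.add_bos_token =", value)

Overriding at conversion time keeps the GGUF self-describing: downstream consumers read tokenizer.ggml.add_bos_token from the file and never see the contradictory HF config.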
