Labels: bug (Something isn't working), pending (This problem is yet to be addressed)
Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
0.9.4, Windows, Python 3.12.10
Reproduction
After training a LoRA with Unsloth on a model that uses a ChatML template (e.g., Magnum v2 4B or Hermes 8B), I am unable to load the resulting adapter for a test chat with Unsloth.
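The failure (shown in the traceback below) comes from Unsloth's `fix_chat_template` check, which rejects tokenizers whose Jinja chat template has no `{% if add_generation_prompt %}` branch. As a minimal illustration (the template strings here are my own ChatML-style examples, not the exact ones saved by the trainer), the check can be reproduced with a plain substring test:

```python
def has_generation_branch(chat_template: str) -> bool:
    """Return True if the Jinja chat template branches on add_generation_prompt,
    which is roughly the condition Unsloth's fix_chat_template enforces."""
    return "add_generation_prompt" in (chat_template or "")


# A ChatML-style template WITH the branch Unsloth requires:
chatml_ok = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

# The same template missing the branch, which triggers Unsloth's RuntimeError:
chatml_missing = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
)
```

Inspecting `tokenizer_config.json` in the saved adapter folder with this test should show which case applies.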
[WARNING|2026-03-04 21:16:43] llamafactory.extras.ploting:149 >> No metric eval_loss to plot.
[WARNING|2026-03-04 21:16:43] llamafactory.extras.ploting:149 >> No metric eval_accuracy to plot.
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:00,548 >> loading file tokenizer.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\tokenizer.json
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:00,549 >> loading file tokenizer.model from cache at None
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:00,549 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:00,549 >> loading file special_tokens_map.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\special_tokens_map.json
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:00,549 >> loading file tokenizer_config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\tokenizer_config.json
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:00,549 >> loading file chat_template.jinja from cache at None
[INFO|tokenization_utils_base.py:2380] 2026-03-04 21:17:00,753 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:765] 2026-03-04 21:17:01,884 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\config.json
[INFO|configuration_utils.py:839] 2026-03-04 21:17:01,884 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128019,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 3072,
"initializer_range": 0.02,
"intermediate_size": 9216,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": false,
"vocab_size": 128256
}
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:02,334 >> loading file tokenizer.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\tokenizer.json
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:02,334 >> loading file tokenizer.model from cache at None
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:02,334 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:02,334 >> loading file special_tokens_map.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\special_tokens_map.json
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:02,334 >> loading file tokenizer_config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\tokenizer_config.json
[INFO|tokenization_utils_base.py:2111] 2026-03-04 21:17:02,334 >> loading file chat_template.jinja from cache at None
[INFO|tokenization_utils_base.py:2380] 2026-03-04 21:17:02,518 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2026-03-04 21:17:02] llamafactory.data.template:144 >> Add <|im_start|> to stop words.
[INFO|configuration_utils.py:765] 2026-03-04 21:17:02,851 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\config.json
[INFO|configuration_utils.py:839] 2026-03-04 21:17:02,851 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128019,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 3072,
"initializer_range": 0.02,
"intermediate_size": 9216,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": false,
"vocab_size": 128256
}
[WARNING|logging.py:328] 2026-03-04 21:17:02,851 >> `torch_dtype` is deprecated! Use `dtype` instead!
[INFO|2026-03-04 21:17:02] llamafactory.model.model_utils.kv_cache:144 >> KV cache is enabled for faster generation.
E:\LlamaFactory\src\llamafactory\model\model_utils\unsloth.py:89: UserWarning: WARNING: Unsloth should be imported before [trl, transformers, peft] to ensure all optimizations are applied. Your code may run slower or encounter memory issues without these optimizations.
Please restructure your imports with 'import unsloth' at the top of your file.
from unsloth import FastLanguageModel # type: ignore
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Your Flash Attention 2 installation seems to be broken. Using Xformers instead. No performance changes will be seen.
🦥 Unsloth Zoo will now patch everything to make training faster!
[INFO|configuration_utils.py:765] 2026-03-04 21:17:06,654 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\config.json
[INFO|configuration_utils.py:839] 2026-03-04 21:17:06,670 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128019,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 3072,
"initializer_range": 0.02,
"intermediate_size": 9216,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": false,
"vocab_size": 128256
}
Unsloth: WARNING `trust_remote_code` is True.
Are you certain you want to do remote code execution?
==((====))== Unsloth 2026.3.3: Fast Llama patching. Transformers: 4.57.6.
\\ /| NVIDIA GeForce RTX 3090. Num GPUs = 1. Max memory: 24.0 GB. Platform: Windows.
O^O/ \_/ \ Torch: 2.10.0+cu130. CUDA: 8.6. CUDA Toolkit: 13.0. Triton: 3.6.0
\ / Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
[INFO|configuration_utils.py:765] 2026-03-04 21:17:12,598 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\config.json
[INFO|configuration_utils.py:839] 2026-03-04 21:17:12,598 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128019,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 3072,
"initializer_range": 0.02,
"intermediate_size": 9216,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": false,
"vocab_size": 128256
}
[INFO|configuration_utils.py:765] 2026-03-04 21:17:12,809 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\config.json
[INFO|configuration_utils.py:839] 2026-03-04 21:17:12,809 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128019,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 3072,
"initializer_range": 0.02,
"intermediate_size": 9216,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": false,
"vocab_size": 128256
}
[INFO|modeling_utils.py:1172] 2026-03-04 21:17:12,809 >> loading weights file model.safetensors from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\model.safetensors.index.json
[INFO|modeling_utils.py:2341] 2026-03-04 21:17:12,809 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:986] 2026-03-04 21:17:12,809 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128019,
"use_cache": false
}
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.35s/it]
[INFO|configuration_utils.py:941] 2026-03-04 21:17:15,728 >> loading configuration file generation_config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--anthracite-org--magnum-v2-4b\snapshots\31a45f774c4db8005f645c7fbd1345ad47b45ceb\generation_config.json
[INFO|configuration_utils.py:986] 2026-03-04 21:17:15,728 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128001
}
[INFO|dynamic_module_utils.py:423] 2026-03-04 21:17:15,878 >> Could not locate the custom_generate/generate.py inside anthracite-org/magnum-v2-4b.
Traceback (most recent call last):
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\queueing.py", line 849, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\route_utils.py", line 354, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\blocks.py", line 2191, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\blocks.py", line 1710, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\utils.py", line 760, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\utils.py", line 751, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\anyio\to_thread.py", line 63, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 2502, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 986, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\utils.py", line 734, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\utils.py", line 898, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\webui\chatter.py", line 158, in load_model
super().__init__(args)
File "E:\LlamaFactory\src\llamafactory\chat\chat_model.py", line 53, in __init__
self.engine: BaseEngine = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\chat\hf_engine.py", line 59, in __init__
self.model = load_model(
^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\model\loader.py", line 189, in load_model
model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\model\adapter.py", line 360, in init_adapter
model = _setup_lora_tuning(
^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\model\adapter.py", line 208, in _setup_lora_tuning
model = load_unsloth_peft_model(config, model_args, finetuning_args, is_trainable=is_trainable)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\model\model_utils\unsloth.py", line 96, in load_unsloth_peft_model
model, _ = FastLanguageModel.from_pretrained(**unsloth_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\unsloth\models\loader.py", line 704, in from_pretrained
model, tokenizer = dispatch_model.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\unsloth\models\llama.py", line 2501, in from_pretrained
tokenizer = load_correct_tokenizer(
^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\unsloth\tokenizer_utils.py", line 622, in load_correct_tokenizer
chat_template = fix_chat_template(tokenizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\unsloth\tokenizer_utils.py", line 734, in fix_chat_template
raise RuntimeError(
RuntimeError: Unsloth: The tokenizer `saves\Llama-3.1-8B-Instruct\lora\train_2026-03-04-20-39-27-Magnum-4B-1`
does not have a {% if add_generation_prompt %} for generation purposes.
Please file a bug report to the maintainers of `saves\Llama-3.1-8B-Instruct\lora\train_2026-03-04-20-39-27-Magnum-4B-1` - thanks!
I also reported this bug to Unsloth, along with several logs, here:
unslothai/unsloth#4150
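As a possible stopgap until this is fixed upstream, the saved adapter's chat template could be patched so it passes Unsloth's check. This is an untested sketch under my own assumptions: the template is stored in the adapter folder's `tokenizer_config.json`, appending a ChatML generation-prompt branch is valid for this template, and the file path is illustrative. Back up the file before editing.

```python
import json


def patch_chat_template(tokenizer_config_path: str) -> bool:
    """Append a ChatML add_generation_prompt branch to the saved chat
    template if it lacks one. Returns True if the file was modified."""
    with open(tokenizer_config_path, encoding="utf-8") as f:
        config = json.load(f)
    template = config.get("chat_template") or ""
    if "add_generation_prompt" in template:
        return False  # already satisfies Unsloth's fix_chat_template check
    config["chat_template"] = (
        template
        + "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
    )
    with open(tokenizer_config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False, indent=2)
    return True
```

For example, `patch_chat_template(r"saves\...\tokenizer_config.json")` on the adapter folder named in the traceback, then retrying the load, would confirm whether the missing branch is the only blocker.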
Others
No response